Hybrid Ensemble Learning with SMOTEENN and Soft Voting for Stunting Risk Prediction: A SHAP-Based Explainable Approach
Abstract
Stunting remains a critical public health concern in Indonesia, with long-term consequences for physical growth, cognitive development, and human capital. This study introduces a hybrid machine learning framework that predicts household-level stunting risk by combining the Synthetic Minority Over-sampling Technique with Edited Nearest Neighbors (SMOTEENN), a soft voting ensemble, and SHapley Additive exPlanations (SHAP). The objective is to improve both predictive accuracy and interpretability in identifying high-risk households. The dataset comprised 115,579 household records from West Sumatra with 20 demographic, socioeconomic, health, and housing predictors. Preprocessing included handling missing values, encoding categorical variables, and applying SMOTEENN exclusively to the training set to mitigate class imbalance. On the imbalanced data, the baseline models showed limited sensitivity; XGBoost performed best, with 74.56% accuracy and a 71.08% F1-score. After SMOTEENN was applied, performance improved substantially, with XGBoost achieving 91.82% accuracy and a 91.74% F1-score. Hybridization improved on this further: the soft voting ensemble of Random Forest and XGBoost reached 91.95% accuracy and a 92.46% F1-score, a gain of 0.13 and 0.72 percentage points, respectively, over the XGBoost model alone. SHAP analysis added interpretability by identifying family members, education level, diverse food consumption, occupation, and drinking water source as the dominant predictors of stunting risk. The novelty of this study lies in the integration of SMOTEENN with ensemble learning and SHAP, which provides both robust performance and transparency in feature contributions. The findings demonstrate that the proposed framework improves sensitivity to the minority class, delivers higher predictive accuracy than the baseline models, and offers interpretable insights to guide targeted interventions.
By combining methodological rigor with explainability, this research contributes a practical decision-support tool for policymakers, supporting early detection of at-risk households and accelerating stunting reduction efforts in Indonesia.
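To make the soft voting step concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It averages the class probabilities produced by two classifiers and picks the class with the highest mean probability. The function name soft_vote, the optional weights argument, and the probability values are hypothetical and chosen only for illustration; in the study these probabilities would come from the Random Forest and XGBoost models trained on the SMOTEENN-resampled training set.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return the predicted class index per sample by averaging
    per-class probabilities across models (soft voting)."""
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weight for every model
    total = sum(weights)
    predictions = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        # Weighted mean probability of each class for sample i
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        # The class with the highest mean probability wins
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical probabilities [P(not at risk), P(at risk)] for three households
rf_probs = [[0.70, 0.30], [0.45, 0.55], [0.20, 0.80]]
xgb_probs = [[0.40, 0.60], [0.35, 0.65], [0.30, 0.70]]

labels = soft_vote([rf_probs, xgb_probs])
print(labels)
```

For the first household the two models disagree (0.70 versus 0.40 for "not at risk"), so a majority vote over two models would tie. Soft voting resolves the disagreement by confidence: the mean probabilities are 0.55 and 0.45, so the household is assigned to class 0.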
Journal of Applied Data Sciences
| ISSN | : | 2723-6471 (Online) |
| Collaborated with | : | Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia. |
| Publisher | : | Bright Publisher |
| Website | : | http://bright-journal.org/JADS |
| Email | : | taqwa@amikompurwokerto.ac.id (principal contact), support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0



