Optimizing Stunting Detection through SMOTE and Machine Learning: a Comparative Study of XGBoost, Random Forest, SVM, and k-NN

Tri Sugihartono, Benny Wijaya, Marini Marini, Ahmad Paqih Alkayess, Hendra Agustian Anugerah

Abstract


Stunting is a major public health concern that affects millions of children worldwide, especially in developing countries, where chronic malnutrition impairs physical growth and cognitive development. Early detection is necessary for timely intervention that reduces long-lasting effects. This study applies advanced machine learning techniques to detect stunting more accurately, comparing the XGBoost, Random Forest, SVM, and k-NN algorithms. Using a dataset sourced from Kaggle that contains 10,000 samples of anthropometric and demographic features, we addressed a significant class imbalance: stunted children accounted for only 15% of the samples. We mitigated this imbalance with SMOTE, which generates synthetic samples to balance the representation of the minority class. To improve model performance and interpretability, we then performed feature selection by backward elimination, systematically excluding less impactful features such as "Body Length" and "Breastfeeding" and placing more emphasis on more predictive variables such as weight, age, and socio-economic indicators. The evaluation showed significant performance improvements when SMOTE was combined with optimized feature selection, especially in recall and ROC-AUC, which are critical in healthcare settings where minimizing false negatives is of high importance. XGBoost was the best-performing model among those evaluated, with an accuracy of 0.8574, a recall of 0.8914, and an ROC-AUC of 0.9311, balancing precision and sensitivity better than the other models. These results highlight the effectiveness of XGBoost for stunting detection on imbalanced datasets.
The study also illustrates the potential of combining machine learning with synthetic data augmentation to improve population health outcomes, and it gives healthcare practitioners and policymakers a basis for identifying at-risk children in time. The findings point to the importance of advanced data-driven approaches in stunting detection and lay the groundwork for future research on machine learning applications against other malnutrition-related public health challenges, which could be crucial for improving child health and well-being worldwide.
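
The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `smote` helper, the choice of `k=5`, and the two-feature toy data (age in months, weight in kg) are assumptions made only to show the idea. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The 85/15 split mirrors the 15% minority share reported above.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data, not the Kaggle dataset: (age in months, weight in kg).
rng = random.Random(42)
majority = [(rng.uniform(6, 60), rng.uniform(8, 20)) for _ in range(85)]
minority = [(rng.uniform(6, 60), rng.uniform(5, 12)) for _ in range(15)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "85 85"
```

In practice, oversampling of this kind is usually applied to the training split only, so that the test set keeps the original class distribution and metrics such as recall and ROC-AUC are not inflated by synthetic samples.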


Keywords


Stunting Detection; Machine Learning; XGBoost; Random Forest; Support Vector Machine; k-Nearest Neighbors

Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0