TY - CHAP
T1 - A Data-Mining Model for Predicting Low Birth Weight with a High AUC
AU - Hange, Uzapi
AU - Selvaraj, Rajalakshmi
AU - Galani, Malatsi
AU - Letsholo, Keletso
N1 - Publisher Copyright: © Springer International Publishing AG 2018.
PY - 2018
Y1 - 2018
AB - Birth weight is a significant determinant of a newborn’s probability of survival. Data-mining models are receiving considerable attention for identifying low birth weight risk factors. However, predicting actual birth weight values from the identified risk factors, which can play a significant role in identifying mothers at risk of delivering low birth weight infants, remains unsolved. This paper presents a study of data-mining models that predict the actual birth weight, with particular emphasis on achieving a higher area under the receiver operating characteristic curve (AUC). The prediction is based on 2006 birth data from the North Carolina State Center for Health Statistics. The steps followed to extract meaningful patterns from the data were data selection, handling missing values, handling imbalanced data, model building, feature selection, and model evaluation. Decision trees were used to classify birth weight and were tested on both the original imbalanced dataset and a dataset balanced with the synthetic minority oversampling technique (SMOTE). The results showed that models built with SMOTE-balanced datasets produce a higher AUC than models built with imbalanced datasets. The J48 model built with balanced data outperformed REPTree and Random tree with an AUC of 90.3%, and was therefore selected as the best model. In conclusion, using J48 for birth weight prediction could help reduce obstetric-related complications and thus improve overall obstetric health care.
UR - http://www.scopus.com/inward/record.url?scp=85020403611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020403611&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-60170-0_8
DO - 10.1007/978-3-319-60170-0_8
M3 - Chapter (peer-reviewed)
AN - SCOPUS:85020403611
SN - 978-3-319-60169-4
VL - 719
T3 - Studies in Computational Intelligence
SP - 109
EP - 121
BT - Computer and Information Science
A2 - Lee, Roger
PB - Springer Nature Switzerland AG
T2 - 16th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2017
Y2 - 24 May 2017 through 26 May 2017
ER -