A Naive Bayes–based explainable machine-learning model predicted preterm birth in pregnant women with both gestational diabetes mellitus and hypertensive disorders of pregnancy, suggesting a potential tool for individualized obstetric risk management, according to a retrospective dual-center study published by Kang et al in Frontiers in Endocrinology.
Researchers developed and externally validated multiple models using electronic medical records from two hospitals in China, focusing on pregnancies complicated by both gestational diabetes mellitus and hypertensive disorders of pregnancy. These conditions frequently coexist and increase the risk of preterm birth, they noted, yet predictive tools tailored to this high-risk group have been limited.
“Among the evaluated algorithms, the Naive Bayes classifier demonstrated the most favorable balance across discrimination, reclassification, interpretability, and robustness, and was ultimately selected as the optimal model for clinical application,” the researchers commented. “The proposed Naive Bayes model may assist clinicians in early identification and personalized risk management of high-risk pregnancies affected by GDM [gestational diabetes mellitus] and HDP [hypertensive disorders of pregnancy], and represents a step toward the implementation of transparent, evidence-based decision support in obstetric practice.”
Study Details
This study analyzed electronic medical records from 257 pregnant women diagnosed with comorbid gestational diabetes mellitus and hypertensive disorders of pregnancy. The development cohort included 121 patients from Sichuan Provincial People’s Hospital, Chengdu, of whom 31 (26%) experienced preterm birth and 90 (74%) had nonpreterm birth. An external validation cohort comprised 136 patients from Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, including 24 (18%) and 112 (82%) preterm and nonpreterm births, respectively.
Multiple machine-learning algorithms were used to construct predictive models, including Least Absolute Shrinkage and Selection Operator (LASSO) regression, Random Forest, and Naive Bayes. To address class imbalance and improve model robustness, the researchers applied the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic samples for the minority class. Shapley Additive Explanations analysis was conducted to further assess model interpretability.
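The oversampling step can be sketched in a few lines of Python. This is a minimal illustration of the basic SMOTE idea, which interpolates between a minority-class sample and one of its nearest minority-class neighbours. It is not the researchers' implementation. The function name `smote`, the default `k=5`, the fixed `seed`, and the toy two-feature points are assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: create n_synthetic new points, each on the line segment
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Euclidean k-nearest neighbours among the *other* minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform draw in [0, 1)
        # the new point lies between base and partner, so it stays within the minority data's range
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Toy two-feature minority class (made-up values, not study data)
preterm = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(preterm, n_synthetic=6, k=2, seed=1)
print(len(new_points))  # → 6
```

Each synthetic point sits on a segment between two real minority cases, so SMOTE enlarges the smaller class without simply duplicating records. Oversampling is normally confined to training data so that validation metrics reflect the true class balance.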
Key Findings
Thirteen variables that reached univariate significance were entered into Elastic Net regression, which retained five predictors for model development: alanine transaminase, aspartate transaminase, albumin, lactate dehydrogenase, and systolic blood pressure at 32 to 36 weeks.
“These variables capture distinct domains relevant to preterm labor pathophysiology, including hepatic dysfunction, systemic inflammation, vascular insufficiency, and hemodynamic instability,” the researchers explained.
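As a rough illustration of how a Naive Bayes classifier could combine five continuous predictors such as these, here is a minimal Gaussian Naive Bayes sketch. The article does not say which Naive Bayes variant the researchers used. The Gaussian likelihood, the helper names `fit_gaussian_nb` and `predict_proba`, the coding of 1 as preterm birth, and every number in the toy data are hypothetical.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    model = {}
    n = len(y)
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # small floor keeps a variance positive if a feature is constant within a class
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (len(rows) / n, means, variances)
    return model


def predict_proba(model, x):
    """Posterior P(class | x), assuming features are independent given the class
    and normally distributed within each class."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_scores[label] = log_p
    # normalise in log space for numerical stability
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / total for label, s in log_scores.items()}


# Hypothetical toy records: ALT (U/L), AST (U/L), albumin (g/L), LDH (U/L), SBP (mmHg)
X = [
    (45, 50, 30, 320, 155), (60, 58, 28, 350, 160), (52, 49, 31, 300, 150),
    (18, 22, 36, 190, 130), (22, 25, 35, 210, 135), (15, 20, 37, 180, 128),
    (20, 24, 34, 200, 132),
]
y = [1, 1, 1, 0, 0, 0, 0]  # 1 = preterm birth (hypothetical coding)
model = fit_gaussian_nb(X, y)
print(predict_proba(model, (55, 52, 29, 330, 158)))
```

Because of the independence assumption, each predictor contributes its own additive log-likelihood term. This is one reason Naive Bayes models are often described as interpretable.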
Although the LASSO model achieved the highest area under the receiver operating characteristic curve (AUC = 0.802), the researchers reported that the Naive Bayes model offered greater clinical net benefit, better reclassification as measured by the Net Reclassification Improvement and Integrated Discrimination Improvement metrics, and greater robustness in SMOTE-based sensitivity analyses. In the external validation cohort (n = 136), the Naive Bayes model maintained what the researchers described as “strong” generalization, with an AUC of 0.777, accuracy of 0.801, sensitivity of 0.792, and specificity of 0.804; it was therefore selected for use in this high-risk population.
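The external validation figures can be checked against the cohort sizes given under Study Details. The counts below (19 true positives, 5 false negatives, 90 true negatives, 22 false positives) are back-calculated from the reported sensitivity and specificity and the 24/112 split. They are not reported in the article. The function names are hypothetical. The block also includes a rank-based AUC, one standard way to compute the area under the ROC curve, applied to made-up scores.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of preterm births correctly flagged
        "specificity": tn / (tn + fp),  # share of nonpreterm births correctly cleared
    }


def rank_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher than a
    random negative case (ties count as half); equivalent to the ROC area."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Counts back-calculated from the reported rates (24 preterm, 112 nonpreterm)
m = confusion_metrics(tp=19, fn=5, tn=90, fp=22)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.801, 'sensitivity': 0.792, 'specificity': 0.804}

# Made-up risk scores, not study data
print(rank_auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1]))
```

These counts reproduce all three reported rates to three decimals, which suggests the published values are internally consistent with a 24/112 class split.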
“Future studies should aim to validate this model in larger, multicenter cohorts and explore its integration into real-time clinical decision support systems,” the researchers concluded.
Disclosure: The study was funded by a grant from the Key Research Project of Science and Technology of Sichuan Province. The study authors reported no conflicts of interest.
Source: Frontiers in Endocrinology