Research article

Machine learning models for the prediction of uterine fibroids

Medicine

In this cross-sectional study, we developed and validated a predictive model for uterine fibroid risk using routine physical examination indicators and 5 machine learning algorithms: logistic regression, random forest, k-nearest neighbors, categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). The primary dataset consisted of health examination records from the MJ Health Screening Center in Beijing, China (2013-2023), and an independent external validation dataset (2024) was used to assess generalizability. Least absolute shrinkage and selection operator (LASSO) regression identified 13 significant predictors, including age, body mass index, total cholesterol, diastolic blood pressure, and marital status. Among the models, CatBoost performed best, achieving an area under the curve (AUC) of 0.808 in the internal validation dataset and 0.821 in the external validation dataset, indicating strong predictive capability and robustness. SHapley Additive exPlanations (SHAP) analysis revealed that age and body mass index were the most important predictors and that total cholesterol was also a key predictive feature; its implications for lipid metabolism are discussed in the main text. Despite its strengths in AUC, specificity, and sensitivity, the model showed low precision (0.475) and only moderate accuracy (0.742), indicating difficulty in controlling false positives. The results indicate that the model is a potentially effective screening tool for identifying high-risk individuals who may benefit from further diagnostic evaluation. This study supports the feasibility of combining routine health examination data with the CatBoost algorithm for early risk assessment of uterine fibroids, but it also highlights the need to interpret the model's predictions cautiously in clinical practice. Future research should focus on multicenter, large-scale studies to enhance the model's generalizability and should incorporate additional predictive factors to optimize performance.
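The gap between good sensitivity and specificity on the one hand and low precision on the other is easier to see with a small worked calculation. The sketch below computes the four metrics from confusion-matrix counts. The function name `screening_metrics` and the counts are hypothetical, chosen only for illustration (1000 people, 20% prevalence); they are not the study's data and do not reproduce its reported values.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that are flagged
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "precision": tp / (tp + fp),     # share of flagged people who are true cases
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
    }


# Hypothetical cohort (NOT from the study): 200 cases and 800 non-cases,
# with a classifier that gets 75% of each group right.
example = screening_metrics(tp=150, fp=200, tn=600, fn=50)

for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this illustrative cohort, sensitivity and specificity are both 0.75, yet precision is only about 0.43, because the non-cases outnumber the cases and so contribute more false positives than there are true positives. The same mechanism is one plausible reading of why a screening model can rank well on AUC while still flagging many people who do not have the condition.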
