Title:
Authors:
Abstract: This study aims to predict diabetes status (focusing on Type 2 diabetes) using a dataset obtained from the UCI Machine Learning Repository. Diabetes is a major public health problem worldwide, and early diagnosis and identification of risk factors are critical for managing the disease. In this study, eight machine learning algorithms (Logistic Regression, Naive Bayes, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Extra Trees, and XGBoost) were compared using health indicators and demographic information collected from real individuals. Model performance was evaluated with Accuracy, Balanced Accuracy, Precision, Recall, F1-Score, and ROC AUC. In addition, the Random Forest algorithm was used to identify the factors that most strongly affect diabetes risk, and the performance of the models was then re-evaluated. The results showed that the Gradient Boosting and XGBoost algorithms predicted diabetes status with a high performance of 86%. The most important risk factors were high blood pressure (HighBP), body mass index (BMI), general health perception (GenHlth), and age (Age). By estimating the probability that an individual will develop diabetes, these findings can support early diagnosis. DOI: http://dx.doi.org/10.51505/ijaemr.2025.1115
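The comparison and feature-ranking workflow described in the abstract can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic (`make_classification`) rather than the UCI dataset, all hyperparameters are library defaults, the stratified 80/20 split is assumed, and the helper names `make_models`, `evaluate`, `compare`, and the placeholder column names in `FEATURES` are hypothetical. XGBoost is included only if the separate `xgboost` package is installed, and "re-evaluated" is interpreted (as an assumption) as retraining every model on the top-ranked Random Forest features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder column names (hypothetical). The first four echo the risk factors
# named in the abstract, but the synthetic data below carry no real meaning.
FEATURES = ["HighBP", "BMI", "GenHlth", "Age", "f4", "f5", "f6", "f7"]


def make_models(seed=0):
    """Build the classifiers compared in the study (default hyperparameters assumed)."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "Extra Trees": ExtraTreesClassifier(random_state=seed),
    }
    try:  # XGBoost lives in a separate package; skip it if it is not installed.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    return models


def evaluate(model, X_test, y_test):
    """Compute the six metrics listed in the abstract for one fitted model."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, zero_division=0),
        "Recall": recall_score(y_test, y_pred),
        "F1": f1_score(y_test, y_pred, zero_division=0),
        "ROC AUC": roc_auc_score(y_test, y_score),
    }


def compare(X, y, feature_names, top_k=4, seed=0):
    """Fit all models, rank features with a Random Forest, then refit on the top-k.

    X must be a NumPy array so that column indexing with X[:, idx] works.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Step 1: compare all models on the full feature set.
    results = {}
    for name, model in make_models(seed).items():
        model.fit(X_train, y_train)
        results[name] = evaluate(model, X_test, y_test)

    # Step 2: rank features by Random Forest impurity-based importance.
    rf = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    idx = np.argsort(rf.feature_importances_)[::-1][:top_k]
    top = [feature_names[i] for i in idx]

    # Step 3 (assumed reading of "re-evaluated"): refit on the top-k features only.
    reduced = {}
    for name, model in make_models(seed).items():
        model.fit(X_train[:, idx], y_train)
        reduced[name] = evaluate(model, X_test[:, idx], y_test)
    return results, top, reduced


# Synthetic, imbalanced stand-in data (not the UCI dataset).
X, y = make_classification(n_samples=2000, n_features=len(FEATURES),
                           n_informative=4, weights=[0.85, 0.15], random_state=0)
results, top, reduced = compare(X, y, FEATURES)
for name, m in sorted(results.items(), key=lambda kv: -kv[1]["ROC AUC"]):
    print(f"{name:20s} " + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
print("Top features by RF importance:", top)
```

Because the data are synthetic, the printed scores and feature ranking will not reproduce the study's 86% figure or its list of risk factors; to apply the sketch to the real data, replace `X`, `y`, and `FEATURES` with the UCI table. Impurity-based `feature_importances_` is fast but can favor features with many distinct values (such as BMI), so `sklearn.inspection.permutation_importance` is a common cross-check.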