An Empirical Model for the Classification of Diabetes and Diabetes_Types Using Ensemble Approaches

Sushma Jaiswal; Priyanka Gupta; L. V. Narasimha Prasad; Rajesh Kulkarni

doi:10.37965/jait.2023.0220

Authors

Sushma Jaiswal, Guru Ghasidas Vishwavidyalaya, Koni, Bilaspur, (C.G.), India, https://orcid.org/0000-0002-6253-7327
Priyanka Gupta, Guru Ghasidas Vishwavidyalaya, Koni, Bilaspur, (C.G.), India, https://orcid.org/0000-0001-8643-6857
L. V. Narasimha Prasad, Department of CSE, Institute of Aeronautical Engineering, Hyderabad, India, https://orcid.org/0000-0001-6514-1064
Rajesh Kulkarni, MVSR Engineering College, Nadargul, Hyderabad, India, https://orcid.org/0000-0002-4113-9987

DOI:

https://doi.org/10.37965/jait.2023.0220

Keywords:

classification, diabetes mellitus, ensemble learning, PID, random forest

Abstract

Diabetes is a hereditary disorder that affects people of all ages. When an individual has diabetes, cells have difficulty absorbing glucose from the bloodstream. The two main subtypes are type 1 and type 2 diabetes. Type 1 diabetes develops when the pancreas cannot make enough insulin, whereas type 2 diabetes develops due to insulin resistance. Diabetes is a chronic, incurable illness. Disease detection technology is pervasive in modern healthcare systems, and detecting diabetes in its early stages is crucial for initiating timely treatment and halting disease progression. Given the continually rising prevalence of diabetes, this paper investigates a diabetes prediction model that can not only forecast the likelihood of future diabetes onset but also identify the specific type of diabetes a person may develop. The proposed framework uses two datasets: the Pima Indian Diabetes (PID) dataset, used to predict diabetes, and the DiabetesType dataset, used to identify the type of diabetes mellitus an individual has. The study applies machine learning classifiers and ensemble models, such as Bagging, Voting, Averaging, and Stacking, to diabetes prediction. SMOTE (synthetic minority oversampling technique) and hyperparameter tuning of the algorithms are also applied and substantially improve the results. The developed heterogeneous ensemble model offers improved prediction rates across different performance criteria. Using the bagging technique, random forest attains 96% accuracy on the PID dataset. On the DiabetesType dataset, the voting ensemble model attains 98.5% accuracy. The study indicates that ensemble learning models are effective in predicting diabetes and can outperform earlier related studies.
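
To make the terms in the abstract concrete, the following is a minimal, dependency-free sketch of two of the ideas it names: SMOTE-style oversampling and voting or averaging over base classifiers. It is not the authors' implementation. The function names (`smote`, `hard_vote`, `soft_vote`), the parameter defaults, and the toy feature values are all assumptions made for illustration and are not drawn from the PID or DiabetesType datasets.

```python
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (illustrative SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != idx]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def hard_vote(predictions):
    """Majority vote over the class labels predicted by the base classifiers.

    On a tie, the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Average the positive-class probabilities and apply a threshold."""
    mean = sum(probabilities) / len(probabilities)
    return int(mean >= threshold)


# Hypothetical two-feature minority samples (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")

# Three hypothetical base classifiers disagree on one patient.
print("hard vote:", hard_vote([1, 0, 1]))
print("soft vote:", soft_vote([0.9, 0.4, 0.6]))
```

In practice, maintained library implementations of oversampling and of bagging, voting, and stacking ensembles would normally be preferred. The sketch only shows the mechanics: oversampling balances the classes before training, and the ensemble combines several models' outputs into one prediction.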