A diabetes prediction dataset from Kaggle (https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset) was analyzed with statistical methods and visualized with matplotlib.pyplot.
The variables ID, gender, age, smoking history, BMI, HbA1c level, blood glucose level, diabetes diagnosis, hypertension, and heart disease were compared and tested for correlation. The goal was to identify which factors are associated with the development of diabetes, hypertension, and heart disease, and which factors predict diabetes most accurately.
The statistical methods included hypothesis testing and correlation coefficients. The visualizations included bar plots, histograms, scatter plots, pie charts, and line charts.
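As an illustration of the kind of correlation test described above, the following is a minimal sketch, not the project's actual code. It computes a Pearson correlation coefficient between a continuous measurement and a binary diagnosis, then estimates a p-value with a permutation test, which is a technique chosen here for simplicity and may differ from the hypothesis tests used in the project. The function names `pearson_r` and `permutation_p_value` are hypothetical, and the values in `hba1c` and `diabetes` are invented toy data, not rows from the Kaggle dataset.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation test of the null hypothesis of no correlation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling ys breaks any real pairing between xs and ys.
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Toy values invented for illustration only; NOT taken from the Kaggle dataset.
hba1c = [5.0, 5.7, 6.1, 6.6, 4.8, 7.0, 5.8, 8.2, 6.5, 9.0]
diabetes = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]

r = pearson_r(hba1c, diabetes)
p = permutation_p_value(hba1c, diabetes)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A positive `r` together with a small `p` would suggest a positive association that is unlikely to arise by chance. With real data, the same calculation would be applied to the dataset's columns rather than to hand-written lists.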
The statistical analysis and visualizations indicate that BMI, HbA1c level, and blood glucose level correlate positively with a diabetes diagnosis, and that BMI correlates positively with hypertension and heart disease.
The code was sourced from Lauren Ables-Torres, Paulette Petracco, Holt Jones, and edX Bootcamps, LLC.