This project predicts customer churn for a telecommunications company using several machine learning algorithms. Churn prediction identifies customers who are likely to leave the service, so the business can act early to retain them.
- Project Overview
- Dataset
- Installation
- Data Preprocessing
- Model Training
- Hyperparameter Tuning
- Model Evaluation
- Results
- Conclusion
- Contributing
- License
The dataset used in this project comes from IBM and contains customer-level features used to predict churn. It is split into training and test sets for model training and evaluation.
To run this project, ensure you have the following libraries installed:
- pandas
- numpy
- scikit-learn
- xgboost
- lightgbm
- matplotlib
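The libraries above can typically be installed with `pip install pandas numpy scikit-learn xgboost lightgbm matplotlib`. As a quick sanity check, the sketch below reports which of them cannot be found in the current environment. The helper name `missing_packages` is hypothetical and not part of this project.

```python
import importlib.util

# Libraries listed above; note that scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "sklearn", "xgboost", "lightgbm", "matplotlib"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```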
The data preprocessing steps include:
- Handling missing values
- Encoding categorical variables
- Splitting the data into training and test sets
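The three steps above might look roughly like the following sketch. The column names and values are invented for illustration, since this README does not list the dataset's schema, and the imputation strategy (median and mode) is an assumption rather than the project's confirmed choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the real dataset's columns are not listed here.
df = pd.DataFrame({
    "tenure": [1, 34, np.nan, 45, 2, 8, 22, 10],
    "monthly_charges": [29.85, 56.95, 53.85, np.nan, 70.70, 99.65, 89.10, 29.75],
    "contract": ["Month-to-month", "One year", "Month-to-month", None,
                 "Month-to-month", "Two year", "One year", "Month-to-month"],
    "churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "No"],
})

# 1. Handle missing values: median for numeric columns, mode for categorical.
numeric_cols = ["tenure", "monthly_charges"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df["contract"] = df["contract"].fillna(df["contract"].mode()[0])

# 2. Encode categorical variables: target to 0/1, one-hot for features.
y = (df["churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns="churn"), columns=["contract"])

# 3. Split into training and test sets, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

For brevity this sketch imputes before splitting. In a real pipeline, imputation statistics are usually computed on the training set only, so that no information from the test set leaks into training.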
We trained four machine learning models:
- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
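A minimal sketch of fitting these models through a common interface is shown below. It uses synthetic data from `make_classification` as a stand-in for the preprocessed churn data, and the hyperparameters are placeholders rather than the values used in this project. XGBoost and LightGBM are imported inside `try` blocks only so that the sketch still runs where they are not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Placeholder hyperparameters, not the project's tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# The gradient-boosting models are added only if their packages are available.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    pass

try:
    from lightgbm import LGBMClassifier
    models["LightGBM"] = LGBMClassifier(
        n_estimators=100, random_state=42, verbose=-1
    )
except ImportError:
    pass

# Fit every model on the same training split and report test accuracy.
fitted = {}
for name, model in models.items():
    fitted[name] = model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```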
Hyperparameter tuning was performed using GridSearchCV to optimize model performance.
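As an illustration of how a grid search can be set up, the sketch below tunes the regularization strength of a logistic regression with scikit-learn's `GridSearchCV`. The parameter grid, the F1 scoring choice, the 5-fold setting, and the synthetic data are all assumptions; the grids actually searched in this project are not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling lives inside the pipeline so it is refit on each CV training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: inverse regularization strength C of the classifier.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Exhaustively evaluate every grid point with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds the pipeline refit on all of the data with the best parameters, and can be used for prediction directly.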
The models were evaluated using metrics such as accuracy, precision, recall, and F1 score. Additionally, ROC curves were plotted to compare the performance of the models.
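The metrics named above can be computed with `sklearn.metrics`, as in the sketch below. The labels and predicted probabilities are a small hand-made example, not results from this project, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Hand-made example: 1 = churned, 0 = retained (not project results).
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.35, 0.9, 0.55])

# Convert probabilities to class labels with an assumed 0.5 threshold.
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC AUC is threshold-free, so it uses the probabilities directly.
    "roc_auc": roc_auc_score(y_true, y_prob),
}

# Points of the ROC curve: false and true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

To compare models visually, the `fpr` and `tpr` arrays for each model can be drawn on the same axes with matplotlib, for example `plt.plot(fpr, tpr, label=name)`.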
The Logistic Regression model achieved the best overall performance. The ROC curves for the models provided a visual comparison of their ability to distinguish between churned and non-churned customers.
This project successfully built and evaluated multiple machine learning models to predict customer churn. The Logistic Regression model provided the best balance between precision and recall, making it a reliable choice for deployment in real-world applications.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for more details.