Analyzing the biggest property data of Bangladesh and predicting the price using ML models.
Check out the Price prediction app
The purpose of this project is to analyze the property condition of Bangladesh and build an ML model to predict the price.
The dataset that was used in this project is publicly available in kaggle. Here's the direct link to the dataset.
Note: The two csv files that are inside the ../data..
directory are preprocessed.
Workflow contains data analysis, feature engineering, preprocessing and training ML models.
Used pandas to wrangle and processing the data for analysis and matplotlib and seaborn for visualization (I kinda like seaborn)Scikit-learn is used to to do further preprocessing and training different ML models.
The dataset contains 7k+ property information of two big cities of Bangladesh, Dhaka and Chattogram. There are three types of properties, Apartment, Building and Duplex having most of being the apartment type (expected) and making the dataset an imbalance one.
One of the key findings is that the Area feature which describes the square feet area of the property is the most correlated feature with the target (Price) variable. Check out the bd_house_price_prediction.ipynb file for the analysis part.
Tried a few regression model to train on the data. Used LinearRegression, DecisionTreeRegressor, RandomForestRegressor, Lasso, Elastic net etc. Also Used Xtreme gradient Boosing (xgb) algorithm, Light Gradient Boosting (lgb) algorithm. The RandomForestRegressor, xgb, lgb performed well while training. Also, I've also tried hyperparameter tuning on those models which I thought performed well. check out the model_training.ipynb file for the training part.
I've also made an web app using streamlit and deployed in streamlit. Do check out the app
The model is not well optimized. It is a work in progress. So, please do comment your feedback or suggestion. I'm also up for collaboration, so don't hesitate to make pull request.