CS5228-G18

Introduction

This project is a part of the CS5228 Machine Learning course at the National University of Singapore. For more details, please visit the Kaggle competition page. The objective of this project is to build a model that accurately predicts the resale prices of HDB (Housing & Development Board) flats in Singapore.

Project Structure

The project is organized into the following directories and files:

.
├── data
│   ├── auxiliary-data
│   ├── auxiliary-data-preprocessed
│   ├── test_cleaned.csv
│   ├── test.csv
│   ├── train_cleaned.csv
│   └── train.csv
├── data_preprocess.ipynb
├── EDA.ipynb
├── main.py
├── README.md
└── result
    └── submission.csv

Directories and Files

data: Contains the raw and preprocessed data files.
- auxiliary-data: Contains additional data files.
- auxiliary-data-preprocessed: Contains preprocessed auxiliary data files.
- test_cleaned.csv: Cleaned test data file.
- test.csv: Raw test data file.
- train_cleaned.csv: Cleaned train data file.
- train.csv: Raw train data file.
data_preprocess.ipynb: Jupyter notebook for data preprocessing.
EDA.ipynb: Jupyter notebook for Exploratory Data Analysis (EDA).
main.py: Main Python script containing the implementation of machine learning models.
README.md: This file, providing an overview of the project.
result: Contains the output results of the predictions.
- submission.csv: Sample prediction output file.

Getting Started

Clone the repository to your local machine. And download the data files from the Kaggle competition page.
Install the required Python packages by running pip install -r requirements.txt (if provided) or install the necessary packages individually (e.g., pandas, scikit-learn, xgboost, lightgbm, optuna, etc.).
Open the data_preprocess.ipynb and EDA.ipynb notebooks in Jupyter to view the data preprocessing and exploratory data analysis steps, respectively.
Run the main.py script to execute the machine learning models and generate predictions.
Visualize the results using optuna-dashboard. Details are shown in the Visualization subsection.

Models

The project explores various machine learning models, including:

Linear Regression
LightGBM
Gradient Boosting Tree
Random Forest (with GPU support)
XGBRegressor (with GPU support)

Evaluation

Models are evaluated using cross-validation, and hyperparameter tuning is performed using GPU-supported frameworks like Optuna. The performance of each model is compared to identify the best performing model for predicting HDB resale prices.

Hyperparameter Tuning Visualization

$ pip install optuna-dashboard
$ optuna-dashboard sqlite:///example-study.db

Please check out the GitHub repository for more details.

Manage Studies	Visualize with Interactive Graphs

Visualization of Our Best Model

Visualize the optimization history.

Visualize high-dimensional parameter relationships.

Visualize individual hyperparameters as slice plot.

Visualize parameter importances.

Visualize empirical distribution function.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CS5228-G18

Introduction

Project Structure

Directories and Files

Getting Started

Models

Evaluation

Hyperparameter Tuning Visualization

Visualization of Our Best Model

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
result		result
.gitignore		.gitignore
EDA.ipynb		EDA.ipynb
README.md		README.md
data_preprocess.ipynb		data_preprocess.ipynb
main.py		main.py
requirements.txt		requirements.txt

Luxios22/CS5228-G18

Folders and files

Latest commit

History

Repository files navigation

CS5228-G18

Introduction

Project Structure

Directories and Files

Getting Started

Models

Evaluation

Hyperparameter Tuning Visualization

Visualization of Our Best Model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages