HYPERPARAMETER OPTIMIZATION USING RAY TUNE AND RAPIDS


Hyperparameter optimization is a method used to enhance the accuracy of a model; tuning the hyperparameters can make the difference between an average model and a highly accurate one. The goal of this project was to predict the value of a football player using a random forest trained on the GPU, and to optimize the accuracy of the prediction based on features such as rating, skill rate, work rate, attacking rate, position, etc.

Link to view the Data

RAPIDS is a suite of open-source software libraries and APIs that lets you execute end-to-end data science and analytics pipelines entirely on GPUs. Imagine scikit-learn on steroids; that is RAPIDS.
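As a flavour of what that looks like in practice, here is a minimal loading sketch (assuming RAPIDS is installed; the file and column names are hypothetical, and the repo's actual data loading may differ):

```python
import cudf
from cuml.model_selection import train_test_split

# cuDF mirrors the pandas API but keeps the data on the GPU.
df = cudf.read_csv("players.csv")                 # hypothetical file name
X = df.drop(columns=["value"])                    # hypothetical target column
y = df["value"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```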

Random Forest without HPO -> No-HPO (Training on GPU)

Running the default settings on the random forest:

n_estimators = 100, max_depth = 16, n_bins = 8

Accuracy: R² score = 75%

The goal is to tune the model and enhance its accuracy without doing any feature engineering.
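Here is a minimal baseline sketch, reusing the hypothetical split from above. Since the README reports an R² score, cuML's RandomForestRegressor is shown; the repo's own code aliases the cuML random forest as curfc:

```python
from cuml.ensemble import RandomForestRegressor
from cuml.metrics import r2_score

# Default settings quoted above; training runs entirely on the GPU.
rf = RandomForestRegressor(n_estimators=100, max_depth=16, n_bins=8)
rf.fit(X_train, y_train)
print(r2_score(y_test, rf.predict(X_test)))       # ~0.75 in this experiment
```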

This is where Ray Tune comes into the picture: Ray.Tune


Ray Tune Config

number of samples = 10, number of folds = 3, n_estimators = 500–1500, max_depth = 10–20, max_features = 0.5–1.0, n_bins = 18

The configuration here is restricted because the machine used for this experiment runs only a GTX 1060. If you have a more powerful GPU, you can test a wider range of configurations.
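Expressed as a Ray Tune search space, the ranges above look roughly like this (a sketch; the repo's actual config may differ):

```python
from ray import tune

config = {
    "n_estimators": tune.uniform(500, 1500),   # Ray samples floats here,
    "max_depth": tune.uniform(10, 20),         # hence values like 1231.99
    "max_features": tune.uniform(0.5, 1.0),    # in the trial log below
    "n_bins": 18,                              # held constant
}
```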

Ray randomly samples a value from each range and passes it into the model as a hyperparameter:

```python
# Inside the Ray trainable: build a cuML random forest from the sampled
# hyperparameters. curfc is the repo's alias for the cuML random forest,
# e.g. `from cuml.ensemble import RandomForestClassifier as curfc`.
self.rf_model = curfc(
    # Ray samples floats from the ranges, so the integer-valued
    # parameters are cast before being handed to cuML.
    n_estimators=int(self._model_params["n_estimators"]),
    max_depth=int(self._model_params["max_depth"]),
    n_bins=int(self._model_params["n_bins"]),
    max_features=self._model_params["max_features"],
)
```

A total of 300 samples were executed, but some trials stopped early due to the early-stopping conditions.

+---------------------+------------+-------+-------------+----------------+----------------+--------+------------------+
| Trial name          | status     | loc   |   max_depth |   max_features |   n_estimators |   iter |   total time (s) |
|---------------------+------------+-------+-------------+----------------+----------------+--------+------------------|
| WrappedTrainable_1  | TERMINATED |       |     13.7454 |       0.975357 |       1231.99  |      3 |         234.135  |
| WrappedTrainable_2  | TERMINATED |       |     15.9866 |       0.578009 |        655.995 |      1 |          35.5271 |
| WrappedTrainable_3  | TERMINATED |       |     10.5808 |       0.933088 |       1101.12  |      1 |          58.8539 |
| WrappedTrainable_4  | TERMINATED |       |     17.0807 |       0.510292 |       1469.91  |      1 |          98.2842 |
| WrappedTrainable_5  | TERMINATED |       |     18.3244 |       0.60617  |        681.825 |      3 |         180.687  |
| WrappedTrainable_6  | TERMINATED |       |     11.834  |       0.652121 |       1024.76  |      3 |         124.095  |
| WrappedTrainable_7  | TERMINATED |       |     14.3195 |       0.645615 |       1111.85  |      3 |         149.505  |
| WrappedTrainable_8  | TERMINATED |       |     11.3949 |       0.646072 |        866.362 |      1 |          36.0093 |
| WrappedTrainable_9  | TERMINATED |       |     14.5607 |       0.892588 |        699.674 |      1 |          43.3045 |
| WrappedTrainable_10 | TERMINATED |       |     15.1423 |       0.796207 |        546.45  |      3 |         112.048  |
+---------------------+------------+-------+-------------+----------------+----------------+--------+------------------+
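The trials above are launched with tune.run. As an illustration only, the early-stopping behaviour (trials ending after 1 vs. 3 iterations) can be produced with a scheduler such as ASHAScheduler; the repo's exact stopping criterion and metric name may differ:

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Illustrative sketch: ASHA terminates underperforming trials early.
# "accuracy" is an assumed metric name reported by the trainable.
analysis = tune.run(
    WrappedTrainable,                          # the repo's trainable class
    config=config,                             # search space from above
    num_samples=10,
    scheduler=ASHAScheduler(metric="accuracy", mode="max", max_t=3),
    resources_per_trial={"gpu": 1},
)
print(analysis.get_best_config(metric="accuracy", mode="max"))
```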

The output for all trials is stored in trials.csv.

The best-performing parameters came from trial number 6:

max_depth = 11, max_features = 0.6, n_estimators = 1024

With these hyperparameters, the model's R² score increased to 83%. As you can see, finding better parameters makes the model more accurate; all that is left is to work on feature engineering and re-run the HPO to push the accuracy even higher.
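To reuse the winning configuration outside of Ray Tune, simply retrain with those values (a sketch, reusing the hypothetical split from earlier):

```python
from cuml.ensemble import RandomForestRegressor
from cuml.metrics import r2_score

# The best trial's hyperparameters, rounded as quoted above.
best_rf = RandomForestRegressor(
    n_estimators=1024, max_depth=11, max_features=0.6, n_bins=18
)
best_rf.fit(X_train, y_train)
print(r2_score(y_test, best_rf.predict(X_test)))  # ~0.83 here
```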

For those interested in trying this out:

  1. Install RAPIDS
  2. Install Ray: pip install 'ray[tune]' torch torchvision
  3. Clone this repo
  4. Run python random_forest_hpo.py (or python3, depending on your setup)

References

  1. RAPIDS example
  2. Medium article
  3. Ray
  4. cuML
