Revisiting default parameter settings? #4986
@thvasilo Seems to be a good read for the weekend. ;-)
I'll ping the main author @PhilippPro in case he wants to chime in on recommended defaults.
Hey @thvasilo! I think the defaults in xgboost are not chosen to have the best performance. The user also has to specify nrounds, which has no data-dependent default. There is also an autoxgboost package (https://github.com/ja-thomas/autoxgboost), but it is not working very well; sometimes it provides nonsense results, and its performance is worse than that of other auto-tuning packages. The benchmark is described here: https://github.com/PhilippPro/tuneRanger/blob/master/benchmark/figure/rsq_results.pdf
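A minimal sketch of the point above, using the xgboost Python API on toy data: with an empty parameter dict the library falls back to its built-in defaults, and the number of boosting rounds always has to be supplied by the caller. The data and the specific parameter values are illustrative.

```python
# Sketch of the situation described above: xgboost's built-in defaults are
# used unless overridden, and nrounds (num_boost_round) is the user's choice.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# With an empty params dict, xgboost falls back to its documented defaults.
booster_default = xgb.train({}, dtrain, num_boost_round=10)

# Tuned-style settings must be supplied explicitly (values are illustrative).
booster_custom = xgb.train(
    {"eta": 0.05, "max_depth": 5, "objective": "binary:logistic"},
    dtrain,
    num_boost_round=300,  # nrounds: no data-dependent default exists
)
```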
This paper is very interesting, thanks @PhilippPro. There are a few issues with directly adopting the default parameters from Table 3:

1. The paper refers to classification datasets only. How confident are we that these parameters are effective for regression/ranking? This actually brings up a very interesting question: are our regularisation parameters invariant to the learning objective?
2. The default value of 4168 boosting rounds with a low eta is almost certainly better, but it imposes a much higher computational burden. What if someone goes to test the algorithm and it takes 5 minutes to run? I still think it might be a good idea to use a higher default, but this is just a consideration.
3. What is the role of dataset size in the effectiveness of hyperparameters? I feel like small datasets benefit strongly from regularisation, where large datasets often do not seem to benefit at all.

@PhilippPro if you are interested in going further with this I would love to adopt your work in xgboost. I think there is potential here to dramatically improve the results for a large portion of our user base.
Hi @RAMitchell! I am interested. I agree with you that the nrounds parameter is too high. I think it would be better to optimize the hyperparameters in a restricted hyperparameter space, e.g. with the maximum nrounds set to 300. And yes, regression (can you do ranking in xgboost currently?) is another problem that should be considered. There is also the option to use different defaults for classification and regression, as is done for example in several random forest packages. The other thing you mention is the problem of hyperparameters that depend on dataset characteristics. Here you should be very sure about the relationship before making hyperparameters data-dependent, e.g. on the number of observations n or the number of features p (e.g. in random forest, mtry is set to the square root of p). A paper that tries to set these defaults empirically can be seen here, but maybe it is better to specify such a rule by "hand" (e.g. humans looking at plots that show the relationship between dataset characteristics, hyperparameters and performance). I do not know much about theoretical considerations in this field regarding xgboost. @pfistfl are the results of the new bot already usable? Could we use them for the purpose here?
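A hedged sketch of the data-dependent-defaults idea above, mirroring random forest's mtry = sqrt(p) rule. The mapping to colsample_bytree and the n-dependent eta rule are invented for illustration, not established heuristics.

```python
# Illustrative only: derive xgboost parameters from dataset shape,
# in the spirit of random forest's mtry = sqrt(p) rule.
import math

def data_dependent_defaults(n_obs: int, n_features: int) -> dict:
    """Return illustrative xgboost parameters derived from dataset shape."""
    return {
        # Sample roughly a sqrt(p)-sized share of features per tree (assumption).
        "colsample_bytree": max(math.sqrt(n_features) / n_features, 0.1),
        # Regularize more aggressively on small datasets (assumption).
        "eta": 0.05 if n_obs < 10_000 else 0.1,
        "max_depth": 5,
    }

print(data_dependent_defaults(n_obs=2_000, n_features=100))
# {'colsample_bytree': 0.1, 'eta': 0.05, 'max_depth': 5}
```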
@RAMitchell
I think 100-500 is more reasonable: the goal would be for the user's application not to hang unreasonably when they first try the library. We have several places in the language bindings where default parameters arise; we could start with the Python APIs (or whatever your preference is) and worry about the rest later. Another way of approaching this would be for us to provide some kind of dictionary of preset parameters in our API. This gives more flexibility and choice. This is all really about user experience and lowering the barrier to training models effectively. It would go hand in hand with examples and documentation.
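A rough sketch of what such a preset dictionary could look like. Neither the preset names nor the get_preset helper exist in xgboost; everything here is hypothetical, with illustrative values in the spirit of the benchmark discussed in this thread.

```python
# Hypothetical preset-parameter dictionary; names and values are illustrative.
PRESETS = {
    # Current library defaults, kept for backwards compatibility.
    "classic": {"eta": 0.3, "max_depth": 6},
    # Illustrative tuned values in the spirit of the benchmark discussed here.
    "tuned": {"eta": 0.05, "max_depth": 5, "subsample": 0.8,
              "colsample_bytree": 0.8},
}

def get_preset(name: str, **overrides) -> dict:
    """Return a copy of a named preset, with user overrides applied."""
    params = dict(PRESETS[name])
    params.update(overrides)
    return params

# params = get_preset("tuned", eta=0.1)  # user keeps full control
```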
@PhilippPro any update on this? Here is what I propose: rerun with 500 rounds and a couple of regression datasets, and confirm the run-time is acceptable. Set these as the default parameters, and create a documentation page on parameter tuning with a few notes on methodology and how you arrived at these defaults, linking to your paper.
I have not forgotten it, but I currently do not have a lot of time. I can only rerun it on the existing datasets, as I have the data for them, but this is not a problem. Your proposal is good, I will follow it, thanks.
@RAMitchell I got some results now, which are not really stunning. ;) I created a blog post where I describe the results. Currently I am repeating the 5-fold CV to get more stable results and will update the results in the blog post tomorrow. Future work (as described in the blog post) will be a bit more interesting.
Awesome work, thanks! I think there is moderate evidence for changing the default parameter settings. Shall we re-evaluate once you have results from CatBoost/LightGBM? Note that we are mostly focusing new development on "tree_method":"hist" and "tree_method":"gpu_hist", which expose the "grow_policy":"lossguide"/"depthwise" parameter. One of these growth policies might be definitively better on your datasets.
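For reference, a minimal example of how these parameters are passed through the Python API; the toy data and the specific values (max_leaves, nrounds) are illustrative.

```python
# The tree_method / grow_policy parameters mentioned above, on toy data.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 20))
y = rng.normal(size=1_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "hist",       # histogram-based split finding
    "grow_policy": "lossguide",  # grow leaves by largest loss reduction
    "max_leaves": 64,            # lossguide growth is bounded by leaf count
    "objective": "reg:squarederror",
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```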
Yes, that's fine. The results I got today (with 10 times repeated 5-fold CV) were slightly better for my defaults (75% better in the case of Spearman's rho); I updated the post. I will be happy to evaluate your new parameters once they are implemented in the package. I think CatBoost will be better in default mode, but I will see the results soon.
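A minimal sketch of the repeated 5-fold CV protocol mentioned above, using scikit-learn's RepeatedKFold with toy regression data; the model settings and the R-squared metric are illustrative.

```python
# 10x repeated 5-fold CV, as described in the comment above (toy data).
import numpy as np
import xgboost as xgb
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))
y = X[:, 0] * 2.0 + rng.normal(size=600)

cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X):
    model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    ss_res = np.sum((y[test_idx] - pred) ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    scores.append(1 - ss_res / ss_tot)  # R-squared per fold

print(f"mean R^2 over {len(scores)} folds: {np.mean(scores):.3f}")
```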
@RAMitchell how do you feel about the switch to a deeper default max_depth? Wouldn't that increase memory consumption and the risk of overfitting?
I looked at the graph again and thought a bit about the results. In the section with low R-squared, the xgboost default performs much worse. These are datasets that are hard to fit, where few things can be learned. The higher eta (eta=0.1) leads to too much overfitting compared to my defaults (eta=0.05). If you set nrounds lower than 500, the effect would be smaller; you could leave eta=0.1 but set nrounds to a lower default value, or leave it as it is, without specifying it. I am not sure how big the effect of max_depth is; you could leave it at 5 to get better runtimes. The subsample and colsample parameters could safely be set to values between 0.6 and 0.8; I guess there is not a big danger here.
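For reference, the concrete values suggested in this comment collected into one illustrative parameter dict; these reflect the discussion in this thread, not official xgboost defaults.

```python
# Illustrative collection of the suggestions above; not official defaults.
suggested_defaults = {
    "eta": 0.05,              # lower learning rate to reduce overfitting
    "max_depth": 5,           # kept moderate for better runtimes
    "subsample": 0.7,         # within the "safe" 0.6-0.8 range above
    "colsample_bytree": 0.7,  # likewise
}
nrounds = 500  # upper bound on boosting rounds discussed above
```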
I'm not too worried about memory consumption from tree depth. As mentioned by @PhilippPro, overfitting is also offset by lower learning rates and sampling. @PhilippPro the "grow_policy":"lossguide"/"depthwise" parameter already exists for both "tree_method":"hist" and "tree_method":"gpu_hist". We may even make "tree_method":"hist" the default learning mode at some point.
Hello all,
I came upon a recent JMLR paper that examined the "tunability" of the hyperparameters of multiple algorithms, including XGBoost.
Their methodology, as far as I understand it, is to take the default parameters of the package, find the (near-)optimal parameters for each dataset in their evaluation, and determine how valuable it is to tune each particular parameter (a toy sketch of this measure follows below).
In doing so they also come up with "optimal defaults" in Table 3, and an interactive Shiny app.
This made me curious about how the defaults for XGBoost were chosen and if it's something that the community would be interested in revisiting in the future.
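A toy numeric sketch of the tunability measure as described above: the per-dataset gap between performance under package defaults and under (near-)optimal per-dataset parameters. All numbers here are invented.

```python
# Toy illustration of tunability; all performance numbers are invented.
default_auc = {"dataset_a": 0.81, "dataset_b": 0.74, "dataset_c": 0.90}
tuned_auc   = {"dataset_a": 0.83, "dataset_b": 0.80, "dataset_c": 0.91}

# Tunability per dataset = gain available from tuning over the defaults.
tunability = {name: tuned_auc[name] - default_auc[name] for name in default_auc}
print(tunability)  # e.g. {'dataset_a': 0.02, 'dataset_b': 0.06, ...}
print(sum(tunability.values()) / len(tunability))  # average tunability
```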