Revisiting default parameter settings? #4986
@thvasilo Seems to be a good read for the weekend. ;-)
I'll ping the main author @PhilippPro in case he wants to chime in on recommended defaults.
Hey @thvasilo! I think the defaults in xgboost are not chosen to have the best performance. The user also has to specify nrounds, which has no data-dependent default. There is also an autoxgboost package (https://github.com/ja-thomas/autoxgboost), but it is not working very well; sometimes it provides nonsense results, and its performance is worse than that of other auto-tuning packages. The benchmark is described here: https://github.com/PhilippPro/tuneRanger/blob/master/benchmark/figure/rsq_results.pdf
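A minimal sketch of the point above, using the xgboost Python API on toy data: with an empty parameter dict the library falls back to its built-in defaults, and the number of boosting rounds always has to be supplied by the caller. The data and the specific parameter values are illustrative.

```python
# Sketch of the situation described above: xgboost's built-in defaults are
# used unless overridden, and nrounds (num_boost_round) is the user's choice.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# With an empty params dict, xgboost falls back to its documented defaults.
booster_default = xgb.train({}, dtrain, num_boost_round=10)

# Tuned-style settings must be supplied explicitly (values are illustrative).
booster_custom = xgb.train(
    {"eta": 0.05, "max_depth": 5, "objective": "binary:logistic"},
    dtrain,
    num_boost_round=300,  # nrounds: no data-dependent default exists
)
```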
This paper is very interesting, thanks @PhilippPro. There are a few issues with directly adopting the default parameters from Table 3:

1. The paper refers to classification datasets only. How confident are we that these parameters are effective for regression/ranking? This actually brings up a very interesting question: are our regularisation parameters invariant to the learning objective?
2. The default value of 4168 boosting rounds with a low eta is almost certainly better, but it imposes a much higher computational burden. What if someone goes to test the algorithm and it takes 5 minutes to run? I still think it might be a good idea to use a higher default, but this is just a consideration.
3. What is the role of dataset size in the effectiveness of hyperparameters? I feel like small datasets benefit strongly from regularisation, where large datasets often do not seem to benefit at all.

@PhilippPro if you are interested in going further with this I would love to adopt your work in xgboost. I think there is potential here to dramatically improve the results for a large portion of our user base.
Hi @RAMitchell! I am interested. I agree with you that the nrounds parameter is too high. I think it would be better to optimize the hyperparameters in a restricted hyperparameter space, e.g. with the maximum nrounds set to 300. And yes, regression (can you do ranking in xgboost currently?) is another problem that should be considered. There is also the option to use different defaults for classification and regression, as is done for example in several random forest packages. The other thing you mention is the problem of hyperparameters that depend on dataset characteristics. Here you should be very sure about the relationship before making hyperparameters data-dependent, e.g. on the number of observations n or the number of features p (e.g. in random forest, mtry is set to the square root of p). A paper that tries to set these defaults empirically can be seen here, but maybe it is better to specify such a rule by "hand" (e.g. humans looking at plots that show the relationship between dataset characteristics, hyperparameters and performance). I do not know much about theoretical considerations in this field regarding xgboost. @pfistfl are the results of the new bot already usable? Could we use them for the purpose here?
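A hedged sketch of the data-dependent-defaults idea above, mirroring random forest's mtry = sqrt(p) rule. The mapping to colsample_bytree and the n-dependent eta rule are invented for illustration, not established heuristics.

```python
# Illustrative only: derive xgboost parameters from dataset shape,
# in the spirit of random forest's mtry = sqrt(p) rule.
import math

def data_dependent_defaults(n_obs: int, n_features: int) -> dict:
    """Return illustrative xgboost parameters derived from dataset shape."""
    return {
        # Sample roughly a sqrt(p)-sized share of features per tree (assumption).
        "colsample_bytree": max(math.sqrt(n_features) / n_features, 0.1),
        # Regularize more aggressively on small datasets (assumption).
        "eta": 0.05 if n_obs < 10_000 else 0.1,
        "max_depth": 5,
    }

print(data_dependent_defaults(n_obs=2_000, n_features=100))
# {'colsample_bytree': 0.1, 'eta': 0.05, 'max_depth': 5}
```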
@RAMitchell
I think 100-500 is more reasonable: the goal would be for the user's application not to hang unreasonably when they first try the library. We have several places in the language bindings where default parameters arise; we could start with the Python APIs (or whatever your preference is) and worry about the rest later. Another way of approaching this would be for us to provide some kind of dictionary of preset parameters in our API. This gives more flexibility and choice. This is all really about user experience and lowering the barrier to training models effectively. It would go hand in hand with examples and documentation.
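A rough sketch of what such a preset dictionary could look like. Neither the preset names nor the get_preset helper exist in xgboost; everything here is hypothetical, with illustrative values in the spirit of the benchmark discussed in this thread.

```python
# Hypothetical preset-parameter dictionary; names and values are illustrative.
PRESETS = {
    # Current library defaults, kept for backwards compatibility.
    "classic": {"eta": 0.3, "max_depth": 6},
    # Illustrative tuned values in the spirit of the benchmark discussed here.
    "tuned": {"eta": 0.05, "max_depth": 5, "subsample": 0.8,
              "colsample_bytree": 0.8},
}

def get_preset(name: str, **overrides) -> dict:
    """Return a copy of a named preset, with user overrides applied."""
    params = dict(PRESETS[name])
    params.update(overrides)
    return params

# params = get_preset("tuned", eta=0.1)  # user keeps full control
```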
@PhilippPro any update on this? Here is what I propose: rerun with 500 rounds and a couple of regression datasets, and confirm the run-time is acceptable. Set these as the default parameters, and create a documentation page on parameter tuning with a few notes on methodology and how you arrived at these defaults, linking to your paper.
I have not forgotten it, but I currently do not have a lot of time. I can only rerun it on the existing datasets, as I have the data for them, but this is not a problem. Your proposal is good, I will follow it, thanks.
@RAMitchell I got some results now, which are not really stunning. ;) I created a blog post where I describe the results. Currently I am repeating the 5-fold CV to get more stable results and will update the results in the blog post tomorrow. Future work (as described in the blog post) will be a bit more interesting.
Awesome work, thanks! I think there is moderate evidence for changing the default parameter settings. Shall we re-evaluate once you have results from CatBoost/LightGBM? Note that we are mostly focusing new development on "tree_method":"hist" and "tree_method":"gpu_hist", which expose the "grow_policy":"lossguide"/"depthwise" parameter. One of these growth policies might be definitively better on your datasets.
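For reference, a minimal example of how these parameters are passed through the Python API; the toy data and the specific values (max_leaves, nrounds) are illustrative.

```python
# The tree_method / grow_policy parameters mentioned above, on toy data.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 20))
y = rng.normal(size=1_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "hist",       # histogram-based split finding
    "grow_policy": "lossguide",  # grow leaves by largest loss reduction
    "max_leaves": 64,            # lossguide growth is bounded by leaf count
    "objective": "reg:squarederror",
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```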
Yes, that's fine. The results I got today (with 10 times repeated 5-fold CV) were slightly better for my defaults (75% better in the case of Spearman's rho); I updated the post. I will be happy to evaluate your new parameters once they are implemented in the package. I think CatBoost will be better in default mode, but I will see the results soon.
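A minimal sketch of the repeated 5-fold CV protocol mentioned above, using scikit-learn's RepeatedKFold with toy regression data; the model settings and the R-squared metric are illustrative.

```python
# 10x repeated 5-fold CV, as described in the comment above (toy data).
import numpy as np
import xgboost as xgb
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))
y = X[:, 0] * 2.0 + rng.normal(size=600)

cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X):
    model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    ss_res = np.sum((y[test_idx] - pred) ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    scores.append(1 - ss_res / ss_tot)  # R-squared per fold

print(f"mean R^2 over {len(scores)} folds: {np.mean(scores):.3f}")
```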
@RAMitchell how do you feel about the switch to a deeper default max_depth? Wouldn't that increase memory consumption and the risk of overfitting?
I looked at the graph again and thought a bit about the results. In the section with low R-squared, the xgboost default performs much worse. These are datasets that are hard to fit, where few things can be learned. The higher eta (eta=0.1) leads to too much overfitting compared to my defaults (eta=0.05). If you set nrounds lower than 500, the effect would be smaller; you could leave eta=0.1 but set nrounds to a lower default value, or leave it as it is, without specifying it. I am not sure how big the effect of max_depth is; you could leave it at 5 to get better runtimes. The subsample and colsample parameters could safely be set to values between 0.6 and 0.8; I guess there is not a big danger here.
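For reference, the concrete values suggested in this comment collected into one illustrative parameter dict; these reflect the discussion in this thread, not official xgboost defaults.

```python
# Illustrative collection of the suggestions above; not official defaults.
suggested_defaults = {
    "eta": 0.05,              # lower learning rate to reduce overfitting
    "max_depth": 5,           # kept moderate for better runtimes
    "subsample": 0.7,         # within the "safe" 0.6-0.8 range above
    "colsample_bytree": 0.7,  # likewise
}
nrounds = 500  # upper bound on boosting rounds discussed above
```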
I'm not too worried about memory consumption from tree depth. As mentioned by @PhilippPro, overfitting is also offset by lower learning rates and sampling. @PhilippPro the "grow_policy":"lossguide"/"depthwise" parameter already exists for both "tree_method":"hist" and "tree_method":"gpu_hist". We may even make "tree_method":"hist" the default learning mode at some point.
Hello all,
I came upon a recent JMLR paper that examined the "tunability" of the hyperparameters of multiple algorithms, including XGBoost.
Their methodology, as far as I understand it, is to take the default parameters of the package, find the (near-)optimal parameters for each dataset in their evaluation, and determine how valuable it is to tune each particular parameter (a toy sketch of this measure follows below).
In doing so they also come up with "optimal defaults" in Table 3, and an interactive Shiny app.
This made me curious about how the defaults for XGBoost were chosen and if it's something that the community would be interested in revisiting in the future.
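A toy numeric sketch of the tunability measure as described above: the per-dataset gap between performance under package defaults and under (near-)optimal per-dataset parameters. All numbers here are invented.

```python
# Toy illustration of tunability; all performance numbers are invented.
default_auc = {"dataset_a": 0.81, "dataset_b": 0.74, "dataset_c": 0.90}
tuned_auc   = {"dataset_a": 0.83, "dataset_b": 0.80, "dataset_c": 0.91}

# Tunability per dataset = gain available from tuning over the defaults.
tunability = {name: tuned_auc[name] - default_auc[name] for name in default_auc}
print(tunability)  # e.g. {'dataset_a': 0.02, 'dataset_b': 0.06, ...}
print(sum(tunability.values()) / len(tunability))  # average tunability
```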