[R] Make remaining parameters formal arguments to xgboost()
#11109
base: master
Awesome! I like the order of the arguments. Having dozens of arguments should be okay. H2O also lists them all: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/xgboost.html
Thank you for working on the UX. Is it possible to use some techniques from roxygen (https://cran.r-project.org/web/packages/roxygen2/vignettes/reuse.html) to reduce the amount of duplication?
It is reusing most of the docs from …
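On the roxygen reuse question, one technique from that vignette is `@inheritParams`, which copies `@param` entries from another documented function so they are written only once. A minimal sketch, assuming two hypothetical functions, `my_params()` and `my_fit()`, standing in for the parameter docs and the high-level fitting function; this is not necessarily how the PR wires its docs together.

```r
#' Hypothetical parameter helper (stand-in for the shared parameter docs)
#'
#' @param eta Step size shrinkage used in updates.
#' @param max_depth Maximum depth of a tree.
#' @return A named list of parameters.
my_params <- function(eta = 0.3, max_depth = 6) {
  list(eta = eta, max_depth = max_depth)
}

#' Hypothetical high-level function that reuses the docs above
#'
#' @inheritParams my_params
#' @param nrounds Number of boosting rounds.
#' @return A named list with the parameters plus `nrounds`.
my_fit <- function(nrounds = 100, eta = 0.3, max_depth = 6) {
  # Only `nrounds` is documented here; `eta` and `max_depth` inherit their
  # @param text from my_params() when roxygen2 builds the Rd files.
  c(my_params(eta = eta, max_depth = max_depth), list(nrounds = nrounds))
}

str(my_fit(nrounds = 10))
```

Documentation reuse via `@inheritParams` only happens at build time; at run time the `#'` lines are ordinary comments.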
xgboost()
#' can only be used with classification objectives and vice-versa.
#'
#' Note that not all possible `objective` values supported by the core XGBoost library are allowed
#' here - for example, objectives which are a variation of another but with a different default
Suggested change:
- #' here - for example, objectives which are a variation of another but with a different default
+ #' by the [xgboost()] function - for example, objectives which are a variation of another but with a different default
Maybe mention `xgb.train()`?
It is mentioned at the beginning. I don't think [xgboost()] would be helpful here, because these are the docs for that same function.
#' - `"survival:aft"`: Accelerated failure time model for censored survival time data.
#' See [Survival Analysis with Accelerated Failure Time](https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html) for details.
Is AFT supported? It requires lower and upper bounds for the labels due to censored data.
Yes, they are supported. The data needs to be passed as a `Surv` object, which is what most R packages use for survival regression.
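The reply states that `survival:aft` labels are given as a `Surv` object. A minimal sketch of building such labels with the survival package's `Surv()`, using made-up data for right-censored and interval-censored observations (the lower/upper bounds the question refers to). How `xgboost()` converts the object into XGBoost's lower and upper label bounds is not shown here and is not assumed.

```r
library(survival)

# Right-censored labels: event = 1 means the event was observed at obs_time,
# event = 0 means the observation was censored at obs_time.
obs_time <- c(5, 8, 12, 3)
event    <- c(1, 0, 1, 0)
y_right  <- Surv(obs_time, event)

# Interval-censored labels: each observation has a lower and an upper bound.
# lower == upper is an exactly observed time; an NA upper bound means the
# observation is right-censored at its lower bound.
lower <- c(2, 5, 7, 1)
upper <- c(4, NA, 7, 3)
y_interval <- Surv(lower, upper, type = "interval2")

print(y_right)
print(y_interval)
```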
#' - `"reg:squarederror"`: regression with squared loss.
#' - `"reg:squaredlogerror"`: regression with squared log loss \eqn{\frac{1}{2}[log(pred + 1) - log(label + 1)]^2}. All input labels are required to be greater than -1. Also, see metric `rmsle` for possible issues with this objective.
#' - `"reg:pseudohubererror"`: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
#' - `"reg:absoluteerror"`: Regression with L1 error. When a tree model is used, the leaf value is refreshed after tree construction. If used in distributed training, the leaf value is calculated as the mean value from all workers, which is not guaranteed to be optimal.
#' - `"reg:quantileerror"`: Quantile loss, also known as "pinball loss". See later sections for its parameter and [Quantile Regression](https://xgboost.readthedocs.io/en/latest/python/examples/quantile_regression.html#sphx-glr-python-examples-quantile-regression-py) for a worked example.
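To make the regression objectives in this excerpt concrete, here is a sketch that re-implements their element-wise losses in plain R. These are illustrative versions written from the formulas, not XGBoost internals. The pseudo-Huber `delta` plays the role of XGBoost's `huber_slope` and the pinball `alpha` that of `quantile_alpha`. The last part illustrates the `reg:absoluteerror` caveat on made-up residuals, assuming each worker sets the leaf to the median of its own residuals (the L1-optimal value on its shard) before the values are averaged.

```r
# reg:squaredlogerror: 0.5 * (log(pred + 1) - log(label + 1))^2.
# log1p(x) computes log(1 + x); both inputs must be greater than -1.
squared_log_error <- function(pred, label) {
  stopifnot(all(pred > -1), all(label > -1))
  0.5 * (log1p(pred) - log1p(label))^2
}

# reg:pseudohubererror: delta^2 * (sqrt(1 + (r / delta)^2) - 1), r = pred - label.
pseudo_huber <- function(pred, label, delta = 1) {
  r <- pred - label
  delta^2 * (sqrt(1 + (r / delta)^2) - 1)
}

# reg:quantileerror (pinball loss) at quantile level alpha, with r = label - pred:
# under-predictions cost alpha * r, over-predictions cost (alpha - 1) * r.
pinball <- function(pred, label, alpha = 0.5) {
  r <- label - pred
  ifelse(r >= 0, alpha * r, (alpha - 1) * r)
}

# reg:absoluteerror in distributed training (hypothetical residuals in one leaf,
# split across two workers). The mean of per-worker medians is not the global
# median, so it has a larger total L1 loss.
resid_all <- c(1, 2, 3, 10, 20, 30, 100)
worker_a  <- resid_all[1:3]
worker_b  <- resid_all[4:7]
global_median   <- median(resid_all)
mean_of_medians <- mean(c(median(worker_a), median(worker_b)))
l1 <- function(v) sum(abs(resid_all - v))
c(global = l1(global_median), averaged = l1(mean_of_medians))
```

Here the global median is 10 with a total L1 loss of 144, while the averaged worker medians give 13.5 with a loss of 147.5, which is the kind of gap the "not guaranteed to be optimal" remark refers to.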
Can we refer to [xgb.params] after the comment on what is NOT supported? It may add an additional click for the user, but managing and updating these types of documents is quite challenging from my perspective. As you have encountered, sooner or later, they rot.
Added another reference to `xgb.params` and explicitly listed the ones that are not supported, to make it easier to update in the future.
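The explicit-exclusion approach from this reply can be sketched as a small validation helper. The helper `check_objective()` and the contents of `unsupported_objectives` are hypothetical: the three entries are learning-to-rank objectives, which the PR description says `xgboost()` does not support, but they are not the PR's actual list.

```r
# Hypothetical explicit list of objectives rejected by the high-level function.
# Keeping the exclusions in one vector means only this vector needs editing
# when the core library gains or drops objectives.
unsupported_objectives <- c("rank:pairwise", "rank:ndcg", "rank:map")

check_objective <- function(objective) {
  if (objective %in% unsupported_objectives) {
    stop(sprintf("Objective '%s' is not supported by this interface.", objective))
  }
  invisible(objective)
}

check_objective("reg:squarederror")  # passes silently
```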
@david-cortes Could you please help fix the R linter errors: https://github.com/dmlc/xgboost/actions/runs/12418591191/job/34672101709?pr=11109 ?
Updated.
ref #9810
This PR adds the remaining parameters that can be passed to `xgboost()` as function arguments. It selectively omits parameters that are not applicable to `xgboost()`, such as parameters related to learning-to-rank objectives, which are not supported by this function, but I'm not entirely sure that I'm not missing any.

The docs are auto-copied from `xgb.params`, with some small modifications such as the docs of aliased parameters being rewritten here, since aliases are not supported (just like in the sklearn interface for Python).

I wasn't entirely sure what would be the best way to add the parameters here, so I thought of the following:
… `nrounds`, before verbosity and monitoring settings), and a small subset which is also more likely to be changed to appear after the verbosity-related settings but before the rest of the parameters.
This still leaves a function signature with 50+ parameters, though, so I'm not sure whether it should omit the less frequently used parameters altogether and offer a `...` option, or whether it should simply stick to the same order of parameters as in the .rst docs. It would be ideal to hear opinions from @mayer79 and @trivialfis here.
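The trade-off raised here, formal arguments versus a `...` catch-all, can be sketched with a hypothetical function `fit_params()`. A few common parameters are formal arguments, and anything else must be passed by name through `...`. The parameter names are XGBoost ones, but the function, its argument order, and its defaults are illustrative only and are not the signature this PR adds.

```r
# Hypothetical signature: common parameters are formal arguments,
# everything else goes through `...` and is merged into one list.
fit_params <- function(nrounds = 100L, objective = NULL, max_depth = NULL,
                       learning_rate = NULL, verbosity = 0L, ...) {
  formal <- list(objective = objective, max_depth = max_depth,
                 learning_rate = learning_rate, verbosity = verbosity)
  extra <- list(...)
  # Unnamed values in `...` cannot be mapped to a parameter, so reject them.
  if (length(extra) > 0L && (is.null(names(extra)) || any(names(extra) == ""))) {
    stop("All additional parameters passed through '...' must be named.")
  }
  params <- c(formal, extra)
  # Drop parameters left at NULL so that the library defaults would apply.
  params <- params[!vapply(params, is.null, logical(1))]
  list(nrounds = nrounds, params = params)
}

str(fit_params(nrounds = 10L, max_depth = 3L, subsample = 0.8))
```

One consequence of the `...` route: a misspelled name such as `max_detph = 3` is silently collected into `params` rather than raising R's "unused argument" error, which a signature made only of formal arguments would give. Such names would need separate validation.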