Does BoTorch support BO with gradient observations? #1626
Hi @floatingCatty. We do not have a BoTorch model that supports gradient observations out of the box. You would have to implement a custom model that incorporates gradient observations. Here's a GPyTorch tutorial on this. Once you have the model, you can use the rest of the BoTorch APIs to optimize your function. You will want to use an objective to exclude the gradient predictions while computing the acquisition values.
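For anyone landing here, a rough sketch of that recipe (my own illustration, not the demo notebook referenced in the next comment): a GPyTorch `ExactGP` built from `RBFKernelGrad`/`ConstantMeanGrad` and wrapped as a BoTorch model, plus a `ScalarizedPosteriorTransform` whose weights zero out the gradient outputs so only the function value enters the acquisition value. Data shapes and hyperparameter settings here are assumptions and may need adjusting.

```python
# Sketch: GP with derivative observations, usable with BoTorch acquisition functions.
# Assumes train_Y columns are [f(x), df/dx_1, ..., df/dx_d].
import torch
import gpytorch
from botorch.fit import fit_gpytorch_mll
from botorch.models.gpytorch import GPyTorchModel
from botorch.acquisition import ExpectedImprovement
from botorch.acquisition.objective import ScalarizedPosteriorTransform


class GPWithDerivatives(gpytorch.models.ExactGP, GPyTorchModel):
    """Exact GP over (value, gradient) observations, following the GPyTorch
    derivatives tutorial, wrapped so BoTorch can consume its posterior."""

    def __init__(self, train_X, train_Y):
        d = train_X.shape[-1]
        likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=1 + d)
        super().__init__(train_X, train_Y, likelihood)
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernelGrad(ard_num_dims=d)
        )
        self._num_outputs = 1 + d  # function value + d partial derivatives

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


# Toy data: f(x) = -||x||^2 with exact gradients.
train_X = torch.rand(10, 2, dtype=torch.double)
f = -(train_X**2).sum(dim=-1, keepdim=True)
df = -2 * train_X
train_Y = torch.cat([f, df], dim=-1)  # shape (n, 1 + d)

model = GPWithDerivatives(train_X, train_Y)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)

# Only the first output (the function value) should drive the acquisition value,
# so scalarize with weights [1, 0, ..., 0] to exclude the gradient outputs.
weights = torch.zeros(train_X.shape[-1] + 1, dtype=torch.double)
weights[0] = 1.0
EI = ExpectedImprovement(
    model, best_f=f.max(), posterior_transform=ScalarizedPosteriorTransform(weights)
)
```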
Here is a demo notebook for this. Note that for the single-output analytic acquisition function using a …
@Balandat Should we polish that up and add it as a tutorial?
That's an idea. I haven't actually thought about whether this is a good problem for BO or anything, but worth checking that we can get some meaningful results if we run a loop here.
Thanks, @Balandat @saitcakmak, I tried the examples and they work really well. I also tested the BO method with gradients on the Ackley function, to check whether it still performs well as the dimension increases. It looks good in 10d, but at 50-100d the method is still very slow compared to the gradient method. Could we utilize gradient information with BO to perform optimization on high-dimensional problems?
Sorry, what exactly do you mean by the "gradient method" here? In general we should expect scalability issues with this approach, at least in its vanilla form: for a d-dimensional problem, each observation yields d+1 outcomes (the function value plus d partial derivatives), so the kernel matrices grow quickly with the dimension. One ad hoc way of dealing with the poor scaling is to only include some partial derivatives or a directional derivative, as discussed in https://arxiv.org/pdf/1703.04389.pdf
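To make the scaling concrete (my arithmetic, not a claim from the thread): with n observations in d dimensions, the gradient-enhanced GP's covariance matrix has size n(d+1) × n(d+1), so exact inference costs on the order of n³(d+1)³. For example, n = 100 points in d = 50 dimensions already means a 5100 × 5100 matrix, roughly (d+1)³ = 51³ ≈ 1.3e5 times the cost of the value-only 100 × 100 case, which is consistent with the slowdown observed above at 50-100d.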
By “gradient method”, I mean gradient-based optimization methods like stochastic gradient descent, which can jump out of local minima to some degree. I am working on problems with parameter counts ranging from 10 to 1000, where gradient information is known, but the converged result of the gradient optimizer is not accurate enough for the problem. Since pure black-box global optimization is too expensive to apply, I am looking for global optimization methods that utilize gradients to help alleviate this. Is BO with gradients suitable in this case?
Hi @floatingCatty, it'd be important to understand how expensive (e.g. in terms of time) your objective function is to evaluate. If the objective is cheap to evaluate, you should be able to make progress by restarting a regular gradient-based optimizer from random starting points and taking the best of all converged results; see for example the results of L-BFGS-R in Figure 5 here. While it might need thousands of evaluations to achieve values similar to first-order BO on high-dimensional Ackley, you can make progress over a single optimization run by using L-BFGS with random restarts upon convergence. If the objective is expensive to evaluate, a first-order BO approach would be particularly relevant. It's on my list to get the more scalable approach into BoTorch, which will allow us to scale to higher dimensions.
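As an illustration of that baseline (my own sketch, not code from this thread), here is a multi-start L-BFGS loop with SciPy; the Ackley definition and bounds are stand-ins, and since gradients are available in the use case above, an analytic gradient could be passed via `jac=`.

```python
# Sketch: restart L-BFGS-B from random points and keep the best converged result.
import numpy as np
from scipy.optimize import minimize


def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    # Standard Ackley test function (stand-in for the real objective).
    d = len(x)
    return (
        -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))
        - np.exp(np.sum(np.cos(c * x)) / d)
        + a
        + np.e
    )


def multistart_lbfgs(fun, dim, bounds=(-5.0, 5.0), n_restarts=20, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = rng.uniform(bounds[0], bounds[1], size=dim)
        # Pass jac=grad_fun here if an analytic gradient is available.
        res = minimize(fun, x0, method="L-BFGS-B", bounds=[bounds] * dim)
        if best is None or res.fun < best.fun:
            best = res
    return best


best = multistart_lbfgs(ackley, dim=50)
print(f"best value: {best.fun:.3f} after {best.nfev} function evaluations")
```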
I am curious whether BoTorch supports BO when the gradient of the objective function is available,
like the method described in "Bayesian Optimization with Gradients".
Thanks.