
Does BoTorch support BO with gradient observations? #1626

Closed
floatingCatty opened this issue Jan 10, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@floatingCatty

I am curious whether BoTorch supports BO when the gradient of the objective function is available, like the method described in "Bayesian Optimization with Gradients".
Thanks.

@floatingCatty added the enhancement (New feature or request) label Jan 10, 2023
@saitcakmak (Contributor) commented Jan 10, 2023

Hi @floatingCatty. We do not have a BoTorch model that supports gradient observations out of the box. You would have to implement a custom model that incorporates gradient observations. Here's a GPyTorch tutorial on this. Once you have the model, you can use the rest of the BoTorch APIs to optimize your function. You will want to use an objective to exclude the gradient predictions when computing the acquisition values.
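
For concreteness, here is a minimal sketch (not an official BoTorch model) of such a custom derivative-enabled GP, following the GPyTorch "GPs with Derivatives" tutorial and BoTorch's GPyTorchModel mixin. It assumes train_X is n x d and train_Y is n x (d + 1), with the function value in column 0 and the d partial derivatives in the remaining columns.

```python
# Sketch of a derivative-enabled GP usable with BoTorch (assumptions as noted above).
import gpytorch
import torch
from botorch.models.gpytorch import GPyTorchModel


class GPWithDerivatives(gpytorch.models.ExactGP, GPyTorchModel):
    def __init__(self, train_X: torch.Tensor, train_Y: torch.Tensor):
        d = train_X.shape[-1]
        # One task for the function value plus one per partial derivative.
        likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=d + 1)
        super().__init__(train_X, train_Y, likelihood)
        # Mean and kernel that jointly model the function and its gradient.
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernelGrad(ard_num_dims=d)
        )
        self._num_outputs = d + 1  # tells BoTorch about the derivative outputs

    def forward(self, x: torch.Tensor):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)
```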

@Balandat (Contributor)

Here is a demo notebook for this. Note that for single-output analytic acquisition functions, using a ScalarizedObjective is deprecated, so I'm using a ScalarizedPosteriorTransform instead.

demo_deriv_enabled_BO_logEI.ipynb.txt
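
As a hedged sketch of the acquisition setup described in this comment (not taken from the attached notebook), one could pair the posterior transform with LogEI as below. It assumes a fitted derivative-enabled `model`, training data `train_X`/`train_Y` laid out as in the model sketch above, and box `bounds` defined elsewhere; the transform puts weight 1 on the value output and 0 on each derivative output.

```python
# Sketch: optimize LogEI on the function-value output only, ignoring derivatives.
# `model`, `train_X`, `train_Y` (n x (d + 1)), and `bounds` (2 x d) are assumed
# to be defined elsewhere.
import torch
from botorch.acquisition.analytic import LogExpectedImprovement
from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.optim import optimize_acqf

d = train_X.shape[-1]
weights = torch.zeros(d + 1, dtype=train_X.dtype)
weights[0] = 1.0  # weight only the function-value output
posterior_transform = ScalarizedPosteriorTransform(weights=weights)

log_ei = LogExpectedImprovement(
    model=model,
    best_f=train_Y[:, 0].max(),
    posterior_transform=posterior_transform,
)
candidate, acq_value = optimize_acqf(
    acq_function=log_ei,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=128,
)
```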

@saitcakmak (Contributor)

@Balandat Should we polish that up and add it as a tutorial?

@Balandat (Contributor)

That's an idea. I haven't actually thought about whether this is a good problem for BO or anything, but worth checking that we can get some meaningful results if we run a loop here.

@floatingCatty (Author)

Thanks @Balandat @saitcakmak, I tried the example and it works really well. I also tested the gradient-enabled BO method on the Ackley function to check whether it still performs well as the dimension increases. It looks good in 10d, but at 50-100d the method is still very slow compared to the gradient method. Could we utilize gradient information with BO to perform optimization on high-dimensional problems?

@Balandat (Contributor)

It looks good in 10d, but at 50-100d the method is still very slow compared to the gradient method

Sorry, what exactly do you mean by the "gradient method" here?

In general we should expect scalability issues with this approach, at least in its vanilla form. For a d-dimensional domain with n observations the train-train covariance will be of size n(d+1) x n(d+1), rather than n x n. Now this matrix does have structure though, and there are more efficient ways to handle this (https://proceedings.mlr.press/v162/ament22a/ament22a.pdf - cc @SebastianAment).
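
To make that scaling concrete, here is a quick back-of-the-envelope calculation of how the dense train-train covariance grows with d (double precision; the specific n and d values are arbitrary illustrations):

```python
# Size of the train-train covariance with gradient observations: n(d+1) x n(d+1).
for n, d in [(50, 10), (50, 50), (50, 100)]:
    m = n * (d + 1)
    gib = m * m * 8 / 2**30  # dense float64 storage in GiB
    print(f"n={n}, d={d}: covariance is {m} x {m} (~{gib:.3f} GiB dense)")
```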

Another ad hoc way of dealing with the poor scaling is to only include some partial derivatives or a directional derivative as discussed in https://arxiv.org/pdf/1703.04389.pdf

@floatingCatty (Author)

By "gradient method", I mean gradient-based optimization methods like stochastic gradient descent, which can escape local minima to some degree.

I am working on problems with parameter sizes ranging from 10 to 1000, where gradient information is known, but the converged result of the gradient optimizer is not accurate enough for the problem. Pure black-box global optimization is too expensive to apply, so I am looking for global optimization methods that exploit gradients to help alleviate this. Is BO with gradients suitable in this case?

@SebastianAment (Contributor)

Hi @floatingCatty, it'd be important to understand how expensive (e.g. in terms of time) your objective function is to evaluate.

If the objective is cheap to evaluate, you should be able to make progress by restarting a regular gradient-based optimizer with random starting points and taking the best of all converged results, see for example the results of L-BFGS-R in Figure 5 here. While it might need thousands of evaluations to achieve similar values as first-order BO on high-dimensional Ackley, you can make progress over a single optimization run using L-BFGS with random restarts upon convergence.
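
As an illustration of that restart strategy (a generic sketch, not code from the linked paper), one could run SciPy's L-BFGS-B from random starting points on the d-dimensional Ackley function and keep the best converged result; the domain, dimension, and number of restarts below are arbitrary choices.

```python
# Sketch: L-BFGS-B with random restarts on the d-dimensional Ackley function.
import numpy as np
from scipy.optimize import minimize


def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    d = x.size
    return (
        -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))
        - np.exp(np.sum(np.cos(c * x)) / d)
        + a
        + np.e
    )


d, n_restarts = 50, 20
rng = np.random.default_rng(0)
best = np.inf
for _ in range(n_restarts):
    x0 = rng.uniform(-5.0, 5.0, size=d)  # random restart point
    res = minimize(ackley, x0, method="L-BFGS-B", bounds=[(-5.0, 5.0)] * d)
    best = min(best, res.fun)
print(f"best Ackley value over {n_restarts} restarts in {d}d: {best:.3f}")
```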

If the objective is expensive to evaluate, a first-order BO approach would be particularly relevant. It's on my list to get the more scalable approach into BoTorch, which will allow us to scale to higher d. Scaling to 1000 without powerful hardware will likely remain a challenge, though I would love to be wrong on that point!

@esantorella changed the title from "[Feature Request] Actually a question.." to "Does BoTorch support BO with gradient observations?" Apr 22, 2023
@esantorella closed this as not planned Jun 5, 2023