
Does BoTorch support BO with gradient observations? #1626

Closed
floatingCatty opened this issue Jan 10, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@floatingCatty

I am curious whether BoTorch supports BO when the gradient of the objective function is available, like the method described in "Bayesian Optimization with Gradients".
Thanks.

@floatingCatty added the enhancement (New feature or request) label Jan 10, 2023
@saitcakmak (Contributor) commented Jan 10, 2023

Hi @floatingCatty. We do not have a BoTorch model that supports gradient observations out of the box. You would have to implement a custom model that incorporates gradient observations. Here's a GPyTorch tutorial on this. Once you have the model, you can use the rest of the BoTorch APIs to optimize your function. You will want to use an objective to exclude the gradient predictions when computing the acquisition values.
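
For concreteness, here is a minimal sketch (not an official BoTorch model) of such a custom derivative-enabled GP, following the GPyTorch "GPs with Derivatives" tutorial and BoTorch's GPyTorchModel mixin. It assumes train_X is n x d and train_Y is n x (d + 1), with the function value in column 0 and the d partial derivatives in the remaining columns.

```python
# Sketch of a derivative-enabled GP usable with BoTorch (assumptions as noted above).
import gpytorch
import torch
from botorch.models.gpytorch import GPyTorchModel


class GPWithDerivatives(gpytorch.models.ExactGP, GPyTorchModel):
    def __init__(self, train_X: torch.Tensor, train_Y: torch.Tensor):
        d = train_X.shape[-1]
        # One task for the function value plus one per partial derivative.
        likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=d + 1)
        super().__init__(train_X, train_Y, likelihood)
        # Mean and kernel that jointly model the function and its gradient.
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernelGrad(ard_num_dims=d)
        )
        self._num_outputs = d + 1  # tells BoTorch about the derivative outputs

    def forward(self, x: torch.Tensor):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)
```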

@Balandat (Contributor)

Here is a demo notebook for this. Note that for single-output analytic acquisition functions, using a ScalarizedObjective is deprecated, so I'm using a ScalarizedPosteriorTransform instead.

demo_deriv_enabled_BO_logEI.ipynb.txt
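
As a hedged sketch of the acquisition setup described in this comment (not taken from the attached notebook), one could pair the posterior transform with LogEI as below. It assumes a fitted derivative-enabled `model`, training data `train_X`/`train_Y` laid out as in the model sketch above, and box `bounds` defined elsewhere; the transform puts weight 1 on the value output and 0 on each derivative output.

```python
# Sketch: optimize LogEI on the function-value output only, ignoring derivatives.
# `model`, `train_X`, `train_Y` (n x (d + 1)), and `bounds` (2 x d) are assumed
# to be defined elsewhere.
import torch
from botorch.acquisition.analytic import LogExpectedImprovement
from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.optim import optimize_acqf

d = train_X.shape[-1]
weights = torch.zeros(d + 1, dtype=train_X.dtype)
weights[0] = 1.0  # weight only the function-value output
posterior_transform = ScalarizedPosteriorTransform(weights=weights)

log_ei = LogExpectedImprovement(
    model=model,
    best_f=train_Y[:, 0].max(),
    posterior_transform=posterior_transform,
)
candidate, acq_value = optimize_acqf(
    acq_function=log_ei,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=128,
)
```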

@saitcakmak (Contributor)

@Balandat Should we polish that up and add it as a tutorial?

@Balandat (Contributor)

That's an idea. I haven't actually thought about whether this is a good problem for BO or anything, but worth checking that we can get some meaningful results if we run a loop here.

@floatingCatty (Author)

Thanks @Balandat @saitcakmak, I tried the example and it works really well. I also tested the gradient-enabled BO method on the Ackley function to check whether it still performs well as the dimension increases. It looks good in 10d, but at 50-100d the method is still very slow compared to the gradient method. Could we utilize gradient information with BO to perform optimization on high-dimensional problems?

@Balandat (Contributor)

It looks good in 10d, but at 50-100d the method is still very slow compared to the gradient method

Sorry, what exactly do you mean by the "gradient method" here?

In general we should expect scalability issues with this approach, at least in its vanilla form. For a d-dimensional domain with n observations the train-train covariance will be of size n(d+1) x n(d+1), rather than n x n. Now this matrix does have structure though, and there are more efficient ways to handle this (https://proceedings.mlr.press/v162/ament22a/ament22a.pdf - cc @SebastianAment).
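
To make that scaling concrete, here is a quick back-of-the-envelope calculation of how the dense train-train covariance grows with d (double precision; the specific n and d values are arbitrary illustrations):

```python
# Size of the train-train covariance with gradient observations: n(d+1) x n(d+1).
for n, d in [(50, 10), (50, 50), (50, 100)]:
    m = n * (d + 1)
    gib = m * m * 8 / 2**30  # dense float64 storage in GiB
    print(f"n={n}, d={d}: covariance is {m} x {m} (~{gib:.3f} GiB dense)")
```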

Another ad hoc way of dealing with the poor scaling is to only include some partial derivatives or a directional derivative as discussed in https://arxiv.org/pdf/1703.04389.pdf

@floatingCatty (Author)

By "gradient method", I mean gradient-based optimization methods like stochastic gradient descent, which can escape local minima to some degree.

I am working on problems with parameter sizes ranging from 10 to 1000, where gradient information is known, but the converged result of the gradient optimizer is not accurate enough for the problem. Pure black-box global optimization is too expensive to apply, so I am looking for global optimization methods that exploit gradients to help alleviate this. Is BO with gradients suitable in this case?

@SebastianAment (Contributor)

Hi @floatingCatty, it'd be important to understand how expensive (e.g. in terms of time) your objective function is to evaluate.

If the objective is cheap to evaluate, you should be able to make progress by restarting a regular gradient-based optimizer with random starting points and taking the best of all converged results, see for example the results of L-BFGS-R in Figure 5 here. While it might need thousands of evaluations to achieve similar values as first-order BO on high-dimensional Ackley, you can make progress over a single optimization run using L-BFGS with random restarts upon convergence.
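
As an illustration of that restart strategy (a generic sketch, not code from the linked paper), one could run SciPy's L-BFGS-B from random starting points on the d-dimensional Ackley function and keep the best converged result; the domain, dimension, and number of restarts below are arbitrary choices.

```python
# Sketch: L-BFGS-B with random restarts on the d-dimensional Ackley function.
import numpy as np
from scipy.optimize import minimize


def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    d = x.size
    return (
        -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))
        - np.exp(np.sum(np.cos(c * x)) / d)
        + a
        + np.e
    )


d, n_restarts = 50, 20
rng = np.random.default_rng(0)
best = np.inf
for _ in range(n_restarts):
    x0 = rng.uniform(-5.0, 5.0, size=d)  # random restart point
    res = minimize(ackley, x0, method="L-BFGS-B", bounds=[(-5.0, 5.0)] * d)
    best = min(best, res.fun)
print(f"best Ackley value over {n_restarts} restarts in {d}d: {best:.3f}")
```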

If the objective is expensive to evaluate, a first-order BO approach would be particularly relevant. It's on my list to get the more scalable approach into BoTorch, which will allow us to scale to higher d. Scaling to 1000 without powerful hardware will likely remain a challenge, though I would love to be wrong on that point!

@esantorella changed the title from "[Feature Request] Actually a question.." to "Does BoTorch support BO with gradient observations?" Apr 22, 2023
@esantorella closed this as not planned Jun 5, 2023