Negative/Near-zero acquisition values for Multi-Fidelity BO #1977
-
Hi, this question builds on discussion #1942 and the resulting PR, #1956. The problem setup is as follows: I have 8 continuous dimensions and 8 discrete data fidelity dimensions, giving a total of 864 possible combinations of values for the discrete fidelity dimensions. I have adapted code from the discrete multi-fidelity BO tutorial to allow for this setup. In the tutorial, […]
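For reference, here is a minimal sketch of how I'm constructing the model; the shapes, names, and fidelity indices are illustrative rather than my exact code, and this relies on the multiple data fidelity dimensions (the `data_fidelities` argument) enabled by #1956:

```python
import torch
from botorch.models.gp_regression_fidelity import SingleTaskMultiFidelityGP
from botorch.models.transforms.outcome import Standardize

# Illustrative data: 8 continuous dims (indices 0-7) followed by
# 8 discrete data fidelity dims (indices 8-15).
train_X = torch.rand(32, 16, dtype=torch.double)  # placeholder training inputs
train_Y = torch.rand(32, 1, dtype=torch.double)   # placeholder observations

model = SingleTaskMultiFidelityGP(
    train_X,
    train_Y,
    data_fidelities=[8, 9, 10, 11, 12, 13, 14, 15],  # the 8 discrete fidelity dims
    linear_truncated=False,  # assumption: the default linear truncated kernel
                             # supports at most two fidelity dimensions
    outcome_transform=Standardize(m=1),
)
```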
The problem I'm experiencing is that the acquisition value here is either close to zero, in the range of 10^(-4) to 10^(-10), or negative, with magnitudes in the range of 2*10^(-4) to 3*10^(-4). The acquisition value found when […]

Thanks in advance!
-
Interesting approach. You may also be interested in two potential alternatives: […]
Mathematically, the knowledge gradient should always be nonnegative. It being small isn't a concern per se: the changes in the posterior mean are usually quite small relative to the current value. But negative values are not great. The fact that you are seeing them here is likely due to numerical precision issues, or to the KG acquisition function not being optimized perfectly. This optimization is quite hard even in relatively standard cases, and your setting with many discrete variables should make it a lot harder. One thing you could try is to really crank up the budget you spend on optimizing the acquisition function in a single iteration (e.g. by actually enumerating all combinations) to see whether that mitigates the issue of negative values.
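For illustration, full enumeration could look roughly like the sketch below, using `optimize_acqf_mixed` with one fixed-features dict per discrete combination. The fidelity levels, dimension indices, budget values, and the names `mfkg_acqf` and `bounds` are placeholders standing in for your actual setup:

```python
import itertools

from botorch.optim import optimize_acqf_mixed

# Placeholder fidelity levels per discrete dimension; in your problem this
# enumeration would instead produce the 864 valid combinations.
fidelity_values = [[0.5, 1.0]] * 8
fixed_features_list = [
    {8 + i: v for i, v in enumerate(combo)}  # fidelity dims assumed at indices 8-15
    for combo in itertools.product(*fidelity_values)
]

# `mfkg_acqf` (the multi-fidelity KG acquisition function) and `bounds`
# (a 2 x 16 tensor over all dimensions) are assumed to be defined as in
# the tutorial you adapted.
candidate, acq_value = optimize_acqf_mixed(
    acq_function=mfkg_acqf,
    bounds=bounds,
    q=1,
    num_restarts=20,    # cranked up relative to the tutorial defaults
    raw_samples=1024,
    fixed_features_list=fixed_features_list,  # one entry per discrete combination
)
```

This optimizes the continuous dimensions separately for every discrete combination and returns the best candidate overall, so it is expensive, but it takes the discrete part of the search out of the picture entirely.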
-
Hi @Balandat, thanks for your timely response! I had to do a bit of a deep dive before I could get back to you on this.
I attempted enumerating all combinations and bumped up the […]; roughly, the call I'm making looks like the sketch below.
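The option values shown here are placeholders rather than my exact settings, and `mfkg_acqf`, `bounds`, and `fixed_features_list` are assumed to be defined as above:

```python
from botorch.optim import optimize_acqf_mixed

candidate, acq_value = optimize_acqf_mixed(
    acq_function=mfkg_acqf,
    bounds=bounds,
    q=1,
    num_restarts=20,
    raw_samples=1024,
    fixed_features_list=fixed_features_list,
    options={
        "maxiter": 500,  # forwarded to scipy.optimize.minimize (L-BFGS-B)
        "ftol": 1e-10,   # tighter function-value tolerance for L-BFGS-B
    },
)
```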
These options get passed to scipy's implementation of L-BFGS-B. According to the documentation here and the aforementioned stackexchange post, it seems relatively safe to reduce the tolerance to […]

As for the two alternatives you mentioned, both sound interesting (especially the probabilistic reparameterization), but I think I will regard them as future work for now while I chase an article deadline. 😅

I should also mention that I am optimising over 96 of the 864 discrete combinations. I do this by first shuffling a list of all combinations and then selecting the next combination for which to construct the […]

Thanks!
Alex
Yep, that is correct. I just put up #1987 that introduces a mixed alternating optimizer - this needs some cleanup before it can be merged in, but it should work if you check out the PR locally.
cc @saitcakmak, @dme65