
[Question] Algorithm configuration for a small config space - too few evaluations #1160

Closed
graidl opened this issue Nov 15, 2024 · 7 comments

@graidl

graidl commented Nov 15, 2024

We intend to use SMAC with the AlgorithmConfigurationFacade to optimize only a few discrete parameters with small domains for a stochastic metaheuristic. Our issue is that SMAC stops far too early.

To be a bit more specific:

Let the config space be

config_space = ConfigurationSpace({
    "y": (1, 3),
    "z": ["opt1", "opt2"],
})

Thus, we have just 3*2=6 possible configurations. Moreover, let us use just one instance.

SMAC stops after only about 26 algorithm evaluations with the message Could not return a new configuration after 16 retries, even though we set n_trials to 2000 and deterministic to False, and the metaheuristic returned quite different values for the same configurations.

Even though the search space for SMAC is very small, clearly a larger number of evaluations would be needed to conclude with some statistical significance that one configuration is best.

What am I missing?

AlgorithmConfigurationFacade sets max_config_calls in the intensifier to 2000, so this cannot be the issue.
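
For reference, this is roughly how the intensifier could be constructed explicitly to make max_config_calls visible (a sketch only; the exact constructor arguments may differ between SMAC 2.x versions, and target_function is a placeholder):

from smac import AlgorithmConfigurationFacade, Scenario
from smac.intensifier.intensifier import Intensifier

scenario = Scenario(config_space, deterministic=False, n_trials=2000)

# max_config_calls caps how many (instance, seed) pairs a single configuration
# may be evaluated on before it is considered fully intensified.
intensifier = Intensifier(scenario, max_config_calls=2000)

smac = AlgorithmConfigurationFacade(scenario, target_function,
                                    intensifier=intensifier)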

Note also that the table at https://automl.github.io/SMAC3/main/3_getting_started.html says that the AlgorithmConfigurationFacade uses the Hyperband intensifier, which, however, I don't think is true.

@timruhkopf
Collaborator

timruhkopf commented Nov 20, 2024

Hi, thank you for your issue!

Could you please give us a minimal working example to reproduce the problem you are facing?
From what I gather, the message you quoted should be the relevant line from the log.

Regarding your note: you are absolutely right, and I will open a documentation issue (#1165) to correct this oversight.

@timruhkopf
Collaborator

Looking into the issue a bit more, there are two reasons why a config can end up in ConfigSelector._processed_configs and thus no longer count as new:

  1. you added the config to the runhistory before starting the optimization, or
  2. the initial design is already exhaustive with respect to your tiny search space, in which case there is no unseen configuration left to propose.

Both cases increase the retry counter for pre-existing configurations.

If you really only have so few design decisions in your search space, why don't you just enumerate them? Are you investigating seeds or instances?
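
For illustration, such a brute-force enumeration could look roughly like this (a sketch, not part of the original comment; evaluate is a hypothetical stand-in for one stochastic run of the metaheuristic):

import statistics
from itertools import product
import random

def evaluate(y: int, z: str, seed: int) -> float:
    # Hypothetical placeholder for one stochastic run of the metaheuristic.
    rng = random.Random(seed)
    return y + rng.random()

results = {}
for y, z in product(range(1, 4), ["opt1", "opt2"]):       # 3 * 2 = 6 configurations
    costs = [evaluate(y, z, seed) for seed in range(30)]  # e.g. 30 seeds each
    results[(y, z)] = statistics.mean(costs)

best = min(results, key=results.get)
print("Best configuration:", best, "with mean cost", results[best])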

@graidl
Author

graidl commented Nov 20, 2024

Thanks for your answers. Here is a small demo that stops after about 20 evaluations instead of the requested 200 trials:

from ConfigSpace import Configuration, ConfigurationSpace
from smac import AlgorithmConfigurationFacade, Scenario
import random

config_space = ConfigurationSpace({
        "x": (1, 3), 
        "y": (1, 2),
    })

instances = ["myinst.txt"]
features = {fn: [i] for (i, fn) in enumerate(instances)}

scenario = Scenario(config_space, deterministic=False, 
                    instances=instances, instance_features=features, 
                    n_trials=200)

def f(config: Configuration, instance: str, seed: int) -> float:
    x = int(config["x"])
    y = int(config["y"])
    print(f'f({instance}, {seed}, {x}, {y})', end=" -> ")
    res = x - y + random.random()
    print(res)
    return res
    
smac = AlgorithmConfigurationFacade(scenario, f, overwrite=True)

incumbent = smac.optimize()
print("Optimized configuration: ", incumbent)

The point is that the function `f` for which the parameters should be optimized is not deterministic, which we also indicate in the scenario.

Yes, enumerating all possible configurations is feasible; in this simple example there are just 3*2=6. However, each of them needs to be evaluated a substantial number of times to decide that one is (statistically) better than another.
SMAC seems to consider a configuration "fully processed" after a very low number of evaluations with different seeds, maybe 3? This is clearly far too few for any statistical significance. I would have expected a parameter to set a higher number of seeds to be tried, but I did not find one.

Let me finally explain that in our real application we indeed also want to optimize just two discrete parameters, although with larger domains. Doing a grid search by enumerating all configurations and running, e.g., 30 evaluations each is what we wanted to avoid, since each run takes quite some time.

@timruhkopf
Collaborator

Thanks for providing us with the minimal working example. We will have a look into it. From what you describe, this most likely originates from the way the intensification works.

@mwever raised the point that for classic algorithm configuration we usually assume weak homogeneity, which makes it easier to estimate the performance and to act on it more confidently. What you describe, on the other hand, implies quite overlapping performance distributions with no clear winner unless we have a very good (expensively acquired, over many seeds) performance estimate. SMAC is designed to reject aggressively but to be gracious should we re-encounter a configuration.

I have another conceptual issue, but first I want to verify that the seeding is not implicitly creating a new instance. Then we can move forward and assume that the seeds are independent draws from the performance distribution of that algorithm. If you then race configurations against each other with respect to the number of seeds, and say you used the mean of the performance distribution to make such a decision, then you are comparing estimators with varying amounts of confidence. I am not sure right now whether that is already implemented/intended that way in SMAC.

@graidl
Author

graidl commented Nov 28, 2024

Thank you for considering this question. No, the seeding is not creating a new instance; I clearly separate different instances from different seeds. Without SMAC, I would run the different algorithm configurations on our instances and do, e.g., pairwise Wilcoxon signed-rank tests to get an idea of how reliably one configuration may be better than another.
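
Such a pairwise comparison could look roughly like this (an illustrative sketch with made-up numbers, not from the discussion):

from scipy.stats import wilcoxon

# Paired costs of two configurations on the same instance/seed pairs (dummy data).
costs_a = [3.1, 2.9, 3.4, 3.0, 2.8, 3.2]
costs_b = [3.5, 3.1, 3.6, 3.3, 3.0, 3.4]

stat, p_value = wilcoxon(costs_a, costs_b)
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.4f}")
# A small p-value indicates a systematic difference between the two configurations.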

I think such circumstances are not that uncommon when tuning algorithm configurations, but I also understand that SMAC may not cover all such usage scenarios. Then it would at least be good to know its limits.

It would already help if one could specify a (minimum) number of seeds that have to be tried for a configuration before SMAC assumes it has a reliable performance estimate on one instance. This number seems to be extremely low by default (~3?) and cannot be set easily.

@mwever

mwever commented Dec 2, 2024

Thanks for your patience and the additional explanation. I believe I now understand your setting and what you are asking for. In the current version of SMAC, the behavior you are asking for is not supported. Sampling additional seeds to allow for a more robust estimation of a configuration's mean performance on a particular instance is done within SMAC's intensification mechanism.

The original version of this mechanism, which is the one implemented, is described in Procedure 2 on page 5 (top) of https://ml.informatik.uni-freiburg.de/wp-content/uploads/papers/11-LION5-SMAC.pdf

There you can see that every new challenger attempting to compete with the incumbent allows the incumbent to sample another instance-seed pair. Hence, what you describe as a requirement is not easy to implement here.

However, if you know that you need, say, at least 300 seeds to draw any reliable conclusion, as a workaround you could incorporate this into your target function by running 300 seeds every time an evaluation of a config/seed pair is requested.
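
A rough sketch of this workaround (run_metaheuristic is a hypothetical placeholder for one stochastic run of the tuned algorithm; the parameter names match the demo above):

import random
from ConfigSpace import Configuration

N_REPETITIONS = 300  # number of internal repetitions per SMAC trial

def run_metaheuristic(config: Configuration, instance: str, seed: int) -> float:
    # Hypothetical placeholder for one stochastic run of the algorithm being tuned.
    rng = random.Random(seed)
    return int(config["x"]) - int(config["y"]) + rng.random()

def f(config: Configuration, instance: str, seed: int) -> float:
    # Derive independent seeds from the seed SMAC provides, so the same
    # (config, instance, seed) request stays reproducible.
    rng = random.Random(seed)
    costs = [run_metaheuristic(config, instance, rng.randrange(2**31))
             for _ in range(N_REPETITIONS)]
    return sum(costs) / len(costs)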

Your approach of applying a Wilcoxon signed-rank test sounds quite interesting, though. We would highly appreciate a PR contributing such an intensification procedure to SMAC!

@graidl
Author

graidl commented Dec 2, 2024

Thank you for the clarification. You are right, I can go with your suggestion and run a larger number of seeds every time an evaluation of a config/seed pair is requested.

@graidl graidl closed this as completed Dec 2, 2024