Is there an existing issue for this?
I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Currently, refiner switchover is controlled by a fraction of the generation process. So, if you generate for 50 steps and have refiner switchover at 0.8 (the recommended value for refiners trained like the SDXL refiner), the main model will generate for 40 steps, and the refiner will load after that and complete the last 10. This works fine when you are using txt2img with the default sampling schedule (i.e. not Karras or Exponential).
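To make the current behavior concrete, here is a minimal sketch of the fraction-based scheme described above (the function name is illustrative, not webui's actual code):

```python
def refiner_switch_step(total_steps: int, switch_at: float) -> int:
    """Step index at which the refiner takes over under the current,
    step-fraction-based scheme: the switch point depends only on the
    step count, never on the timesteps actually being sampled."""
    return int(total_steps * switch_at)

# 50 steps with switch_at=0.8: base model runs steps 0..39,
# refiner runs steps 40..49.
print(refiner_switch_step(50, 0.8))  # 40
```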
However, this is not aligned with how the refiner is trained: the refiner is trained on the last 200 timesteps, which do not necessarily correspond to the last 20% of the sampling process. There are a few situations where this setup will result in the refiner being called too early or too late:
Using different sampling schedules, especially with Zero Terminal SNR rescaling, will cause the refiner to be called too early. For a 50 step Karras schedule, refiner switchover would need to happen at 0.88 to not call it too early.
Using inpainting/img2img will in almost every case cause the refiner to be called too late. The correct switchover point will change whenever you change the denoising strength, and can be very tedious to manage.
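For the img2img case, a toy calculation shows why the fraction-based switch drifts. This sketch assumes a linear timestep schedule purely for illustration (real schedules are nonlinear, and the numbers are not webui's):

```python
def img2img_timesteps(total_timesteps: int, steps: int, denoising_strength: float):
    """Timesteps actually visited in an img2img run: only the last
    `denoising_strength` portion of the schedule is executed
    (linear spacing assumed here for simplicity)."""
    start = int(total_timesteps * denoising_strength)  # e.g. 0.5 -> start at t=500
    return [round(start - i * start / steps) for i in range(steps)]

# txt2img (strength 1.0): a step-fraction switch at 0.8 lands right at t=200.
full = img2img_timesteps(1000, 20, 1.0)
print(full[int(len(full) * 0.8)])  # 200

# img2img at strength 0.5: the same 0.8 step fraction now switches at t=100,
# i.e. the refiner is called too late relative to its trained t<200 range.
half = img2img_timesteps(1000, 20, 0.5)
print(half[int(len(half) * 0.8)])  # 100
```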
The more reliable way to handle this would be to configure refiners for the highest timestep they were trained on and switch when we are about to process a timestep that falls below that, which is usually the last 200 timesteps. With this, the refiner model would never need to be tweaked for any change in configuration because it will only be called for timesteps it was trained on.
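A minimal sketch of this rule, assuming the sampler exposes the timestep it is about to process (names are illustrative, not webui's actual API):

```python
REFINER_MAX_TIMESTEP = 200  # highest timestep the refiner was trained on

def should_use_refiner(next_timestep: float) -> bool:
    """Switch to the refiner once the upcoming timestep falls inside the
    refiner's trained range, regardless of step count, schedule, or
    denoising strength."""
    return next_timestep < REFINER_MAX_TIMESTEP
```

Because the decision is made per timestep rather than per step fraction, the same configuration works unchanged across txt2img, img2img, and any sampling schedule.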
There is a potential corner case (which I have not tested in webui) with second-order samplers that Diffusers caught and fixed, described here. Two fixes should ensure this isn't a problem: a) deciding to switch to the refiner only when both timesteps called during the sampler step are below 200, or b) implementing the refiner as a model wrapper so that it is impossible to call the refiner model on timesteps that are out of range. The second solution would be a more faithful implementation of how ensemble-of-expert models should work (i.e. it would give correct results if you had a refiner model trained on timesteps below 200 and a main model trained on timesteps at or above 200, where other solutions might not).
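Option (b) could be sketched as a wrapper that routes every denoiser call by timestep, so the refiner can never see a timestep outside its trained range even when a second-order sampler evaluates two timesteps per step. The call signature and the 200 boundary are illustrative; webui's denoiser interface differs:

```python
class EnsembleOfExperts:
    """Routes each denoiser call to the correct expert by timestep."""

    def __init__(self, base, refiner, refiner_max_timestep=200):
        self.base = base
        self.refiner = refiner
        self.refiner_max_timestep = refiner_max_timestep

    def __call__(self, x, timestep):
        # Routing happens per call, not per sampler step: a second-order
        # sampler that evaluates two timesteps in one step gets the right
        # expert for each evaluation independently.
        model = self.refiner if timestep < self.refiner_max_timestep else self.base
        return model(x, timestep)

# Toy usage with stand-in models:
ensemble = EnsembleOfExperts(lambda x, t: "base", lambda x, t: "refiner")
print(ensemble(None, 500))  # base
print(ensemble(None, 150))  # refiner
```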
Proposed workflow
Click "Refiner" checkbox and expand the box.
"Switch at" slider is either changed purely under the hood or is changed to a scale of timesteps from 1000 to 1. Tooltip changed to something to the effect of: "fraction of model's trained timesteps when the switch to the refiner model should happen; for most dedicated refiner models this should be set to 0.8 and left alone."
When generating, no matter what I do past that point, the refiner should never be called with a timestep greater than or equal to 200.
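If the slider keeps its 0-to-1 scale, the under-the-hood mapping from slider value to timestep threshold could look like the following sketch, assuming 1000 trained timesteps (the function name is hypothetical):

```python
def switch_timestep(switch_at: float, total_timesteps: int = 1000) -> int:
    """Map the 'Switch at' slider value to a timestep threshold:
    a value of 0.8 means the refiner handles timesteps below 200,
    i.e. the last 20% of the model's trained timestep range."""
    return round(total_timesteps * (1.0 - switch_at))

print(switch_timestep(0.8))  # 200
```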
Additional information
No response