This repository has been archived by the owner on Oct 22, 2023. It is now read-only.

Incredibly Frustrating Bug - Training model collapses due to tkinter #93

Open
Claxiz opened this issue Jan 31, 2023 · 1 comment


Claxiz commented Jan 31, 2023

With certain settings (I'm not sure which ones contribute to it), this error prints out:

Weights saved to C:/AI/StableTuner/models/1osvgA\epoch_80
Steps To Epoch: 33%|██████████████████████▎ | 4/12 [00:08<00:16, 2.05s/it]Using [00:22<00:00, 1.98s/it]Using FlashAttention|█████████████████████████████████████▊ | 1032/1200 [47:35<05:32, 1.98s/it, loss=nan, lr=5e-6]
Overall Epochs: 86%|███████████████████████████████████████████████████████▉ | 86/100 [47:35<06:58, 29.89s/it]C:\ProgramData\anaconda3\envs\ST\lib\site-packages\diffusers\pipeline_utils.py:788: :\ProgramData\anaconda3\envs\ST\lib\site-packages\diffusers\pipeline_utils.py:788: RuntimeWarning: invalid value encountered in cast images = (images * 255).round().astype("uint8")

Training then continues, with the loss going from its normal range to loss=nan, until training finishes and this error appears:

bgerror failed to handle background error.
Original error: invalid command name "1414340073536update"
Error in bgerror: can't invoke "tk" command: application has been destroyed
bgerror failed to handle background error.
Original error: invalid command name "1414484413504_click_animation"
Error in bgerror: can't invoke "tk" command: application has been destroyed
bgerror failed to handle background error.
Original error: invalid command name "1414523277696check_dpi_scaling"
Error in bgerror: can't invoke "tk" command: application has been destroyed
warning: redirecting to https://github.com/devilismyfriend/StableTuner.git/
Latest git hash: ef51982

That is all of the output; there is no further traceback. The training session was started in fp32 with these settings:
[Image: Capture (screenshot of the training settings)]

The ultimate effect of this error is that the model being trained collapses, and every checkpoint saved after the failed epochs is broken. Once this happens, loading one of these models in something like the webui makes generation fail on startup, with errors asking for "Upcast cross attention layer to float32" to be turned on in settings and for a command-line argument change. If the model is loaded in the webui with the settings the error asks for, generations only come out as black images.
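
For anyone hitting the same symptom: once the loss is NaN, every later step and checkpoint inherits it, so it can help to stop at the first non-finite loss instead of training to the end. Below is a minimal, framework-free sketch of that idea. It is not StableTuner code; `NonFiniteLossError`, `check_loss`, and `run_steps` are hypothetical names, and the list of floats stands in for the real per-step loss values.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when the loss stops being a finite number (hypothetical helper)."""


def check_loss(loss_value: float, step: int) -> float:
    # math.isfinite is False for nan, inf and -inf.
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(f"loss is {loss_value} at step {step}")
    return loss_value


def run_steps(losses):
    """Stand-in for a training loop: `losses` replaces real per-step losses."""
    completed = 0
    for step, loss_value in enumerate(losses):
        check_loss(loss_value, step)  # abort before saving a broken checkpoint
        completed += 1
    return completed
```

In a real loop the same check would go right after the loss is computed and before the optimizer step and any checkpoint save, so that the last checkpoint on disk is still from a healthy step.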

@devilismyfriend
Owner

It looks like one of your images might be corrupted. tkinter has nothing to do with this: it is unloaded during training and reloaded after training finishes or fails.
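
One way to test the corrupted-image theory is to try decoding every file in the dataset folder before training. The sketch below uses Pillow and assumes it is installed; `find_unreadable_images` and `IMAGE_SUFFIXES` are hypothetical names, not part of StableTuner. It only catches files that fail to open or decode. An image that decodes cleanly can still be a poor training sample, and a NaN loss can have causes other than the dataset.

```python
from pathlib import Path

from PIL import Image

# Assumed set of extensions to scan; adjust to match the dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def find_unreadable_images(dataset_dir):
    """Return (path, error) pairs for image files Pillow cannot fully decode."""
    bad = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            # verify() checks file integrity without decoding pixel data.
            with Image.open(path) as img:
                img.verify()
            # The image object is unusable after verify(), so reopen it and
            # force a full decode, which also surfaces truncated files.
            with Image.open(path) as img:
                img.load()
        except Exception as exc:
            bad.append((path, repr(exc)))
    return bad
```

Calling `find_unreadable_images` on the dataset directory and printing the result gives a list of files to re-export or drop before retrying the run.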
