
Sudden Increase in Loss and Corrupted Output Images During Zero123 Model Training #16

sunkymepro opened this issue Jan 28, 2025 · 0 comments


I am training a Zero123-hf model to generate images of a face from an input image and a corresponding pose vector. However, during training I observe a sudden increase in the loss, and the output images become corrupted. The command I use is shown below:

python train_zero1to3.py --train_data_dir /data/root/path/ --pretrained_model_name_or_path ./sd-image-variations-diffusers --train_batch_size 2 --dataloader_num_workers 16 --output_dir logs --gradient_checkpointing --mixed_precision no --resume_from_checkpoint latest
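For context, here is a minimal sketch of how I understand the Zero123-style conditioning being trained here, i.e. how the input image and the pose vector are combined. The names (`make_condition`, `cc_projection`) and the exact pose parameterization are illustrative assumptions on my part, not code from this repository:

```python
import torch
import torch.nn as nn

# Minimal sketch of Zero123-style conditioning (assumed, for illustration).
# The CLIP image embedding of the input view is concatenated with a small
# relative-pose vector and projected back to the cross-attention width.
# In Zero123 the input-view latent is additionally concatenated channel-wise
# with the noisy latent before the UNet; that part is omitted here.

clip_dim = 768   # CLIP image-embedding size used by sd-image-variations
pose_dim = 4     # e.g. [d_elevation, sin(d_azimuth), cos(d_azimuth), d_radius]

cc_projection = nn.Linear(clip_dim + pose_dim, clip_dim)  # hypothetical name

def make_condition(clip_image_embed: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
    """Fuse the input-view CLIP embedding with the relative pose vector."""
    cond = torch.cat([clip_image_embed, pose], dim=-1)    # (B, 772)
    return cc_projection(cond).unsqueeze(1)               # (B, 1, 768)

# Shape check with dummy tensors
clip_image_embed = torch.randn(2, clip_dim)
pose = torch.randn(2, pose_dim)
print(make_condition(clip_image_embed, pose).shape)       # torch.Size([2, 1, 768])
```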

The loss curve is shown below:

[Image: training loss curve showing the sudden spike]

I also observed that when I set the learning rate to 1e-4, the pose of the generated images is correct, but the image quality is relatively poor. The results after 6,500 iterations are shown below; from left to right: the input image, the two generated results, and the ground truth.

[Image: input image, two generated results, and ground truth after 6,500 iterations]

As the number of iterations increases, the pose of the generated image becomes incorrect, and eventually, the output degrades into a completely black image. The results are shown below.

[Image: generated results at 15,000 iterations]

[Image: generated results at 19,000 iterations]

I would like to know where my training is going wrong. Thank you.
