
Sudden Increase in Loss and Corrupted Output Images During Zero123 Model Training #16

sunkymepro opened this issue Jan 28, 2025 · 0 comments


I am training a Zero123-hf model to generate images of a face from an input image and a corresponding pose vector. However, during training I observe a sudden increase in the loss, and the output images become corrupted. The command I use is shown below:

python train_zero1to3.py --train_data_dir /data/root/path/ --pretrained_model_name_or_path ./sd-image-variations-diffusers --train_batch_size 2 --dataloader_num_workers 16 --output_dir logs --gradient_checkpointing --mixed_precision no --resume_from_checkpoint latest
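For context, here is a minimal sketch of how I understand the Zero123-style conditioning being trained here, i.e. how the input image and the pose vector are combined. The names (`make_condition`, `cc_projection`) and the exact pose parameterization are illustrative assumptions on my part, not code from this repository:

```python
import torch
import torch.nn as nn

# Minimal sketch of Zero123-style conditioning (assumed, for illustration).
# The CLIP image embedding of the input view is concatenated with a small
# relative-pose vector and projected back to the cross-attention width.
# In Zero123 the input-view latent is additionally concatenated channel-wise
# with the noisy latent before the UNet; that part is omitted here.

clip_dim = 768   # CLIP image-embedding size used by sd-image-variations
pose_dim = 4     # e.g. [d_elevation, sin(d_azimuth), cos(d_azimuth), d_radius]

cc_projection = nn.Linear(clip_dim + pose_dim, clip_dim)  # hypothetical name

def make_condition(clip_image_embed: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
    """Fuse the input-view CLIP embedding with the relative pose vector."""
    cond = torch.cat([clip_image_embed, pose], dim=-1)    # (B, 772)
    return cc_projection(cond).unsqueeze(1)               # (B, 1, 768)

# Shape check with dummy tensors
clip_image_embed = torch.randn(2, clip_dim)
pose = torch.randn(2, pose_dim)
print(make_condition(clip_image_embed, pose).shape)       # torch.Size([2, 1, 768])
```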

The loss curve is shown below:

[Image: training loss curve showing the sudden spike]

I also observed that when I set the learning rate to 1e-4, the pose of the generated images is correct, but the image quality is relatively poor. The results after 6,500 iterations are shown below; from left to right: the input image, the two generated results, and the ground truth.

[Image: input image, two generated results, and ground truth after 6,500 iterations]

As the number of iterations increases, the pose of the generated image becomes incorrect, and eventually, the output degrades into a completely black image. The results are shown below.

[Image: generated results at 15,000 iterations]

[Image: generated results at 19,000 iterations]

I would like to know where my training is going wrong. Thank you.
