
train_dreambooth_lora_flux validation RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same #9476

Open
squewel opened this issue Sep 19, 2024 · 9 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

Comments

squewel commented Sep 19, 2024

Describe the bug

When train_dreambooth_lora_flux attempts to generate images during validation, it throws: RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same

Reproduction

Just follow the steps from README_flux.md for DreamBooth LoRA with text-encoder training:

export OUTPUT_DIR="trained-flux-dev-dreambooth-lora"

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --train_text_encoder \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --seed="0" \
  --push_to_hub

Logs

09/19/2024 23:08:58 - INFO - __main__ - Running validation... ███████████████████████████████████████████████████| 7/7 [00:00<00:00, 13.76it/s]
 Generating 4 images with prompt: a photo of sks dog
W0919 23:12:39.471000 139969377689600 torch/fx/experimental/symbolic_shapes.py:4449] [0/3] xindex is not in var_ranges, defaulting to unknown range.
W0919 23:17:03.532000 139969377689600 torch/fx/experimental/symbolic_shapes.py:4449] [0/4] xindex is not in var_ranges, defaulting to unknown range.
Traceback (most recent call last):
  File "/workspace/flux-diffusers/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1890, in <module>
    main(args)
  File "/workspace/flux-diffusers/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1810, in main
    images = log_validation(
  File "/workspace/flux-diffusers/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 189, in log_validation
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/workspace/flux-diffusers/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 189, in <listcomp>
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/flux-diffusers/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 762, in __call__
    image = self.vae.decode(latents, return_dict=False)[0]
  File "/workspace/flux-diffusers/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/workspace/flux-diffusers/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 321, in decode
    decoded = self._decode(z).sample
  File "/workspace/flux-diffusers/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 292, in _decode
    dec = self.decoder(z)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/flux-diffusers/diffusers/src/diffusers/models/autoencoders/vae.py", line 291, in forward
    sample = self.conv_in(sample)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 458, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/workspace/flux-diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same

System Info

Diffusers:

- Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.25.0
- Transformers version: 4.44.2
- Accelerate version: 0.34.2
- PEFT version: 0.12.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA L40, 46068 MiB

Accelerate config:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: INDUCTOR
enable_cpu_affinity: false
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Who can help?

@sayakpaul @linoytsaban

squewel added the bug label and updated the issue title on Sep 19, 2024
icsl-Jeon (Contributor) commented

Even the line below leads to OOM:

# autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()

kishlaykumar1995 commented

Had the same issue. As a temporary fix, I added code to cast the latents to bfloat16 at line 762 of pipeline_flux.py and it worked, but I don't know whether that is the correct fix.
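
For reference, the change described above would look roughly like the following (a minimal sketch; the surrounding code and exact line number may differ between diffusers versions). It casts the latents to the VAE's dtype right before the decode call that fails in the traceback:

# in FluxPipeline.__call__ (pipeline_flux.py), just before decoding
latents = latents.to(self.vae.dtype)  # cast fp32 latents to match the bf16 VAE weights
image = self.vae.decode(latents, return_dict=False)[0]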

xngli commented Sep 26, 2024

> Even the line below leads to OOM:
>
> # autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()

Uncommenting this line and commenting out the one below it resolved the issue for me.
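
In other words, after the swap the relevant lines in log_validation would read roughly like this (a sketch, using the variable names from the training script):

autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()
# autocast_ctx = nullcontext()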

linoytsaban (Collaborator) commented

@sayakpaul do you recall why we have this line commented out in log_validation? It might also be an issue with other scripts and/or related to #9419.

# autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()
autocast_ctx = nullcontext()
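
For context, log_validation generates the validation images under that context manager; a rough sketch of how the pipeline call is wrapped, based on the list comprehension in the traceback above, is:

with autocast_ctx:
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]

With autocast_ctx = nullcontext(), nothing downcasts the fp32 activations, so they reach the bf16 VAE unconverted and the conv layer raises the dtype mismatch.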

linoytsaban (Collaborator) commented

Seems to be the same issue as #9548 and #9549.

github-actions bot commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label on Oct 26, 2024
luchaoqi (Contributor) commented

Not sure this problem is fully resolved. I was trying to use

    autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()
    # autocast_ctx = nullcontext()

but got an error and black images similar to #9549:

12/16/2024 01:05:28 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: a photo of sks person at 50 years old.
/playpen-nas-ssd/luchao/software/miniconda3/envs/diffuser/lib/python3.10/site-packages/diffusers/image_processor.py:147: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

Then I tried the fix from PR #9565 but still get the error:

RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same

The command I run:

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev"  \
  --instance_data_dir="xxx" \
  --output_dir="xxx" \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks person" \
  --resolution=512 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="a photo of sks person at 50 years old" \
  --validation_epochs=25 \
  --seed="0" \
  --lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"

github-actions bot removed the stale label on Dec 16, 2024

github-actions bot commented Jan 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label on Jan 9, 2025