
TypeError: __init__() got an unexpected keyword argument 'compute_dtype' #5334

Closed
GlennCGL opened this issue Sep 2, 2024 · 3 comments
Labels
solved This problem has been already solved

Comments


GlennCGL commented Sep 2, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

llamafactory version: 0.8.4.dev0

Reproduction

training config

    --deepspeed ${ds_config_path} \
    --stage dpo \
    --pref_beta 0.1 \
    --pref_loss sigmoid \
    --model_name_or_path ${model_name_or_path}  \
    --do_train \
    --dataset ${dataset} \
    --dataset_dir ${dataset_dir} \
    --preprocessing_num_workers 32 \
    --cutoff_len ${cutoff_len} \
    --template qwen \
    --finetuning_type ${finetuning_type} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --learning_rate 5e-06 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 1 \
    --save_steps 1 \
    --save_strategy epoch \
    --warmup_ratio 0.1 \
    --weight_decay 0.01 \
    --bf16 True \
    --save_only_model \
    --plot_loss True \
    --gradient_checkpointing True  2>&1 | tee $log_file

ERROR MESSAGE:

09/03/2024 01:15:45 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/03/2024 01:15:45 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
09/03/2024 01:15:45 - INFO - llamafactory.model.adapter - ZeRO3 / FSDP detected, remaining trainable params in float32.
09/03/2024 01:15:45 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
09/03/2024 01:15:45 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 7,615,616,512 || trainable%: 100.0000
Traceback (most recent call last):
  File "src/train.py", line 28, in <module>
    main()
  File "src/train.py", line 19, in main
    run_exp()
  File "/ossfs/workspace/workspace/LLaMA-Factory-for-DPO/src/llamafactory/train/tuner.py", line 56, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/ossfs/workspace/workspace/LLaMA-Factory-for-DPO/src/llamafactory/train/dpo/workflow.py", line 58, in run_dpo
    ref_model = create_ref_model(model_args, finetuning_args)
  File "/ossfs/workspace/workspace/LLaMA-Factory-for-DPO/src/llamafactory/train/trainer_utils.py", line 121, in create_ref_model
    ref_model_args = ModelArguments.copyfrom(model_args)
  File "/ossfs/workspace/workspace/LLaMA-Factory-for-DPO/src/llamafactory/hparams/model_args.py", line 266, in copyfrom
    new_arg = cls(**arg_dict)
TypeError: __init__() got an unexpected keyword argument 'compute_dtype'
[2024-09-03 01:15:49,372] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 418107) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 810, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
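For context on the traceback: ModelArguments.copyfrom rebuilds the dataclass with cls(**arg_dict), and that call fails as soon as arg_dict carries an attribute such as compute_dtype that __init__ does not accept (for example a field declared with init=False or populated in __post_init__). Below is a minimal sketch of the failure and of a filtered copy that avoids it; the class and field layout are illustrative stand-ins, not LLaMA-Factory's actual ModelArguments definition.

    from dataclasses import dataclass, field, fields, asdict

    @dataclass
    class Args:  # illustrative stand-in for ModelArguments
        model_name_or_path: str = "qwen"
        # populated after construction, so __init__ does not accept it
        compute_dtype: object = field(default=None, init=False)

    src = Args()
    src.compute_dtype = "bfloat16"

    # Naive copy: asdict() also returns init=False fields, so this raises
    # TypeError: __init__() got an unexpected keyword argument 'compute_dtype'
    try:
        Args(**asdict(src))
    except TypeError as err:
        print(err)

    # Filtered copy: only pass what __init__ accepts, then restore the rest.
    init_kwargs = {f.name: getattr(src, f.name) for f in fields(Args) if f.init}
    new_args = Args(**init_kwargs)
    for f in fields(Args):
        if not f.init:
            setattr(new_args, f.name, getattr(src, f.name))
    print(new_args.compute_dtype)  # bfloat16

Filtering through fields(cls) this way is only one possible shape of a fix; the actual change landed in commit 59d2b31, referenced further down.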

Expected behavior

No response

Others

No response

github-actions bot added the pending label Sep 2, 2024

GlennCGL commented Sep 2, 2024

bf16 + LoRA works fine.

However:

  1. Full-parameter training still has some bugs.
  2. fp16 + LoRA runs into NaN problems (Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.); see the config sketch after this list.
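On point 2: with DeepSpeed fp16 training, the dynamic loss scaler halves the scale whenever it sees inf/NaN gradients and aborts with exactly that message once the scale cannot drop any further, which is also why bf16 runs (no loss scaling) do not hit it. The relevant knobs sit under the fp16 block of the JSON file passed via --deepspeed. A hedged sketch of that block, written as Python that emits the config file; the keys are standard DeepSpeed config fields, but the values are illustrative, not a verified fix for this issue.

    import json

    ds_config = {
        "fp16": {
            "enabled": True,
            "loss_scale": 0,            # 0 = dynamic loss scaling
            "initial_scale_power": 16,  # starting scale of 2**16
            "loss_scale_window": 1000,
            "hysteresis": 2,
            "min_loss_scale": 1,        # the run aborts once the scaler cannot go below this
        },
        "zero_optimization": {"stage": 3},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }

    with open("ds_config.json", "w") as f:
        json.dump(ds_config, f, indent=2)

If fp16 still underflows, raising initial_scale_power or staying on bf16 (as in the original run) are the usual workarounds; the NaNs themselves may of course come from the data or the learning rate rather than the scaler.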

@yuanjing-jane

I had the same problem, but rolling back to deepspeed==0.13.5 fixed it for me.

If you then hit a torch.* API import error in deepspeed/elasticity/elastic_agent.py after rolling back the version, manually patch that file as shown in the screenshots below.

[screenshots of the manual edit to deepspeed/elasticity/elastic_agent.py; not reproduced here]

hiyouga added the solved label and removed the pending label Sep 3, 2024
hiyouga closed this as completed in 59d2b31 Sep 3, 2024
hiyouga (Owner) commented Sep 3, 2024

fixed

yuwangnexusera pushed a commit to yuwangnexusera/LLaMA-Factory that referenced this issue Sep 5, 2024