full+reward fine-tuning: adding --save_safetensors False causes pytorch_model.bin to be deleted #5305

Closed
1 task done
aistream69 opened this issue Aug 29, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

@aistream69

Reminder

  • I have read the README and searched the existing issues.

System Info

When fine-tuning Qwen1.5-0.5B-Chat in full+reward mode, training fails with the following error unless --save_safetensors False is added:
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.embed_tokens.weight', 'lm_head.weight'}].
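Adding --save_safetensors False makes the error go away, but when the model is saved, os.remove(path_to_checkpoint) inside the function fix_valuehead_checkpoint in src/llamafactory/train/callbacks.py deletes pytorch_model.bin, so the saved model is unusable. How can this be fixed? Thanks.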
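The deletion described above looks like an ordering problem: with --save_safetensors False, the combined checkpoint and the re-saved decoder weights appear to share the file name pytorch_model.bin, so removing the combined file *after* re-saving deletes the new weights. The stdlib-only sketch below illustrates that; `split_checkpoint`, the JSON stand-in for `torch.save`, the `value_head.bin` name and the key prefixes are all hypothetical and are not LLaMA-Factory's actual code or necessarily how the fix was implemented.

```python
import json
import os
import tempfile

# Hypothetical file names: with --save_safetensors False the combined checkpoint
# and the rewritten decoder weights end up under the same name.
WEIGHTS_NAME = "pytorch_model.bin"
VALUE_HEAD_NAME = "value_head.bin"


def _save(obj, path):
    # Stand-in for torch.save; JSON keeps the sketch dependency-free.
    with open(path, "w") as f:
        json.dump(obj, f)


def _load(path):
    with open(path) as f:
        return json.load(f)


def split_checkpoint(output_dir, remove_first=True):
    """Hypothetical helper: split a combined checkpoint into decoder and value-head files.

    remove_first=False reproduces the reported ordering (save, then delete);
    remove_first=True deletes the combined file before writing anything.
    """
    path_to_checkpoint = os.path.join(output_dir, WEIGHTS_NAME)
    state_dict = _load(path_to_checkpoint)
    if remove_first:
        os.remove(path_to_checkpoint)

    # Assumed key layout: value-head keys start with "v_head.",
    # decoder keys may carry a "pretrained_model." prefix.
    decoder, v_head = {}, {}
    for name, param in state_dict.items():
        if name.startswith("v_head."):
            v_head[name] = param
        else:
            decoder[name.replace("pretrained_model.", "", 1)] = param

    # The decoder is written under the SAME name as the combined file.
    _save(decoder, os.path.join(output_dir, WEIGHTS_NAME))
    _save(v_head, os.path.join(output_dir, VALUE_HEAD_NAME))

    if not remove_first:
        os.remove(path_to_checkpoint)  # deletes the decoder file just written


def run(remove_first):
    # Build a toy combined checkpoint in a temp dir and report what survives.
    with tempfile.TemporaryDirectory() as d:
        _save({"pretrained_model.lm_head.weight": [1.0],
               "v_head.summary.weight": [0.5]},
              os.path.join(d, WEIGHTS_NAME))
        split_checkpoint(d, remove_first=remove_first)
        return sorted(os.listdir(d))


print(run(remove_first=False))  # ['value_head.bin']
print(run(remove_first=True))   # ['pytorch_model.bin', 'value_head.bin']
```

Deleting the combined file before writing the split files (or writing the decoder under a different name) keeps the decoder weights on disk in this sketch.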

Reproduction

llamafactory-cli train --stage rm --do_train True --model_name_or_path models/Qwen1.5-0.5B-Chat --preprocessing_num_workers 16 --finetuning_type full --quantization_method bitsandbytes --template qwen --flash_attn auto --dataset_dir data --dataset dpo_en_demo --cutoff_len 256 --learning_rate 0.0002 --num_train_epochs 3.0 --max_samples 500 --per_device_train_batch_size 2 --gradient_accumulation_steps 4 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 100 --warmup_steps 0 --optim adamw_torch --packing False --report_to none --output_dir saves/Qwen1.5-0.5B-Chat/full_rm --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --save_safetensors False

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the "pending" label (This problem is yet to be addressed) Aug 29, 2024
@hiyouga hiyouga added the "bug" (Something isn't working) and "solved" (This problem has been already solved) labels and removed the "bug" and "pending" labels Aug 29, 2024
@hiyouga
Owner

hiyouga commented Aug 29, 2024

fixed

yuwangnexusera pushed a commit to yuwangnexusera/LLaMA-Factory that referenced this issue Sep 5, 2024