
Why does fine-tuning Qwen2 default to multi-GPU? I am clearly training on a single GPU, and I also tried single-GPU in the web UI, but it still defaults to multi-GPU #4137

Closed
1 task done
yxl23 opened this issue Jun 7, 2024 · 7 comments
Labels
solved This problem has been already solved

Comments

@yxl23

yxl23 commented Jun 7, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[screenshot of system info attached]

Reproduction

llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml

Expected behavior

No response

Others

No response

@frozenarctic

frozenarctic commented Jun 7, 2024

On line 80 of cli.py, change the condition to
if (not disable_torchrun) and (get_device_count() > 1):
then reinstall from the repository root:
pip install -e '.[torch,metrics]'
After that it defaults to running on a single local GPU.
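
For context, a minimal sketch of the launch decision this condition controls (the if-condition is the one quoted above; the function, argument names, and the torchrun/python fallback are illustrative assumptions, not the actual contents of cli.py):

import subprocess
import sys

def launch(train_cmd: list[str], disable_torchrun: bool, device_count: int) -> None:
    # Use torchrun (distributed, multi-GPU) only when it is not disabled
    # AND more than one device is visible; otherwise start a plain
    # single-process, single-GPU run.
    if (not disable_torchrun) and (device_count > 1):
        subprocess.run(
            ["torchrun", "--nproc_per_node", str(device_count), *train_cmd],
            check=True,
        )
    else:
        subprocess.run([sys.executable, *train_cmd], check=True)

With only one visible GPU, device_count > 1 is false and the single-process branch is taken, which matches the behavior the reply describes after reinstalling.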

@hiyouga hiyouga added the solved This problem has been already solved label Jun 7, 2024
@hiyouga hiyouga closed this as completed in 8bf9da6 Jun 7, 2024
@hiyouga
Owner

hiyouga commented Jun 7, 2024

Fixed.

@yxl23
Author

yxl23 commented Jun 8, 2024

Running llamafactory-cli train examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml for 8-bit quantized fine-tuning of Qwen2, I get:
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.51}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.02}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.53}

@hiyouga
Owner

hiyouga commented Jun 8, 2024

Use bf16.

@yxl23
Author

yxl23 commented Jun 8, 2024

Where do I add that? My config is:

### model
model_name_or_path: E:\LLaMA-Factory\qwen\Qwen2-7B-Instruct
quantization_bit: 8

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: xunlian
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 100.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

@hiyouga
Owner

hiyouga commented Jun 8, 2024

Change fp16 to bf16.
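
Concretely, in the train section of the config posted above, the precision flag becomes (same values otherwise; shown here only to illustrate the suggested change):

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 100.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true

Note that bf16 generally requires hardware with bfloat16 support (e.g. Ampere or newer NVIDIA GPUs); the zero loss and nan grad_norm above are consistent with fp16 overflow during quantized training, which is presumably why bf16 is recommended here.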

@yxl23
Author

yxl23 commented Jun 8, 2024

OK, thank you.
