singlenode with 2gpus deepspeed zero2/3 can't log step #3559

Closed
1 task done
xxll88 opened this issue May 3, 2024 · 1 comment
Labels
solved: This problem has been already solved

Comments


xxll88 commented May 3, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

With '--logging_steps 20':
File "/home/ubuntu/LLaMA-Factory/src/llmtuner/extras/callbacks.py", line 137, in on_log
current_steps=self.cur_steps,
AttributeError: 'LogCallback' object has no attribute 'cur_steps'
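
For readers unfamiliar with this class of error, here is a minimal sketch of one common way it can arise. It is not the LLaMA-Factory code, and the issue does not state the actual cause. The classes FragileLogCallback and SafeLogCallback, the helper try_log, and the simplified hook signatures are all hypothetical. The assumption is that a counter such as cur_steps is created only in a hook that may not have run on the object receiving on_log.

```python
class FragileLogCallback:
    """Hypothetical callback that creates its counters only when training begins."""

    def on_train_begin(self, max_steps):
        # cur_steps exists only after this hook has run on this object.
        self.cur_steps = 0
        self.max_steps = max_steps

    def on_log(self, logs):
        # Raises AttributeError if on_train_begin never ran on this object.
        return {"current_steps": self.cur_steps, **logs}


class SafeLogCallback:
    """Hypothetical variant that defines every attribute in __init__."""

    def __init__(self):
        self.cur_steps = 0
        self.max_steps = 0

    def on_train_begin(self, max_steps):
        self.cur_steps = 0
        self.max_steps = max_steps

    def on_step_end(self):
        self.cur_steps += 1

    def on_log(self, logs):
        # Safe to call at any point, because the attributes always exist.
        return {
            "current_steps": self.cur_steps,
            "total_steps": self.max_steps,
            **logs,
        }


def try_log(callback, logs):
    """Call on_log and return either its result or the error message."""
    try:
        return callback.on_log(logs)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(try_log(FragileLogCallback(), {"loss": 1.0}))
    print(try_log(SafeLogCallback(), {"loss": 1.0}))
```

Defining the attributes in __init__ makes on_log safe regardless of which hooks ran first. Whether the actual fix took this form is not stated in the issue.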

DS_SKIP_CUDA_CHECK=1 deepspeed --num_gpus 2 src/train.py \
    --deepspeed ds_config.json \
    --stage sft \
    --do_train \
    --model_name_or_path ../Meta-Llama-3-8B-Instruct \
    --dataset Law-Pair,Law-Triplet \
    --dataset_dir data \
    --overwrite_cache False \
    --template llama3 \
    --finetuning_type full \
    --output_dir /home/ubuntu/sft-law/llama_factory_law_llama3_8B_full \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 200 \
    --save_total_limit 5 \
    --eval_steps 200 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --val_size 0.0001 \
    --plot_loss \
    --bf16

Expected behavior

No response

System Info

No response

Others

No response

xxll88 changed the title from "singlenode with 2gpus deepspeed zero2 can't log step" to "singlenode with 2gpus deepspeed zero2/3 can't log step" on May 3, 2024
hiyouga (Owner) commented May 3, 2024

fixed

hiyouga added a commit that referenced this issue May 3, 2024
hiyouga added the "solved" label (This problem has been already solved) on May 3, 2024
hiyouga closed this as completed on May 3, 2024