Full-parameter training of LLAMA3 on a single machine with multiple GPUs fails with "warmup_steps must be either 0 or > 1"
#4005
Labels
solved
Reminder
Reproduction
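I launched full-parameter training of LLAMA3-70B with the command
./train.sh
on three A100-SXM4-40GB GPUs. The contents of train.sh are below. The contents of llama3_sft_multi.yaml are below; in it, I set
model_name_or_path
to a local model. That model was obtained by converting the pth files of the LLAMA3-Instruct model, downloaded from Meta's official site, with the transformers conversion script. The contents of
deepspeed_z3_config.json
are below. Running
./train.sh
then fails with the error quoted in the title: warmup_steps must be either 0 or > 1
Expected behavior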
Full-parameter training of LLAMA3-70B runs on the three GPUs.
System Info
transformers version: 4.42.0.dev0
Others
No response
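For readers hitting the same message, the rule it states can be sketched as a small check. This is only an illustration built from the wording of the error, not the actual library code; `check_warmup_steps` and `describe` are hypothetical names. It suggests that a step count of exactly 1, or a fractional value between 0 and 1 (for example one intended as a ratio), would be rejected, while 0 or any count above 1 would pass.

```python
def check_warmup_steps(warmup_steps):
    """Hypothetical helper mirroring the rule stated in the error message."""
    # Accept exactly 0 (no warmup) or any value strictly greater than 1.
    if warmup_steps == 0 or warmup_steps > 1:
        return warmup_steps
    raise ValueError("warmup_steps must be either 0 or > 1")


def describe(value):
    """Return a short verdict string for a candidate warmup_steps value."""
    try:
        check_warmup_steps(value)
        return "accepted"
    except ValueError as err:
        return f"rejected: {err}"


if __name__ == "__main__":
    # Try a few candidate values against the sketched rule.
    for candidate in (0, 0.1, 1, 100):
        print(candidate, "->", describe(candidate))
```

If a fraction of the total training steps is what is wanted, `transformers` exposes a separate `warmup_ratio` training argument for that purpose; whether that applies to the configuration in this report cannot be confirmed from the text above.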