examples/train_lora/llama3_lora_sft_ds3.yaml raises an error #5252
Comments
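However, running llamafactory-cli train examples/train_lora/llama3_lora_sft_ds0.yaml does not raise the error. |
I ran into the same problem. |
Same error here. Downgrading deepspeed to 0.14.0 worked for me. |
After downgrading deepspeed to 0.14.0, I get a new error saying it does not match my PyTorch version, so it cannot run. My PyTorch version is dictated by my CUDA version. |
What happens if you use the repository's recommended versions for all dependencies? |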
I encountered a similar issue, and it was resolved by using DeepSpeed version 0.14.4. I suspect that the problem arises in later versions of DeepSpeed due to type checking with Pydantic. Specifically, when the |
Thanks! This solution solved my issue! |
deepspeed==0.14.4 solved |
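For setups where pinning deepspeed is not an option (for example, the PyTorch/CUDA mismatch reported above), one alternative that nobody in this thread has confirmed is to replace the automatic value of stage3_prefetch_bucket_size with an explicit integer in the DeepSpeed config. The sketch below is only an illustration: the helper name pin_prefetch_bucket_size is hypothetical, the example dict stands in for ds_z3_config.json (assumed to set the field to "auto"), and hidden_size=4096 is an assumption for Llama 3 8B.

```python
import json


def pin_prefetch_bucket_size(config: dict, hidden_size: int) -> dict:
    """Return a copy of a DeepSpeed config with an integer prefetch bucket size.

    Hypothetical helper: it truncates 0.9 * hidden_size^2 to an int so that
    strict integer validation no longer sees a fractional float.
    """
    patched = json.loads(json.dumps(config))  # cheap deep copy of JSON-like data
    zero = patched.setdefault("zero_optimization", {})
    zero["stage3_prefetch_bucket_size"] = int(0.9 * hidden_size * hidden_size)
    return patched


# Stand-in for the contents of ds_z3_config.json (assumed, not copied from the repo).
example = {"zero_optimization": {"stage": 3, "stage3_prefetch_bucket_size": "auto"}}

# hidden_size=4096 is an assumption for Llama 3 8B.
patched = pin_prefetch_bucket_size(example, hidden_size=4096)
print(json.dumps(patched, indent=2))
```

The patched dict could then be written back with json.dump and referenced from the training YAML. Whether this avoids the error in a given environment has not been verified here.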
Reminder
System Info
Using ds_z3_config.json raises the following error: pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
[rank3]: stage3_prefetch_bucket_size
[rank3]: Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=15099494.4, input_type=float]
Is this a deepspeed version problem?
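A possible explanation for the value in the traceback, offered as a sketch rather than a confirmed diagnosis: 15099494.4 equals 0.9 * 4096 * 4096, which is what you would get if the automatic setting for stage3_prefetch_bucket_size were computed as 0.9 * hidden_size * hidden_size with hidden_size = 4096 (the Llama 3 8B hidden size). Both the formula and the hidden size are assumptions here. The strict_int function below is a hypothetical stand-in that imitates integer validation rejecting floats with a fractional part. It is not DeepSpeed or Pydantic code.

```python
# Assumption: the automatic value is 0.9 * hidden_size * hidden_size,
# and hidden_size is 4096 (Llama 3 8B).
hidden_size = 4096
auto_value = 0.9 * hidden_size * hidden_size
print(auto_value)  # a float with a fractional part


def strict_int(value):
    """Hypothetical stand-in for integer validation.

    Accepts ints and floats without a fractional part, rejects the rest.
    """
    if isinstance(value, bool):
        raise ValueError("Input should be a valid integer")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(
        f"Input should be a valid integer, got a number with a fractional part: {value!r}"
    )


try:
    strict_int(auto_value)
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)

# Truncating to an int first passes the same check.
print(strict_int(int(auto_value)))
```

If this reading is right, the failure depends on whether the installed deepspeed validates the field strictly, which would be consistent with 0.15.0 failing while 0.14.0 and 0.14.4 are reported to work.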
Reproduction
torch == 2.4.0
deepspeed == 0.15.0
llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
Expected behavior
No response
Others
No response