
A small suggestion (not sure if it is correct) for the model-export error: "RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'" #3333

Closed
ganchun1130 opened this issue Apr 18, 2024 · 5 comments
Labels
solved This problem has been already solved

Comments

@ganchun1130

Reminder

  • I have read the README and searched the existing issues.

Reproduction

This error occurs because the CPU backend does not implement bf16/fp16 matmuls; it shows up when no GPU is used.
The relevant code is in src/llmtuner/hparams/model_args.py, around line 132:

    export_device: str = field(
        default="cpu",
        metadata={"help": "The device used in model export."},
    )

The default is "cpu", which triggers the error; changing it to "cuda" lets the model export complete successfully.
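The workaround above can be sketched as follows. This is a hedged illustration, not the project's actual code: `pick_export_device` is a hypothetical helper, and its `has_cuda` argument stands in for a real accelerator check such as `torch.cuda.is_available()`.

```python
from dataclasses import dataclass, field


def pick_export_device(has_cuda: bool) -> str:
    """Hypothetical helper: half-precision matmuls (addmm) are not
    implemented on CPU, so prefer "cuda" whenever a GPU is present."""
    return "cuda" if has_cuda else "cpu"


@dataclass
class ModelArguments:
    # Mirrors the field quoted above from src/llmtuner/hparams/model_args.py.
    export_device: str = field(
        default="cpu",
        metadata={"help": "The device used in model export."},
    )


# Instead of editing the hard-coded default in place, override it
# at construction time based on what hardware is actually available.
args = ModelArguments(export_device=pick_export_device(has_cuda=True))
```

Making the device a runtime choice (rather than patching the source) keeps the CPU fallback intact for machines without a GPU.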

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga
Owner

hiyouga commented Apr 18, 2024

Could you share your training command? Did you use any fine-tuning method other than LoRA?

@hiyouga hiyouga added the pending This problem is yet to be addressed label Apr 18, 2024
@ganchun1130
Author

Here is the training config I saved:
{
"top.model_name": "Qwen-7B-Chat",
"top.finetuning_type": "lora",
"top.adapter_path": [],
"top.quantization_bit": "none",
"top.template": "qwen",
"top.rope_scaling": "none",
"top.booster": "none",
"train.training_stage": "Supervised Fine-Tuning",
"train.dataset_dir": "/usr/local/TFBOYS/gc/NLP/LLaMA-Factory/data",
"train.dataset": [
"ai_augmented_classification_data_zh_1"
],
"train.learning_rate": "2e-5",
"train.num_train_epochs": "5.0",
"train.max_grad_norm": "1.0",
"train.max_samples": "100000",
"train.compute_type": "fp32",
"train.cutoff_len": 2048,
"train.batch_size": 4,
"train.gradient_accumulation_steps": 4,
"train.val_size": 0,
"train.lr_scheduler_type": "cosine",
"train.logging_steps": 5,
"train.save_steps": 100,
"train.warmup_steps": 0,
"train.neftune_alpha": 0,
"train.optim": "adamw_torch",
"train.resize_vocab": false,
"train.packing": false,
"train.upcast_layernorm": false,
"train.use_llama_pro": false,
"train.shift_attn": false,
"train.report_to": false,
"train.num_layer_trainable": 3,
"train.name_module_trainable": "all",
"train.lora_rank": 8,
"train.lora_alpha": 16,
"train.lora_dropout": 0.1,
"train.loraplus_lr_ratio": 0,
"train.create_new_adapter": false,
"train.use_rslora": false,
"train.use_dora": true,
"train.lora_target": "all",
"train.additional_target": "",
"train.dpo_beta": 0.1,
"train.dpo_ftx": 0,
"train.orpo_beta": 0.1,
"train.reward_model": null,
"train.use_galore": false,
"train.galore_rank": 16,
"train.galore_update_interval": 200,
"train.galore_scale": 0.25,
"train.galore_target": "all"
}

@ganchun1130
Author

I also tried bf16 precision and hit the same error during export; after changing the code as described above, the error no longer occurred.

@lvsijian8

#3434
Same issue here; exporting works fine on an A100 but fails on a V100.

hiyouga added a commit that referenced this issue Apr 25, 2024
@hiyouga
Owner

hiyouga commented Apr 25, 2024

Added this option to the webui.

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Apr 25, 2024
@hiyouga hiyouga closed this as completed Apr 25, 2024