💡 [REQUEST] - Fine-tuning support for audio #745
Comments
😄 Hey! I think our framework, align-anything, has implemented this functionality. We have fine-tuned it on our open-source align-anything/text-audio-to-text dataset and provided a directly runnable script. Everyone is welcome to use it!
Hi, glad you're interested in fine-tuning. The audio-to-text fine-tuning recipe is almost the same as the image-to-text one, so the changes needed are small. We will provide example code next week.
LLaMA-Factory has supported audio-text to text fine-tuning and inference, you can also try it 🤗
I see that the model's audio encoder seems to be separate from the Qwen backbone. If my data pairs input audio with its corresponding text, can I just do plain text2text SFT directly?
Hi, that approach may fail to achieve alignment for the audio-input case. You can try fine-tuning with https://github.com/hiyouga/LLaMA-Factory/pull/6701, which already supports audio-to-text.
Hi, a question: for a scenario with multiple audio inputs, e.g. one clip used as the voice-cloning reference and another clip whose voice needs to be converted, what would the fine-tuning data JSON roughly look like?
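The thread does not answer the multi-audio question above. As a hedged sketch only: LLaMA-Factory's multimodal datasets use a sharegpt-style layout where each `<audio>` placeholder in a message is matched positionally with an entry in an `audios` list. The field names below follow that convention, but the file paths are illustrative, and whether this model's fine-tuning pipeline actually consumes two audios per sample for voice conversion is an assumption, not something confirmed in this thread.

```python
import json

# Hypothetical multi-audio training sample (sharegpt-style layout).
# One <audio> placeholder per entry in "audios"; paths are placeholders.
sample = {
    "messages": [
        {
            "role": "user",
            "content": "<audio><audio>Clone the voice from the first clip "
                       "and re-speak the second clip with that voice.",
        },
        {"role": "assistant", "content": "Here is the converted speech."},
    ],
    "audios": ["voice_reference.wav", "content_to_convert.wav"],
}

# The number of <audio> placeholders must match the number of audio files.
assert sample["messages"][0]["content"].count("<audio>") == len(sample["audios"])
print(json.dumps(sample, ensure_ascii=False, indent=2))
```

Samples like this would go in a JSON file registered in `dataset_info.json` before training.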
Hello, I'm fine-tuning with the branch where you added support, and I have a question: can I use my own custom system prompt as the fine-tuning system prompt? For example, I want the fine-tuned model to translate whatever I input into English, and I'd like to change the system prompt so it is distinct from the normal QA system prompt. Is that possible?
I'm trying to run the LLaMA-Factory code but it doesn't work: it raises an error because the `processor` passed in is `None`. Also, LLaMA-Factory's main branch still has your PR pending.
Do you support LoRA SFT? I don't see a LoRA option in the sft.py source so far.
Not tested yet; we will add support soon. You can try full-parameter fine-tuning first~
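For reference, once LoRA is supported, LLaMA-Factory-style SFT configs typically enable it with a handful of YAML keys. A hedged sketch only: the key names below follow LLaMA-Factory's published examples, and, per the reply above, whether they work with this model's audio branch is untested.

```yaml
### swap in place of full-parameter fine-tuning
finetuning_type: lora
lora_rank: 8
lora_alpha: 16
lora_target: all   # or an explicit comma-separated list of module names
```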
Of course it's supported. You can register your own template directly at https://github.com/BUAADreamer/LLaMA-Factory/blob/13d252fa7856ecb14ba6907e5adb10070e5cdde4/src/llamafactory/data/template.py#L958 by adding the following lines:

```python
_register_template(
    name="minicpm_o_audio",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    stop_words=["<|im_end|>"],
    default_system=(
        "不管输入什么都需要直接翻译为英文"  # "Translate any input directly into English"
    ),
    mm_plugin=get_mm_plugin(name="minicpm_v", image_token="<image>", video_token="<video>"),
)
```

and then reference it in your YAML file.
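To make "reference it in your YAML file" concrete, here is a hedged sketch of a training config that selects the template registered above. The key names follow LLaMA-Factory's standard SFT examples; the model path and dataset name are placeholders, not values confirmed in this thread.

```yaml
### model
model_name_or_path: openbmb/MiniCPM-o-2_6   # placeholder model path

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: my_audio_dataset      # placeholder; register it in dataset_info.json
template: minicpm_o_audio      # the template registered above
cutoff_len: 2048

### output
output_dir: saves/minicpm_o_audio_sft
per_device_train_batch_size: 1
num_train_epochs: 3.0
```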
Is there a tutorial for using my own dataset? Right now I can only read the code to figure out how to use my own data for training.
For now we recommend transformers==4.45.0, which reliably runs both fine-tuning and inference. [Important] Install the latest llamafactory and the corresponding libraries as follows:

```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed,minicpm_v]"
pip3 install transformers==4.45.0
pip3 install huggingface_hub==0.25.0
```
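As a quick sanity check after installing the pinned versions above, a small helper (not part of LLaMA-Factory; written here for illustration) can compare the installed versions against the recommended pins:

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """Split a dotted version string into an integer tuple for comparison."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def matches_pin(pkg: str, pinned: str) -> bool:
    """True if the installed package exactly matches the pinned version."""
    try:
        return parse_version(version(pkg)) == parse_version(pinned)
    except PackageNotFoundError:
        return False

# Pins recommended in the comment above.
pins = {"transformers": "4.45.0", "huggingface_hub": "0.25.0"}
for pkg, pinned in pins.items():
    print(pkg, "OK" if matches_pin(pkg, pinned) else f"expected {pinned}")
```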
Actually, the README and the documentation homepage already have examples; could you check whether they meet your needs?
Start Date
No response
Implementation PR
Will a fine-tuning recipe for MiniCPM-O audio-to-text be released later? For now, I can only follow the processing flow in model_server and try to shape my audio data the way inference expects.
Reference Issues
Summary
Basic Example
Drawbacks
Unresolved questions