
Add Multimodal LLM Finetuning #3450

Merged

merged 38 commits into hiyouga:mllm from BUAADreamer:mllm on Apr 25, 2024
Conversation

@BUAADreamer (Collaborator) commented Apr 25, 2024

What does this PR do?

Add finetuning for multimodal LLMs, especially LLaVA, by leveraging AutoModelForVision2Seq and AutoProcessor from transformers.

This PR is a work in progress and will need further improvement in the future, e.g. support for other MLLMs.

For more usage examples, you can refer to MLLM-Finetuning-Demo.
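For intuition, here is a minimal sketch of the loading path this PR relies on, not the LLaMA-Factory internals themselves; the checkpoint name is illustrative.

```python
# Minimal sketch: load a LLaVA checkpoint through the generic Auto classes
# from transformers. The checkpoint name is illustrative.
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)       # tokenizer + image processor
model = AutoModelForVision2Seq.from_pretrained(model_id)  # resolves to the LLaVA model class
```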

Supported Models

  • LLaVA-1.5

Make your own instruction dataset

Just organize your content in the same format as data/mllm_demo.json (a sketch follows below).
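As an illustration, each record pairs a conversation with the image(s) it refers to. The exact keys below are an assumption; treat data/mllm_demo.json in the repo as the authoritative format.

```python
# Illustrative sketch of one record in the mllm_demo.json style; the exact
# keys are an assumption — check data/mllm_demo.json for the real format.
import json

records = [
    {
        "messages": [
            {"role": "user", "content": "What is shown in this picture?"},
            {"role": "assistant", "content": "A dog playing in a park."},
        ],
        "images": ["data/my_images/0001.jpg"],
    }
]

with open("data/my_mllm_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```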

Finetuning

See the example config at examples/lora_single_gpu/llava1_5_lora_sft.yaml.

Before submitting

@hiyouga self-requested a review on April 25, 2024 at 18:50

@hiyouga (Owner) commented Apr 25, 2024

LGTM! Thanks for your contribs!

@hiyouga merged commit c20f750 into hiyouga:mllm on Apr 25, 2024
@hiyouga added the "solved" label (This problem has been already solved) on Apr 25, 2024
@BUAADreamer deleted the mllm branch on May 23, 2024 at 05:40
@whyiug commented May 26, 2024

Hi @BUAADreamer, thanks for your work. Can you explain how this HF version differs from the original (I mean https://github.com/haotian-liu/LLaVA) during training, and whether the HF version trains the mm_projector layers?
Thanks a lot.

@BUAADreamer (Collaborator, Author) commented May 26, 2024

This HF version is nearly the same as the original; it was ported by the official Hugging Face researchers together with Haotian Liu.
Our current SFT of MLLMs is the same as the ft stage in the LLaVA paper: only the mm_proj and the LM are fine-tuned.
You can refer to this Zhihu blog to learn more about fine-tuning MLLMs,
and to this fine-tuned PaliGemma by @hiyouga for a successful example.
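For intuition, a minimal sketch of that trainable/frozen split, assuming the HF LLaVA module layout (vision_tower / multi_modal_projector / language_model); this is not the actual LLaMA-Factory code, and the checkpoint name is illustrative.

```python
# Sketch only: freeze the vision encoder, train the projector and the LM,
# mirroring the LLaVA ft stage described above.
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

for param in model.vision_tower.parameters():            # vision encoder stays frozen
    param.requires_grad = False
for param in model.multi_modal_projector.parameters():   # mm_proj is trained
    param.requires_grad = True
for param in model.language_model.parameters():          # LM is trained
    param.requires_grad = True
```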

@BUAADreamer
Copy link
Collaborator Author

BUAADreamer commented May 26, 2024

Besides, if you want to pre-train like LLaVA, you can refer to #3835 to fine-tune only the mm_proj.
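A hedged sketch of that pre-train-style setup (an assumption, not the exact code behind #3835): keep everything frozen except the projector.

```python
# Sketch only: train just the multi-modal projector, freezing the vision
# encoder and the language model (assumes the HF LLaVA parameter names).
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("multi_modal_projector")
```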

@whyiug commented May 26, 2024

Thanks, that really helps me a lot.
