* Equal contributions
- Apr-27-24: Google Colab notebook released to chat with the Phi-3-V-3.8B model, check it out at Google Colab 🔥🔥🔥
- Apr-26-24: Phi-3-V and LLaMA-3-V released: Excited to release the new integration of LLaVA with the Phi-3 Mini Instruct and LLaMA-3 Instruct models! Hugging Face 🔥🔥🔥
This repository enhances the capabilities of the LLaVA 1.5 model by incorporating the latest LLMs released this week 🔥: Phi-3 Mini Instruct 3.8B and LLaMA-3 Instruct 8B.
Model | MMMU | POPE | MME | MMBench-en | MMBench-cn | SEED-all | SEED-img | SEED-vid | LLaVA-Wild | GQA | Science-QA | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 35.4 | 85.8 | 1510.7 | 64.3 | 58.3 | 58.6 | 66.1 | 37.3 | 65.4 | 62.0 | 66.8 | 60.0 |
LLaVA-v1.5-13B | 36.4 | 85.9 | 1531.3 | 67.7 | 63.6 | 61.6 | 68.2 | 42.7 | 72.5 | 63.3 | 71.6 | 63.3 |
LLaMA-3-V-8B | 37.1 | 84.2 | 1441.1 | 67.0 | 57.8 | 62.8 | 68.6 | 41.1 | 66.2 | 61.9 | 78.6 | 62.5 |
Phi-3-V-3.8B | 37.8 | 85.6 | 1470.1 | 68.2 | 58.5 | 62.8 | 67.7 | 44.5 | 70.9 | 61.7 | 80.7 | 63.8 |
- The average is computed excluding MME; second-best results are underlined.
🚀 LLaMA-3-V-8B full fine-tuning results coming soon!
The following tables provide an overview of the available models in our model zoo. For each model, you can find links to its Hugging Face page.
Model Name | Hugging Face Link | Summary |
---|---|---|
LLaVA-Phi-3-mini-4k-instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
LLaVA-Phi-3-mini-4k-instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
LLaVA-Phi-3-mini-4k-instruct | Hugging Face | Merged weights in HuggingFace format. |
Model Name | Hugging Face Link | Summary |
---|---|---|
LLaVA-Meta-Llama-3-8B-Instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
LLaVA-Meta-Llama-3-8B-Instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
LLaVA-Meta-Llama-3-8B-Instruct | Hugging Face | Merged weights in HuggingFace format. |
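If you prefer working from local copies of the merged checkpoints, here is a minimal sketch using the huggingface_hub CLI. The repository ids shown are assumptions; use the ids from the Hugging Face links in the tables above.

```bash
# Sketch: grab local copies of the merged checkpoints with the huggingface_hub CLI.
# The repository ids below are assumptions -- substitute the ids from the
# Hugging Face links in the tables above.
pip install -U "huggingface_hub[cli]"

huggingface-cli download MBZUAI/LLaVA-Phi-3-mini-4k-instruct \
  --local-dir checkpoints/LLaVA-Phi-3-mini-4k-instruct

huggingface-cli download MBZUAI/LLaVA-Meta-Llama-3-8B-Instruct \
  --local-dir checkpoints/LLaVA-Meta-Llama-3-8B-Instruct
```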
```bash
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive
```
Packages you need to update in the LLaVA environment:

```bash
pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
```
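As a quick, optional sanity check that the pinned transformers revision actually got installed (for a git install, pip records the source URL and commit):

```bash
# Optional check: the git-based install above should appear in pip's freeze
# output with the source URL and pinned commit.
python -c "import transformers; print(transformers.__version__)"
pip freeze | grep transformers
```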
To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:
```bash
# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

# Training commands
cp scripts/Phi3-V_pretrain.sh LLaVA/Vi-phi3_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Vi-phi3_finetune_lora.sh
```
- Pre-train

```bash
cd LLaVA
bash Vi-phi3_pretrain.sh
```

- Finetune

```bash
cd LLaVA
bash Vi-phi3_finetune_lora.sh
```
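After LoRA fine-tuning, you will usually want to merge the adapter into the base LLM so it can be loaded as a single checkpoint. Below is a minimal sketch using LLaVA's scripts/merge_lora_weights.py; the checkpoint paths and the microsoft/Phi-3-mini-4k-instruct base id are assumptions, so point them at your actual outputs.

```bash
# Sketch: merge the LoRA adapter produced by Vi-phi3_finetune_lora.sh into the
# base model. All paths below are assumptions -- substitute your own directories.
cd LLaVA
python scripts/merge_lora_weights.py \
  --model-path ./checkpoints/llava-phi3-mini-4k-instruct-lora \
  --model-base microsoft/Phi-3-mini-4k-instruct \
  --save-model-path ./checkpoints/llava-phi3-mini-4k-instruct-merged
```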
To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:
```bash
# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py LLaVA/llava/model/language_model/llava_llama.py

# Training commands
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh
```
- Pre-train

```bash
cd LLaVA
bash LLaMA3-V_pretrain.sh
```

- Finetune

```bash
cd LLaVA
bash LLaMA3-V_finetune_lora.sh
```
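For a quick single-image sanity check of a merged checkpoint (your own or one from the model zoo above), LLaVA's run_llava entry point can be used; the model path and image URL below are placeholders.

```bash
# Sketch: single-image inference through LLaVA's run_llava script.
# --model-path and --image-file are placeholders; the conversation template
# picked up at runtime depends on the conversation.py copied in the steps above.
cd LLaVA
python -m llava.eval.run_llava \
  --model-path ./checkpoints/llava-llama3-8b-instruct-merged \
  --image-file "https://llava-vl.github.io/static/images/view.jpg" \
  --query "Describe this image."
```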
We are thankful to LLaVA and lmms-eval for releasing their models and code as open-source contributions.
If you face any issues or have any questions, please feel free to open an issue or reach out at [email protected] and [email protected].