
[Feature] Will training of multimodal models such as llava be supported in the future? #2093

Closed
1 task done
Vincent131499 opened this issue Jan 4, 2024 · 2 comments · Fixed by #3454
Labels
solved This problem has been already solved

Comments

@Vincent131499

Reminder

  • I have read the README and searched the existing issues.

Reproduction

As stated in the title.

Expected behavior

No response

System Info

No response

Others

No response

@Vincent131499 Vincent131499 changed the title [Feature] Will training of multimodal models such as llama be supported in the future? [Feature] Will training of multimodal models such as llava be supported in the future? Jan 4, 2024
@hiyouga hiyouga added the pending This problem is yet to be addressed label Jan 4, 2024
@Katehuuh
Contributor

There are more state-of-the-art multimodal models that are more diverse (video, sound, 3D, all-in-one…), but I'm interested in using LLaVA-1.5 LoRA because of its 4-bit compatibility with the oobabooga UI.

@hiyouga hiyouga added the enhancement New feature or request label Feb 6, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed enhancement New feature or request pending This problem is yet to be addressed labels Apr 25, 2024
@Katehuuh
Contributor

Katehuuh commented Apr 26, 2024

Are multiple images supported in the same conversation? Currently we have:

  },
  {
    "messages": [
      {
        "content": "Please describe this image<image>",
        "role": "user"
      },
      {
        "content": "…",
        "role": "assistant"
      }
    ],
    "images": [
      "images/3.jpg"
    ]
  }
]

For example, if the same conversation contained "content": "Is this the same person?<image><image>" (two image tags), and we set:

   "images": [ 
     "images/3.jpg",
     "images/4.jpg" 
   ] 

Edit 1: it seems the answer is no:

def _preprocess_visual_inputs(images: Sequence["ImageObject"], processor: "ProcessorMixin") -> "NDArray":
    # process visual inputs (currently only supports a single image)
    image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
    image = images[0] if len(images) != 0 else Image.new("RGB", (100, 100), (255, 255, 255))
    return image_processor(image, return_tensors="pt")["pixel_values"][0]
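
For reference, a minimal sketch of how the helper could accept several images, assuming the underlying image_processor (for example CLIPImageProcessor) accepts a list of PIL images. This is not the repository's implementation, and the model itself would also need to consume multiple image features per prompt.

# Hypothetical multi-image variant (a sketch, not the repository's code).
# Assumes image_processor accepts a list of PIL images and stacks the results.
from typing import Sequence
from PIL import Image

def _preprocess_visual_inputs_multi(images: Sequence["ImageObject"], processor: "ProcessorMixin") -> "NDArray":
    image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
    if len(images) == 0:
        # fall back to a blank placeholder image, mirroring the single-image helper
        images = [Image.new("RGB", (100, 100), (255, 255, 255))]
    # returns a tensor of shape (num_images, channels, height, width)
    # instead of a single image's pixel values
    return image_processor(list(images), return_tensors="pt")["pixel_values"]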


hiyouga: llava + qlora, it now requires ~5GB to fine-tune LLaVA-1.5-7B

It seems it cannot be trained on top of base Llama-2/3; however, there is liuhaotian/llava-v1.5-13b-lora, which can be applied to Llama-2.

Ignore this if it is out of scope, but I've seen llava-Phi-3/Llama-3 variants from here: InternLM/xtuner.
There is no LoRA for llava-v1.6, but is v1.6 itself supported?
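
Regarding the ~5GB QLoRA figure quoted above, here is a minimal sketch of loading LLaVA-1.5-7B in 4-bit with transformers + PEFT outside of LLaMA-Factory. The llava-hf/llava-1.5-7b-hf checkpoint name and the LoRA hyperparameters are illustrative assumptions, not the project's exact recipe.

# Sketch only: 4-bit (QLoRA-style) loading of LLaVA-1.5-7B with transformers + PEFT.
# Checkpoint name and LoRA settings below are assumptions for illustration.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # assumed Hugging Face checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # language-model attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable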
