
[Usage]: Llama3.2 Vision Instruct prompt format #11508

Closed

QuanHoangDanh opened this issue Dec 26, 2024 · 4 comments
Labels
usage How to use vllm

Comments

@QuanHoangDanh

Your current environment

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.46.2
[pip3] triton==3.1.0

How would you like to use vllm

I want to run inference with a Llama 3.2 Vision Instruct model. Meta's prompt guide states:

It's important to position the <|image|> tag appropriately in the prompt. The image will only attend to the subsequent text tokens.

That means my prompt should have a format like this: {"content": [{"type": "image"}, {"type": "text", "text": "..."}]}. But in practice, this format gives better results: {"content": [{"type": "text", "text": "..."}, {"type": "image"}]}. Has anyone run into the same issue? If anyone knows why the practical behavior differs from the theory, please let me know. Thanks in advance.
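For concreteness, here is a minimal sketch (untested) comparing the two orderings side by side with vLLM's offline LLM.chat API and OpenAI-style messages. The model name, image URL, prompt text, and sampling settings are illustrative placeholders, not taken from this issue:

```python
# Minimal sketch comparing the two content orderings with vLLM.
# Assumes a GPU large enough for the 11B vision model; all concrete
# values below (model, URL, prompt) are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    max_model_len=4096,
    max_num_seqs=2,
)
sampling = SamplingParams(temperature=0.0, max_tokens=128)

image_url = "https://example.com/sample.jpg"  # placeholder image

# Ordering recommended by Meta's prompt guide: image first, so the
# following text tokens can attend to the image.
image_first = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Reversed ordering that the issue author found to work better in practice.
text_first = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
}]

for messages in (image_first, text_first):
    outputs = llm.chat(messages, sampling_params=sampling)
    print(outputs[0].outputs[0].text)
```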

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
QuanHoangDanh added the usage (How to use vllm) label on Dec 26, 2024
@DarkLight1337
Member

cc @heheda12345

@heheda12345
Collaborator

Does the current mllama chat template support your input format?

@QuanHoangDanh
Author

QuanHoangDanh commented Dec 26, 2024

@heheda12345 The current mllama chat template does support my input format. But mllama is a late-fusion model, so I think it is heavily affected by the ordering of the inputs (see the sketch below).
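To illustrate why ordering matters for a late-fusion model, here is a purely conceptual sketch (not vLLM's or Hugging Face's actual masking code): text tokens can only cross-attend to an image whose <|image|> token precedes them, so placing the image last leaves every text token unable to use it.

```python
# Conceptual sketch of per-position cross-attention eligibility for a
# late-fusion model like mllama; IMAGE is a hypothetical sentinel token id.
def can_attend_to_image(token_ids, image_token_id):
    """For each position, True if it may cross-attend to the image."""
    mask, seen_image = [], False
    for tok in token_ids:
        if tok == image_token_id:
            seen_image = True
        mask.append(seen_image)
    return mask

IMAGE = -1  # hypothetical <|image|> token id
# Image first: all subsequent text tokens may use the image features.
print(can_attend_to_image([IMAGE, 10, 11, 12], IMAGE))  # [True, True, True, True]
# Image last: no earlier text token may use the image features.
print(can_attend_to_image([10, 11, 12, IMAGE], IMAGE))  # [False, False, False, True]
```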

@ywang96
Member

ywang96 commented Dec 31, 2024

The example usage has been updated in #11567 to correctly reflect Meta's prompt guide, so I'm closing this issue as completed.

ywang96 closed this as completed on Dec 31, 2024