-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: base64 string leads to gibberish with latest vLLM server and pixtral-12b #11781
Comments
Can you show the base64 string you sent? |
Sure please find the string attached. I was also able to decode it again for recreation of the image. Btw I updated the output of python collect_env.py |
When you use OpenAI API, |
I did that already. The file is just the base64 data though. |
Are you sure the file format (e.g. |
Yes I also tried JPG and jpeg file and corresponding data URLs, but it still yields (sometimes human readable, but gibberish results). |
Is it possible for you to share a HTTP link to the image so I can test it? |
Sure please find the demo image attached. The base_str file from above is the corresponding base64 string. |
Quick question: Do you get similar issues using the original HF model, or only on the quantized model? Can you show the command you used to serve vLLM? |
I did not try the original HF model, but I thought it should not be a quantization issue, since a image url works |
My docker run command is: docker run -d --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host pixtral-vllm-4bit and in the CMD of my dockerfile I have: ["--model", "SeanScripts/pixtral-12b-nf4", "--quantization", "bitsandbytes", "--trust-remote-code", "--load-format", "bitsandbytes", "--max-model-len", "8192", "--served-model-name", "pixtral", "--chat-template", ".."] |
What is this chat template that you're using? From my understanding, the model should already define one so there is no need to override it. |
It is a custom one, since the chat template can't be inferred, because the model is not auto recognized because it is not main repo but a quantized one on HF. So I also tested the url for the picture and the base64 output is consistent with output of the image url. So the image url and base64 probably also seems to work for an example image like https://picsum.photos/id/237/400/300 but not for my image. Are there specific requirements for the image? |
Can confirm that https://picsum.photos/id/237/400/300 also works as a base64 string |
It is possible that the performance of the model may not be consistent for very detailed images, especially since you're using a quantized model. |
But, there should be no difference between base64 and HTTP URL. Can you set the temperature to zero and see if the outputs are the same for your image? |
the outputs are the same for the other example images and also for my image. For the example image it works well (reasonable answer), but not for my image. |
"It is possible that the performance of the model may not be consistent for very detailed images, especially since you're using a quantized model." No i tested the quantized model also in a notebook (no vllm) and it worked fine on that image |
So for smaller, simpler images it seems to be work. Maybe it is tied to the way vLLM dequantizes. I am not entirely sure, that it works perfectly for SeanScripts/pixtral-12b-nf4. The performance is definitely not consistent with running the quantized model with HF though. |
@mgoin could you offer some insights on this? |
See #11816 Can you try setting |
Alternatively, pass |
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I use the following snippet https://huggingface.co/mistralai/Pixtral-12B-2409/discussions/6 to create a base64 string which is sent as a payload to a docker container which is spin up from the latest vllm-image. I run the following model: https://huggingface.co/SeanScripts/pixtral-12b-nf4 and only get gibberish as model output. Everything is run within an EC2 server with a g5.2xlarge VM (A10 GPU).
If I don't pass the base64 string but a regular image url, everything works as intended. The prompt seems to be correctly formatted.
Before submitting a new issue...
The text was updated successfully, but these errors were encountered: