[Bug]: PixtralHF accuracy on MMMU regressed since 0.6.4.post1 #11816

Closed
mgoin opened this issue Jan 7, 2025 · 2 comments · Fixed by #11891
Labels
bug Something isn't working

Comments

mgoin commented Jan 7, 2025

Your current environment

The output of `python collect_env.py` was not provided.

Model Input Dumps

No response

🐛 Describe the bug

It seems that pixtral_hf accuracy has regressed since the last known good result on 0.6.4.post1.

For reference, the HF model card reports `MMMU (CoT) ~= 51%`. Evals were run using mistral-evals.

vLLM 0.6.4.post1, server and eval:

> uv pip install vllm==0.6.4.post1
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.5044444444444445,
    "anywhere_in_answer_relaxed_correctness": 0.5044444444444445
}
================================================================================
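
For context, the eval harness talks to vLLM's OpenAI-compatible chat endpoint. A minimal sanity-check request against the server started above might look like this (the prompt and image URL are hypothetical placeholders; note that the message content is a list of typed chunks, i.e. the OpenAI schema, not a plain string):

```python
# Minimal sanity check against the vLLM OpenAI-compatible server started above.
# Requires `pip install openai`; the image URL is a hypothetical placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:9000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nm-testing/pixtral-12b-FP8-dynamic",
    messages=[{
        "role": "user",
        # OpenAI-schema content: a list of typed chunks, not a plain string.
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```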

vLLM 0.6.5, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

vLLM with #11741 applied, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@mgoin mgoin added the bug Something isn't working label Jan 7, 2025

DarkLight1337 commented Jan 8, 2025

Bisection results (updated as I make more progress):

#10347 PASS
#10371 PASS
#9919 <--
#10386 FAIL
#10361 FAIL
#10415 FAIL
#10180 FAIL
#10128 FAIL
#10973 FAIL

It appears that the chat template content format for Pixtral-HF is parsed as openai format instead of string format. Upon further inspection, the chat template is indeed in openai format. Looking into why that results in incorrect output...
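
To illustrate the distinction (a rough sketch, not vLLM's actual detection logic): in string format the message content is a single string with image placeholders inlined, while in openai format it is a list of typed chunks.

```python
# Rough sketch of the two chat-message content formats; illustrative only,
# not vLLM's actual detection code.

string_format = {"role": "user", "content": "Describe the image.\n[IMG]"}

openai_format = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image."},
        {"type": "image"},  # image payload omitted for brevity
    ],
}

def detect_content_format(message: dict) -> str:
    """Hypothetical helper: classify a message's content format."""
    return "string" if isinstance(message["content"], str) else "openai"

assert detect_content_format(string_format) == "string"
assert detect_content_format(openai_format) == "openai"
```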

DarkLight1337 commented Jan 9, 2025

I found that the chat template actually has a typo in it.

          {%- if message["content"] is not string %}
              {%- for chunk in message["content"] %}
                  {%- if chunk["type"] == "text" %}
-                     {{- chunk["content"] }}
+                     {{- chunk["text"] }}
                  {%- elif chunk["type"] == "image" %}
                      {{- "[IMG]" }}
                  {%- else %}
                      {{- raise_exception("Unrecognized content type!") }}
                  {%- endif %}
              {%- endfor %}
          {%- else %}
              {{- message["content"] }}
          {%- endif %}

To be compatible with the OpenAI schema, the inner key should be `text`, not `content`.
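
A minimal Jinja2 repro of the failure mode (simplified template, not the full Pixtral one): with the typo, the missing `content` key renders as Jinja2's empty undefined value, so the text chunk is silently dropped and the model only sees the `[IMG]` tokens.

```python
# Demonstrates the typo's effect: chunk["content"] is undefined for
# OpenAI-schema text chunks, so Jinja2 renders it as an empty string and the
# user's text is silently dropped. Simplified template, not the full Pixtral one.
from jinja2 import Template

buggy = Template(
    '{% for chunk in content %}'
    '{% if chunk["type"] == "text" %}{{ chunk["content"] }}'
    '{% elif chunk["type"] == "image" %}[IMG]{% endif %}'
    '{% endfor %}'
)
fixed = Template(
    '{% for chunk in content %}'
    '{% if chunk["type"] == "text" %}{{ chunk["text"] }}'
    '{% elif chunk["type"] == "image" %}[IMG]{% endif %}'
    '{% endfor %}'
)

content = [{"type": "text", "text": "Describe the image."}, {"type": "image"}]

print(repr(buggy.render(content=content)))  # '[IMG]' -- text silently dropped
print(repr(fixed.render(content=content)))  # 'Describe the image.[IMG]'
```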

Update: Reposted this on a similar thread in the Pixtral-HF repo.
