[Misc] Provide correct Pixtral-HF chat template #11891

Merged · 4 commits · Jan 9, 2025
61 changes: 34 additions & 27 deletions docs/source/models/supported_models.md
@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎
- ✅︎
* - `Qwen2ForCausalLM`
- Qwen2
Member Author commented: Some additional misc fixes to the list of supported models

- QwQ, Qwen2
- `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
- ✅︎
- ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
```

If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
{func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
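
For illustration, a minimal offline sketch of this flow might look like the following; the model name is only an example, and the `task="embed"` / `LLM.embed()` spelling assumes a recent vLLM release:

```python
from vllm import LLM

# A decoder-only chat model run as an embedding model; if its architecture is not
# natively registered for pooling, vLLM converts it via as_embedding_model.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", task="embed")

outputs = llm.embed(["Hello, my name is", "The capital of France is"])
for output in outputs:
    # One normalized vector per prompt, taken from the last token's hidden state.
    print(len(output.outputs.embedding))
```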

#### Reward Modeling (`--task reward`)
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
```

If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
{func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.

```{important}
For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
```
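
Spelled out as a full command, serving such a PRM might look like the sketch below; the token ids are placeholders that must match the model's tokenizer:

```bash
# Sketch: serve a process-supervised reward model with an explicit STEP pooler.
# 123, 456 and 789 are placeholder token ids, not real values for this model.
vllm serve peiyi9979/math-shepherd-mistral-7b-prm --task reward \
  --override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "returned_token_ids": [456, 789]}'
```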

If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
{func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
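
As a rough sketch of what that looks like in practice (the model name is a placeholder, and the `LLM.classify()` helper and `probs` field assume a recent vLLM release):

```python
from vllm import LLM

# A sequence-classification checkpoint (or an auto-converted *ForCausalLM) served
# with the classification task; the model name below is a placeholder.
llm = LLM(model="your-org/your-classifier", task="classify")

(output,) = llm.classify(["vLLM makes serving language models easy."])
# Softmaxed class probabilities taken from the last token's hidden state.
print(output.outputs.probs)
```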

#### Sentence Pair Scoring (`--task score`)

@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.

See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.

````{important}
To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
Offline inference:
```python
llm = LLM(
model="Qwen/Qwen2-VL-7B-Instruct",
limit_mm_per_prompt={"image": 4},
)
```
Online inference:
```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
```
````

```{note}
vLLM currently only supports adding LoRA to the language backbone of multimodal models.
```
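
For reference, a minimal sketch of that flow (the adapter path is hypothetical, and the adapter is assumed to target only the language backbone):

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

# LoRA weights are applied to the language backbone only; vision weights stay as-is.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", enable_lora=True)

outputs = llm.generate(
    "Describe the weather in one sentence.",
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # hypothetical path
)
```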

### Generative Models

See [this page](#generative-models) for more information on how to use generative models.
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
* - `Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision
- T + I<sup>E+</sup>
- `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
- `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
-
- ✅︎
- ✅︎
* - `PixtralForConditionalGeneration`
- Pixtral
- T + I<sup>+</sup>
- `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
- `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
-
- ✅︎
- ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎
- ✅︎
* - `Qwen2VLForConditionalGeneration`
- Qwen2-VL
- QVQ, Qwen2-VL
- T + I<sup>E+</sup> + V<sup>E+</sup>
- `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
- ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.

````{important}
To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
```python
llm = LLM(
model="Qwen/Qwen2-VL-7B-Instruct",
limit_mm_per_prompt={"image": 4},
)
```
```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
```
````

```{note}
vLLM currently only supports adding LoRA to the language backbone of multimodal models.
```

```{note}
To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
```
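
As a concrete command line based on the note above (serving is shown here, but the same override applies to offline use):

```bash
# Override the reported architecture so vLLM selects the Mantis implementation.
vllm serve TIGER-Lab/Mantis-8B-siglip-llama3 \
  --hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'
```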
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
For more details, please see: <gh-pr:4087#issuecomment-2250397630>
```

```{note}
The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
```
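
For example, the corrected template can be passed explicitly when serving the HF-format checkpoint; this is a sketch, with the `--limit-mm-per-prompt` part optional and mirroring the earlier multi-image example:

```bash
# Serve HF-format Pixtral with the corrected chat template added by this PR.
vllm serve mistral-community/pixtral-12b \
  --chat-template examples/template_pixtral_hf.jinja \
  --limit-mm-per-prompt image=4
```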

### Pooling Models

See [this page](pooling-models) for more information on how to use pooling models.
38 changes: 38 additions & 0 deletions examples/template_pixtral_hf.jinja
@@ -0,0 +1,38 @@
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
{{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
{%- endif %}
{%- if message["role"] == "user" %}
{%- if loop.last and system_message is defined %}
{{- "[INST]" + system_message + "\n" }}
{%- else %}
{{- "[INST]" }}
{%- endif %}
{%- if message["content"] is not string %}
{%- for chunk in message["content"] %}
{%- if chunk["type"] == "text" %}
{{- chunk["text"] }}
{%- elif chunk["type"] == "image" %}
{{- "[IMG]" }}
{%- else %}
{{- raise_exception("Unrecognized content type!") }}
{%- endif %}
{%- endfor %}
{%- else %}
{{- message["content"] }}
{%- endif %}
{{- "[/INST]" }}
{%- elif message["role"] == "assistant" %}
{{- message["content"] + eos_token}}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}
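
As a quick sanity check (a sketch, not part of the PR), the template can be rendered standalone with `jinja2`; the `raise_exception` helper that the HF chat-template runtime normally provides is stubbed in here:

```python
import jinja2

with open("examples/template_pixtral_hf.jinja") as f:
    template_source = f.read()

env = jinja2.Environment()

def raise_exception(message):
    # The HF chat-template runtime injects this helper; stub it for standalone rendering.
    raise jinja2.TemplateError(message)

env.globals["raise_exception"] = raise_exception
template = env.from_string(template_source)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image"},
        ],
    },
]

# Prints: <s>[INST]Describe this image.[IMG][/INST]
print(template.render(messages=messages, bos_token="<s>", eos_token="</s>"))
```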
1 change: 1 addition & 0 deletions tests/entrypoints/test_chat_utils.py
@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
("template_falcon.jinja", "string"),
("template_inkbot.jinja", "string"),
("template_llava.jinja", "string"),
("template_pixtral_hf.jinja", "openai"),
("template_vlm2vec.jinja", "openai"),
("tool_chat_template_granite_20b_fc.jinja", "string"),
("tool_chat_template_hermes.jinja", "string"),