diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 23c66f72162d2..caf5e8cafd9aa 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -112,7 +112,13 @@ completion = client.chat.completions.create(
 ## Extra HTTP Headers
 
-Only `X-Request-Id` HTTP request header is supported for now.
+Only the `X-Request-Id` HTTP request header is supported for now. It can be enabled
+with `--enable-request-id-headers`.
+
+> Note that enabling the headers can impact performance significantly at high QPS
+> rates. For this reason, we recommend implementing HTTP headers at the router level
+> (e.g. via Istio) rather than within the vLLM layer.
+> See https://github.com/vllm-project/vllm/pull/11529 for more details.
 
 ```python
 completion = client.chat.completions.create(