
Whisper support #180

Closed
gottlike opened this issue Jun 21, 2023 · 40 comments · Fixed by #11280
Labels
new model (requests for new models)

Comments

@gottlike
Contributor

gottlike commented Jun 21, 2023

Is support for Whisper on the roadmap? Something like https://github.com/ggerganov/whisper.cpp would be great.

@WoosukKwon added the new model label on Jun 21, 2023
@zhuohan123
Member

Supporting encoder-decoder models is on our roadmap, as mentioned in #187. Feel free to join the discussion and potentially contribute!

@libratiger

+1 for this feature

@silvacarl2

+2 for this feature

@xtqxk

xtqxk commented Oct 24, 2023

+3 for this feature

@arun2728

arun2728 commented Dec 1, 2023

+4 for this feature

@SinanAkkoyun

+555

@Swiffers

Swiffers commented Jan 2, 2024

+1

@hahazei

hahazei commented Feb 26, 2024

+1

@binarycrayon

monitoring

@afeldman-nm
Contributor

@zhuohan123 I am working on Whisper support.

@silvacarl2

NO WAY!!!!!!!!!!!!!!!!!!! THAT WILL BE AWESOME!!!!!!!!!!!!!!!!!!!!!

@libratiger

I am working on this PR and will submit a draft soon.

@silvacarl2

THIS IS GOING TO BE HUGE, THX!

@dbogunowicz

dbogunowicz commented Mar 12, 2024

Hey @libratiger, @afeldman-nm and I are now working full-time on the same goal. Would you like to sync? It would be more efficient to share knowledge rather than develop the same thing in two silos.

@libratiger

You're right. I've just discovered a discussion about T5 in #187 (comment), where there are differing opinions on the encoder-decoder model. Perhaps things will improve after that PR is merged?

@dbogunowicz

@libratiger the current status is as follows: Neural Magic has finalized the original T5 PR, and we are now benchmarking the solution. In parallel, we are also developing support for Whisper.

@JackZeng

@dbogunowicz any update on this issue? Looking forward to it.

@dbogunowicz

Hi! I am working on Whisper in our team's fork: neuralmagic#147
The current status: inference runs (both prompt prefill and autoregressive decoding), but I am seeing correctness issues, most likely caused by an erroneous attention mask implementation.
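
To illustrate where this can go wrong, here is a rough sketch of the two mask patterns an encoder/decoder model like Whisper needs (illustrative shapes only, not our actual implementation); getting either one wrong tends to produce exactly this kind of silent correctness failure:

import torch

# Hypothetical sequence lengths, for illustration only.
T_enc, T_dec = 6, 4

# Decoder self-attention: causal mask, token i attends only to tokens 0..i.
causal_mask = torch.tril(torch.ones(T_dec, T_dec, dtype=torch.bool))

# Cross-attention: every decoder token attends to all encoder positions
# (no causal restriction on the encoder side).
cross_mask = torch.ones(T_dec, T_enc, dtype=torch.bool)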

@junior-zsy

@dbogunowicz I ran the Whisper model on the feature/demian/Whisper branch and hit this error:

  File "vllm/worker/model_runner.py", line 477, in prepare_decode
    multi_modal_input)
NameError: name 'multi_modal_input' is not defined

so execution cannot start.

@dbogunowicz

@junior-zsy Fixed for now. Please remember that we are still working on that PR, so it is very much in a WIP state. Let me explicitly set the appropriate PR flag.

@junior-zsy

@dbogunowicz OK, thank you. I hope it can be used soon.

@silvacarl2

same here, this is going to be really cool!

@afeldman-nm
Contributor

afeldman-nm commented Apr 3, 2024

@dbogunowicz thanks for your work on Whisper! Since there is clearly interest in this feature and its completion timeline, I want to add the context that Whisper support depends on encoder/decoder support:

Issue: #187
PR: #3117

which is also WIP (it currently works partially but is not quite complete). I expect to finish encoder/decoder support soon. FYI for anyone interested in timelines.

@dwoodworth90

+1

@afeldman-nm
Contributor

afeldman-nm commented Apr 30, 2024

See the encoder/decoder support issue (#187) and the new PR (#4289) for a status update on encoder/decoder support, which is a prerequisite for Whisper support.

@twicer-is-coder

Hi, any update on serving faster-whisper via vLLM?

@afeldman-nm
Contributor

> Hi, any update on serving faster-whisper via vLLM?

Hi @twicer-is-coder ,

Whisper (or any variant thereof) is high on the list of models to add once the infrastructure support is in; you can see the roadmap for infrastructure support in this PR:

#4942

@afeldman-nm
Contributor

afeldman-nm commented Aug 9, 2024

FYI, encoder/decoder support landed in #4942, and there is an RFC (#7366) for follow-on encoder/decoder-related tasks, including adding Whisper support; the feedback period runs until August 16th. See #187 (comment).

@silvacarl2

are you kidding me? is whisper supported now by vllm?

@afeldman-nm
Contributor

afeldman-nm commented Aug 9, 2024

> are you kidding me? is whisper supported now by vllm?

Adding Whisper support will hopefully follow shortly now that we have the encoder/decoder infrastructure landed. This is part of the RFC.

@silvacarl2

DUDE THIS WILL BE HUGE

@Jeevi10

Jeevi10 commented Aug 13, 2024

I am waiting for this update!

@Temirulan

Waiting for this support more than GTA

@arynaq

arynaq commented Sep 15, 2024

Do we have any estimates on the roadmap/timing for this? Much sought after by us too :)

@hmellor
Collaborator

hmellor commented Sep 24, 2024

See:

@ArmykOliva

Can I help implement this feature? If someone is already working on it, let me know and I will assist in any way I can.

@ArmykOliva

How is this feature looking?

@sbaby171

I am seeing errors when trying to run the vLLM offline Whisper example:

https://docs.vllm.ai/en/latest/getting_started/examples/whisper.html

This is the error I am seeing:

Traceback (most recent call last):
  File "/home/ubuntu/clip-and-whisper/test-whisper-offline.py", line 7, in <module>
    llm = LLM(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/utils.py", line 986, in inner
    return fn(*args, **kwargs)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 230, in __init__
    self.llm_engine = self.engine_class.from_engine_args(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 514, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1044, in create_engine_config
    model_config = self.create_model_config()
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 970, in create_model_config
    return ModelConfig(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/config.py", line 337, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/config.py", line 392, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 461, in is_multimodal_model
    model_cls, _ = self.inspect_model_cls(architectures)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 421, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 382, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['WhisperForConditionalGeneration'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GlmForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'GritLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'TeleChat2ForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'RobertaForMaskedLM', 'XLMRobertaModel', 'Gemma2Model', 'JambaForSequenceClassification', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Qwen2ForSequenceClassification', 'BertForSequenceClassification', 'RobertaForSequenceClassification', 'XLMRobertaForSequenceClassification', 'AriaForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MantisForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])

pip freeze:

aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiohttp-cors==0.7.0
aiosignal==1.3.2
airportsdata==20241001
annotated-types==0.7.0
anyio==4.8.0
astor==0.8.1
async-timeout==5.0.1
attrs==24.3.0
blake3==1.0.2
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
colorful==0.5.6
compressed-tensors==0.8.1
depyf==0.18.0
dill==0.3.9
diskcache==5.6.3
distlib==0.3.9
distro==1.9.0
einops==0.8.0
exceptiongroup==1.2.2
fastapi==0.115.6
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.12.0
gguf==0.10.0
google-api-core==2.24.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
grpcio==1.69.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.5.0
iniconfig==2.0.0
inquirerpy==0.3.4
interegular==0.3.3
Jinja2==3.1.5
jiter==0.8.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
linkify-it-py==2.0.3
lm-format-enforcer==0.10.9
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdit-py-plugins==0.4.2
mdurl==0.1.2
memray==1.15.0
mistral_common==1.5.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
nest-asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-ml-py==12.560.30
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
openai==1.59.7
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.10.0.84
outlines==0.1.11
outlines_core==0.1.26
packaging==24.2
partial-json-parser==0.2.1.1.post5
pfzy==0.3.4
pillow==10.4.0
platformdirs==4.3.6
pluggy==1.5.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit==3.0.48
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.3
psutil==6.1.1
py-cpuinfo==9.0.0
py-spy==0.4.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycountry==24.6.1
pydantic==2.10.5
pydantic_core==2.27.2
Pygments==2.19.1
pytest==8.3.4
python-dotenv==1.0.1
PyYAML==6.0.2
pyzmq==26.2.0
ray==2.40.0
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.22.3
rsa==4.9
safetensors==0.5.2
sentencepiece==0.2.0
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
starlette==0.41.3
sympy==1.13.1
textual==1.0.0
tiktoken==0.7.0
tokenizers==0.21.0
tomli==2.2.1
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.0
triton==3.1.0
typing_extensions==4.12.2
uc-micro-py==1.0.3
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
virtualenv==20.29.0
vllm==0.6.6.post1
watchfiles==1.0.4
wcwidth==0.2.13
websockets==14.1
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.9
yarl==1.18.3
zipp==3.21.0

@hmellor
Collaborator

hmellor commented Jan 15, 2025

0.6.6.post1 does not support Whisper. Support was added by #11280 2 weeks ago, which is after 0.6.6.post1 was released. To use Whisper you must either install from main or wait for the next release.
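
Once you are on a build that includes #11280, something along the lines of the documented offline example should work. A minimal sketch (the AudioAsset helper, prompt format, and model name here are from memory, so treat the linked docs as authoritative):

from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

# Requires a build that includes #11280 (main, or the next release).
llm = LLM(model="openai/whisper-large-v3", max_model_len=448)

# Bundled sample clip; returns (waveform, sample_rate).
audio, sample_rate = AudioAsset("mary_had_lamb").audio_and_sample_rate

outputs = llm.generate(
    {
        "prompt": "<|startoftranscript|>",
        "multi_modal_data": {"audio": (audio, sample_rate)},
    },
    SamplingParams(temperature=0, max_tokens=200),
)
print(outputs[0].outputs[0].text)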

@sbaby171

> I am seeing errors when trying to run the vLLM offline Whisper example: […]

FYI, I nuked my Python venv and installed the nightly:

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

which gives:

vllm==0.6.6.post2.dev234+gebd8c669

It seems to be working now; the model is currently downloading.
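
A quick sanity check before the download finishes, assuming ModelRegistry.get_supported_archs is public API (which it appears to be):

from vllm import ModelRegistry

# True on a build that includes #11280; False on 0.6.6.post1.
print("WhisperForConditionalGeneration" in ModelRegistry.get_supported_archs())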
