
Whisper support #180

Closed
gottlike opened this issue Jun 21, 2023 · 40 comments · Fixed by #11280
Labels
new model (requests for new models)

Comments

@gottlike
Contributor

gottlike commented Jun 21, 2023

Is support for Whisper on the roadmap? Something like https://github.com/ggerganov/whisper.cpp would be great.

@WoosukKwon added the new model label on Jun 21, 2023
@zhuohan123
Member

Supporting encoder-decoder models is on our roadmap, as mentioned in #187. Feel free to join the discussion and potentially contribute!

@libratiger

+1 for this feature

@silvacarl2

+2 for this feature

@xtqxk

xtqxk commented Oct 24, 2023

+3 for this feature

@arun2728

arun2728 commented Dec 1, 2023

+4 for this feature

@SinanAkkoyun

+555

@Swiffers

Swiffers commented Jan 2, 2024

+1

@hahazei

hahazei commented Feb 26, 2024

+1

@binarycrayon

monitoring

@afeldman-nm
Contributor

@zhuohan123 I am working on Whisper support.

@silvacarl2

NO WAY!!!!!!!!!!!!!!!!!!! THAT WILL BE AWESOME!!!!!!!!!!!!!!!!!!!!!

@libratiger

I am working on this PR and will submit a draft soon.

@silvacarl2

THIS IS GOING TO BE HUGE, THX!

@dbogunowicz

dbogunowicz commented Mar 12, 2024

Hey @libratiger, @afeldman-nm and I are now working full-time on the same goal. Would you like to sync? It would be more efficient to share knowledge rather than develop the same thing in two silos.

@libratiger

You're right. I've just discovered a discussion about T5 in #187 (comment), where there are differing opinions on the encoder-decoder model. Perhaps things will improve after that PR is merged?

@dbogunowicz

@libratiger the current status is as follows: Neural Magic has finalized the original T5 PR, and we are now benchmarking the solution. In parallel, we are also developing support for Whisper.

@JackZeng

@dbogunowicz any update on this issue? Looking forward to it.

@dbogunowicz

Hi! I am working on Whisper in our team's fork: neuralmagic#147
The current status: inference runs (both prompt prefill and autoregressive decoding), but I am seeing correctness issues, most likely caused by an erroneous attention mask implementation.
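
To illustrate where this can go wrong, here is a rough sketch of the two mask patterns an encoder/decoder model like Whisper needs (illustrative shapes only, not our actual implementation); getting either one wrong tends to produce exactly this kind of silent correctness failure:

import torch

# Hypothetical sequence lengths, for illustration only.
T_enc, T_dec = 6, 4

# Decoder self-attention: causal mask, token i attends only to tokens 0..i.
causal_mask = torch.tril(torch.ones(T_dec, T_dec, dtype=torch.bool))

# Cross-attention: every decoder token attends to all encoder positions
# (no causal restriction on the encoder side).
cross_mask = torch.ones(T_dec, T_enc, dtype=torch.bool)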

@junior-zsy

@dbogunowicz I ran the Whisper model on the feature/demian/Whisper branch and hit this error:

  File "vllm/worker/model_runner.py", line 477, in prepare_decode
    multi_modal_input)
NameError: name 'multi_modal_input' is not defined

so execution cannot start.

@dbogunowicz

@junior-zsy Fixed for now. Please remember that we are still working on that PR, so it is very much in a WIP state. Let me explicitly set the appropriate PR flag.

@junior-zsy

@dbogunowicz OK, thank you. I hope it can be used soon.

@silvacarl2

same here, this is going to be really cool!

@afeldman-nm
Contributor

afeldman-nm commented Apr 3, 2024

@dbogunowicz thanks for your work on Whisper! Since there is clearly interest in this feature and its completion timeline, I want to add the context that Whisper support depends on encoder/decoder support:

Issue: #187
PR: #3117

which is also WIP (it currently works partially but is not quite complete). I expect to finish encoder/decoder support soon. FYI for anyone interested in timelines.

@dwoodworth90

+1

@afeldman-nm
Contributor

afeldman-nm commented Apr 30, 2024

See the encoder/decoder support issue (#187) and the new PR (#4289) for a status update on encoder/decoder support, which is a prerequisite for Whisper support.

@twicer-is-coder

Hi, any update on serving faster-whisper via vLLM?

@afeldman-nm
Contributor

> Hi, any update on serving faster-whisper via vLLM?

Hi @twicer-is-coder ,

Whisper (or any variant thereof) is high on the list of models to add once the infrastructure support is in; you can see the roadmap for infrastructure support in this PR:

#4942

@afeldman-nm
Contributor

afeldman-nm commented Aug 9, 2024

FYI, encoder/decoder support landed in #4942, and there is an RFC (#7366) for follow-on encoder/decoder-related tasks, including adding Whisper support; the feedback period runs until August 16th. See #187 (comment).

@silvacarl2

are you kidding me? is whisper supported now by vllm?

@afeldman-nm
Contributor

afeldman-nm commented Aug 9, 2024

> are you kidding me? is whisper supported now by vllm?

Adding Whisper support will hopefully follow shortly now that we have the encoder/decoder infrastructure landed. This is part of the RFC.

@silvacarl2

DUDE THIS WILL BE HUGE

@Jeevi10

Jeevi10 commented Aug 13, 2024

I am waiting for this update!

@Temirulan

Waiting for this support more than GTA

@arynaq

arynaq commented Sep 15, 2024

Do we have any estimates on the roadmap/timing for this? Much sought after by us too :)

@hmellor
Collaborator

hmellor commented Sep 24, 2024

See:

@ArmykOliva

Can I help implement this feature? If someone is already working on it, let me know and I will assist in any way I can.

@ArmykOliva

How is this feature looking?

@sbaby171

I am seeing errors when trying to run the vLLM offline Whisper example:

https://docs.vllm.ai/en/latest/getting_started/examples/whisper.html

This is the error I am seeing:

Traceback (most recent call last):
  File "/home/ubuntu/clip-and-whisper/test-whisper-offline.py", line 7, in <module>
    llm = LLM(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/utils.py", line 986, in inner
    return fn(*args, **kwargs)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 230, in __init__
    self.llm_engine = self.engine_class.from_engine_args(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 514, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1044, in create_engine_config
    model_config = self.create_model_config()
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 970, in create_model_config
    return ModelConfig(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/config.py", line 337, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/config.py", line 392, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 461, in is_multimodal_model
    model_cls, _ = self.inspect_model_cls(architectures)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 421, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/mnt/storage/VENV-VLLM/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 382, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['WhisperForConditionalGeneration'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GlmForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'GritLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'TeleChat2ForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'RobertaForMaskedLM', 'XLMRobertaModel', 'Gemma2Model', 'JambaForSequenceClassification', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Qwen2ForSequenceClassification', 'BertForSequenceClassification', 'RobertaForSequenceClassification', 'XLMRobertaForSequenceClassification', 'AriaForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MantisForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])

pip freeze:

aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiohttp-cors==0.7.0
aiosignal==1.3.2
airportsdata==20241001
annotated-types==0.7.0
anyio==4.8.0
astor==0.8.1
async-timeout==5.0.1
attrs==24.3.0
blake3==1.0.2
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
colorful==0.5.6
compressed-tensors==0.8.1
depyf==0.18.0
dill==0.3.9
diskcache==5.6.3
distlib==0.3.9
distro==1.9.0
einops==0.8.0
exceptiongroup==1.2.2
fastapi==0.115.6
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.12.0
gguf==0.10.0
google-api-core==2.24.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
grpcio==1.69.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.5.0
iniconfig==2.0.0
inquirerpy==0.3.4
interegular==0.3.3
Jinja2==3.1.5
jiter==0.8.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
linkify-it-py==2.0.3
lm-format-enforcer==0.10.9
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdit-py-plugins==0.4.2
mdurl==0.1.2
memray==1.15.0
mistral_common==1.5.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
nest-asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-ml-py==12.560.30
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
openai==1.59.7
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.10.0.84
outlines==0.1.11
outlines_core==0.1.26
packaging==24.2
partial-json-parser==0.2.1.1.post5
pfzy==0.3.4
pillow==10.4.0
platformdirs==4.3.6
pluggy==1.5.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
prompt_toolkit==3.0.48
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.3
psutil==6.1.1
py-cpuinfo==9.0.0
py-spy==0.4.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycountry==24.6.1
pydantic==2.10.5
pydantic_core==2.27.2
Pygments==2.19.1
pytest==8.3.4
python-dotenv==1.0.1
PyYAML==6.0.2
pyzmq==26.2.0
ray==2.40.0
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.22.3
rsa==4.9
safetensors==0.5.2
sentencepiece==0.2.0
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
starlette==0.41.3
sympy==1.13.1
textual==1.0.0
tiktoken==0.7.0
tokenizers==0.21.0
tomli==2.2.1
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.0
triton==3.1.0
typing_extensions==4.12.2
uc-micro-py==1.0.3
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
virtualenv==20.29.0
vllm==0.6.6.post1
watchfiles==1.0.4
wcwidth==0.2.13
websockets==14.1
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.9
yarl==1.18.3
zipp==3.21.0

@hmellor
Collaborator

hmellor commented Jan 15, 2025

0.6.6.post1 does not support Whisper. Support was added by #11280 2 weeks ago, which is after 0.6.6.post1 was released. To use Whisper you must either install from main or wait for the next release.
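
Once you are on a build that includes #11280, something along the lines of the documented offline example should work. A minimal sketch (the AudioAsset helper, prompt format, and model name here are from memory, so treat the linked docs as authoritative):

from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

# Requires a build that includes #11280 (main, or the next release).
llm = LLM(model="openai/whisper-large-v3", max_model_len=448)

# Bundled sample clip; returns (waveform, sample_rate).
audio, sample_rate = AudioAsset("mary_had_lamb").audio_and_sample_rate

outputs = llm.generate(
    {
        "prompt": "<|startoftranscript|>",
        "multi_modal_data": {"audio": (audio, sample_rate)},
    },
    SamplingParams(temperature=0, max_tokens=200),
)
print(outputs[0].outputs[0].text)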

@sbaby171

> I am seeing errors when trying to run the vLLM offline Whisper example: […]

FYI, I nuked my Python venv and installed the nightly:

pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

which gives:

vllm==0.6.6.post2.dev234+gebd8c669

It seems to be working now; the model is currently downloading.
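
A quick sanity check before the download finishes, assuming ModelRegistry.get_supported_archs is public API (which it appears to be):

from vllm import ModelRegistry

# True on a build that includes #11280; False on 0.6.6.post1.
print("WhisperForConditionalGeneration" in ModelRegistry.get_supported_archs())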
