[New Model]: nvidia/Hymba-1.5B-Base #10783
The model to consider.
https://huggingface.co/nvidia/Hymba-1.5B-Base
https://huggingface.co/nvidia/Hymba-1.5B-Instruct
The closest model vllm already supports.
https://huggingface.co/docs/transformers/main/en/model_doc/mamba
https://huggingface.co/ai21labs/AI21-Jamba-1.5-Mini
Comments
@hutm so nvidia/Hymba-1.5B-Instruct isn't supported by vllm yet? What is your release plan for this? I ran `vllm serve nvidia/Hymba-1.5B-Instruct --host 0.0.0.0 --port 8002 --dtype auto --trust-remote-code` but got:
```
ERROR 12-14 11:45:17 engine.py:366] Traceback (most recent call last):
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 12-14 11:45:17 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
ERROR 12-14 11:45:17 engine.py:366] engine_config = engine_args.create_engine_config()
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
ERROR 12-14 11:45:17 engine.py:366] model_config = self.create_model_config()
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
ERROR 12-14 11:45:17 engine.py:366] return ModelConfig(
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 251, in __init__
ERROR 12-14 11:45:17 engine.py:366] self.multimodal_config = self._init_multimodal_config(
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 277, in _init_multimodal_config
ERROR 12-14 11:45:17 engine.py:366] if ModelRegistry.is_multimodal_model(architectures):
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 422, in is_multimodal_model
ERROR 12-14 11:45:17 engine.py:366] return self.inspect_model_cls(architectures).supports_multimodal
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 391, in inspect_model_cls
ERROR 12-14 11:45:17 engine.py:366] return self._raise_for_unsupported(architectures)
ERROR 12-14 11:45:17 engine.py:366] File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 352, in _raise_for_unsupported
ERROR 12-14 11:45:17 engine.py:366] raise ValueError(
ERROR 12-14 11:45:17 engine.py:366] ValueError: Model architectures ['HymbaForCausalLM'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/agent-workflow-memory/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/anaconda3/envs/agent-workflow-memory/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
raise e
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
model_config = self.create_model_config()
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
return ModelConfig(
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 251, in __init__
self.multimodal_config = self._init_multimodal_config(
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 277, in _init_multimodal_config
if ModelRegistry.is_multimodal_model(architectures):
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 422, in is_multimodal_model
return self.inspect_model_cls(architectures).supports_multimodal
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 391, in inspect_model_cls
return self._raise_for_unsupported(architectures)
File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 352, in _raise_for_unsupported
raise ValueError(
ValueError: Model architectures ['HymbaForCausalLM'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])
```
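As an aside, you can check whether a given architecture is registered in your installed vllm before trying to serve it. A minimal sketch; it assumes `ModelRegistry.get_supported_archs()` is available, which is the case in recent vllm releases but worth verifying against your version:

```python
# Sketch: query vllm's model registry for a given architecture name.
# Assumes ModelRegistry.get_supported_archs() exists in your vllm build.
from vllm import ModelRegistry

supported = ModelRegistry.get_supported_archs()
print("HymbaForCausalLM" in supported)  # False until Hymba support is merged
```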
Requires FlexAttention from PyTorch 2.6.0 and above.
So is HymbaForCausalLM supported now?
HymbaForCausalLM still doesn't exist.
Then I'll wait for your integration. Thanks for the clarification.
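Until it lands upstream, vllm also exposes an out-of-tree registration hook (`ModelRegistry.register_model`), though it only helps once a vllm-compatible implementation exists. A sketch, where `my_hymba` and `MyHymbaForCausalLM` are hypothetical placeholders:

```python
# Sketch: register an out-of-tree implementation with vllm's registry.
# my_hymba / MyHymbaForCausalLM are hypothetical; per this thread, no
# vllm-compatible Hymba implementation exists yet.
from vllm import ModelRegistry
from my_hymba import MyHymbaForCausalLM  # hypothetical module

ModelRegistry.register_model("HymbaForCausalLM", MyHymbaForCausalLM)
```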
You can use it manually, but it requires the latest versions of flash-attention, FlexAttention, causal-conv1d, and mamba.
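A rough sketch of the manual route, loading the model through transformers with remote code enabled. The generation settings are illustrative, the dependency list above still applies, and I'm assuming the mamba dependency is the mamba-ssm package:

```python
# Sketch: run Hymba directly via transformers, bypassing vllm.
# Assumes a recent PyTorch (FlexAttention) plus flash-attn, causal-conv1d,
# and mamba-ssm installed, as noted above; device_map="auto" also needs
# the accelerate package. Check the model card for the official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,  # Hymba ships custom modeling code
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Why is the sky blue?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```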