[New Model]: nvidia/Hymba-1.5B-Base #10783

Open
hutm opened this issue Nov 29, 2024 · 6 comments
Labels: new model (Requests to new models)

Comments

hutm commented Nov 29, 2024

The model to consider.

https://huggingface.co/nvidia/Hymba-1.5B-Base
https://huggingface.co/nvidia/Hymba-1.5B-Instruct

The closest model vllm already supports.

https://huggingface.co/docs/transformers/main/en/model_doc/mamba
https://huggingface.co/ai21labs/AI21-Jamba-1.5-Mini

What's your difficulty of supporting the model you want?

  • support for Mamba and attention heads in a mixed (hybrid) architecture
  • attention over both a sliding window and meta (memory) tokens (a variation of block-sparse attention might be leveraged); a mask sketch follows this list
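
A minimal sketch of the mask such a layer needs, assuming a Hymba-like pattern where the first few positions are globally visible meta tokens and the rest use a causal sliding window; the window size and meta-token count here are illustrative, not the model's real values:

import torch

def hymba_style_mask(seq_len: int, window: int = 4, num_meta: int = 2) -> torch.Tensor:
    # Boolean [seq_len, seq_len] mask: True where query i may attend to key j.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = j <= i
    in_window = (i - j) < window   # local sliding-window band
    meta_key = j < num_meta        # every token may attend to the meta tokens
    return causal & (in_window | meta_key)

print(hymba_style_mask(8).int())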

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
hutm added the new model label Nov 29, 2024
llv22 commented Dec 14, 2024

@hutm so nvidia/Hymba-1.5B-Instruct isn't supported by vLLM yet? What is the release plan for this? I ran

vllm serve nvidia/Hymba-1.5B-Instruct --host 0.0.0.0  --port 8002 --dtype auto --trust-remote-code

but got:

ERROR 12-14 11:45:17 engine.py:366] Traceback (most recent call last):
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 12-14 11:45:17 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
ERROR 12-14 11:45:17 engine.py:366]     engine_config = engine_args.create_engine_config()
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
ERROR 12-14 11:45:17 engine.py:366]     model_config = self.create_model_config()
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
ERROR 12-14 11:45:17 engine.py:366]     return ModelConfig(
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 251, in __init__
ERROR 12-14 11:45:17 engine.py:366]     self.multimodal_config = self._init_multimodal_config(
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 277, in _init_multimodal_config
ERROR 12-14 11:45:17 engine.py:366]     if ModelRegistry.is_multimodal_model(architectures):
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 422, in is_multimodal_model
ERROR 12-14 11:45:17 engine.py:366]     return self.inspect_model_cls(architectures).supports_multimodal
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 391, in inspect_model_cls
ERROR 12-14 11:45:17 engine.py:366]     return self._raise_for_unsupported(architectures)
ERROR 12-14 11:45:17 engine.py:366]   File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 352, in _raise_for_unsupported
ERROR 12-14 11:45:17 engine.py:366]     raise ValueError(
ERROR 12-14 11:45:17 engine.py:366] ValueError: Model architectures ['HymbaForCausalLM'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/agent-workflow-memory/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/anaconda3/envs/agent-workflow-memory/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
    raise e
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
    model_config = self.create_model_config()
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
    return ModelConfig(
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 251, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/config.py", line 277, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 422, in is_multimodal_model
    return self.inspect_model_cls(architectures).supports_multimodal
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 391, in inspect_model_cls
    return self._raise_for_unsupported(architectures)
  File "/data/orlando/.local/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 352, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['HymbaForCausalLM'] are not supported for now. Supported architectures: dict_keys(['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'InternLM2VEForCausalLM', 'JAISLMHeadModel', 'JambaForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MambaForCausalLM', 'FalconMambaForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'Phi3SmallForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'XverseForCausalLM', 'BartModel', 'BartForConditionalGeneration', 'Florence2ForConditionalGeneration', 'BertModel', 'RobertaModel', 'XLMRobertaModel', 'Gemma2Model', 'LlamaModel', 'MistralModel', 'Qwen2Model', 'Qwen2ForRewardModel', 'Qwen2ForSequenceClassification', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM', 'Qwen2VLForConditionalGeneration', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'FuyuForCausalLM', 'H2OVLChatModel', 'InternVLChatModel', 'Idefics3ForConditionalGeneration', 'LlavaForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'MolmoForCausalLM', 'NVLM_D', 'PaliGemmaForConditionalGeneration', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'Qwen2AudioForConditionalGeneration', 'UltravoxModel', 'MllamaForConditionalGeneration', 'EAGLEModel', 'MedusaModel', 'MLPSpeculatorPreTrainedModel'])
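
The ValueError comes from vLLM's model registry: 'HymbaForCausalLM' simply isn't registered. As a rough sketch (assuming someone had a vLLM-compatible Hymba implementation, here a hypothetical hymba_vllm module), vLLM's out-of-tree registration hook could serve as a stopgap:

from vllm import ModelRegistry
from hymba_vllm import HymbaForCausalLM  # hypothetical vLLM-compatible port

# Register the architecture name from the HF config with vLLM's registry
# before creating the engine, so the lookup above succeeds.
ModelRegistry.register_model("HymbaForCausalLM", HymbaForCausalLM)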

@johnnynunez

(quoting llv22's error report above)

Hymba requires FlexAttention from PyTorch 2.6.0 and above; a rough sketch of how the attention pattern maps onto FlexAttention is below.
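
A hedged sketch, not Hymba's actual implementation: how a sliding-window-plus-meta-token pattern could be expressed with PyTorch's FlexAttention API (torch.nn.attention.flex_attention). The window size and meta-token count are assumptions, and this needs a CUDA device:

import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

WINDOW, NUM_META = 1024, 128  # illustrative values, not Hymba's real config

def mask_mod(b, h, q_idx, kv_idx):
    causal = kv_idx <= q_idx
    windowed = (q_idx - kv_idx) < WINDOW  # local band
    meta = kv_idx < NUM_META              # globally visible meta tokens
    return causal & (windowed | meta)

B, H, S, D = 1, 8, 4096, 64
block_mask = create_block_mask(mask_mod, B, H, S, S, device="cuda")
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)
out = flex_attention(q, k, v, block_mask=block_mask)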

llv22 commented Dec 20, 2024

(quoting the error report and johnnynunez's note that Hymba requires FlexAttention from PyTorch 2.6.0 and above)

So is HymbaForCausalLM supported now?

@johnnynunez

(quoting "HymbaForCausalLM")

HymbaForCausalLM still doesn't exist in vLLM.

llv22 commented Dec 20, 2024

(quoting johnnynunez: "HymbaForCausalLM still doesn't exist in vLLM")

Then I'll wait for your integration. Thanks for the clarification.

@johnnynunez

(quoting the exchange above)

You can use it manually in the meantime:

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteriaList,
    StopStringCriteria,
)

repo_name = "nvidia/Hymba-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
model = model.cuda().to(torch.bfloat16)

def chat_with_model(messages, model, tokenizer, max_new_tokens=256):
    # Render the chat template and move the input ids to the GPU.
    tokenized_chat = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    # Stop generation as soon as the model emits the end-of-sequence string.
    stopping_criteria = StoppingCriteriaList(
        [StopStringCriteria(tokenizer=tokenizer, stop_strings="</s>")]
    )
    outputs = model.generate(
        tokenized_chat,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding; a temperature would need do_sample=True
        use_cache=True,
        stopping_criteria=stopping_criteria,
    )
    # Decode only the newly generated tokens, not the prompt.
    input_length = tokenized_chat.shape[1]
    return tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]
print("Chat with the model (type 'exit' to quit):")
while True:
    print("User:")
    prompt = input()
    if prompt.lower() == "exit":
        break

    messages.append({"role": "user", "content": prompt})
    response = chat_with_model(messages, model, tokenizer)
    messages.append({"role": "assistant", "content": response})

    print(f"Model: {response}")

But it requires a recent version of flash-attention and FlexAttention, plus recent versions of causal-conv1d and mamba-ssm. A quick environment check is sketched below.
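
A small check for those prerequisites, assuming the usual PyPI module names (flash_attn, causal_conv1d, mamba_ssm); adjust if your packages differ:

import importlib.util
import torch

# FlexAttention ships inside torch itself (torch.nn.attention.flex_attention).
print("torch:", torch.__version__)
for pkg in ("flash_attn", "causal_conv1d", "mamba_ssm"):
    found = importlib.util.find_spec(pkg) is not None
    print(pkg, "installed" if found else "MISSING")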
