
[Bug]: Dimension mismatch error will occur during batch inference when processing image embeddings with minicpmv #11630

Closed
whyiug opened this issue Dec 30, 2024 · 0 comments · Fixed by #11631
Labels: bug (Something isn't working)

whyiug (Contributor) commented Dec 30, 2024

### Your current environment

```
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (conda-forge gcc 14.2.0-1) 14.2.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17

Python version: 3.10.15 (main, Oct  3 2024, 07:27:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.105.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping:              7
CPU MHz:               2499.976
BogoMIPS:              4999.95
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              36608K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single rsb_ctxsw fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni

Versions of relevant libraries:
[pip3] facenet-pytorch==2.6.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnxruntime-gpu==1.16.3
[pip3] open_clip_torch==2.29.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1+cu124
[pip3] torchao==0.8.0.dev20241203+cu124
[pip3] torchaudio==2.4.0
[pip3] torchdiffeq==0.2.5
[pip3] torchmetrics==1.6.0
[pip3] torchsde==0.2.6
[pip3] torchtyping==0.1.5
[pip3] torchvision==0.20.1+cu124
[pip3] transformers==4.46.3
[pip3] triton==3.1.0
[conda] blas                      1.0                         mkl  
[conda] cuda-cudart               11.8.89                       0    nvidia
[conda] cuda-cupti                11.8.87                       0    nvidia
[conda] cuda-libraries            11.8.0                        0    nvidia
[conda] cuda-nvrtc                11.8.89                       0    nvidia
[conda] cuda-nvtx                 11.8.86                       0    nvidia
[conda] cuda-runtime              11.8.0                        0    nvidia
[conda] cuda-version              12.6                          3    nvidia
[conda] facenet-pytorch           2.6.0                    pypi_0    pypi
[conda] libcublas                 11.11.3.6                     0    nvidia
[conda] libcufft                  10.9.0.58                     0    nvidia
[conda] libcufile                 1.11.1.6                      0    nvidia
[conda] libcurand                 10.3.7.77                     0    nvidia
[conda] libcusolver               11.4.1.48                     0    nvidia
[conda] libcusparse               11.7.5.86                     0    nvidia
[conda] libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
[conda] libnpp                    11.8.0.86                     0    nvidia
[conda] libnvjpeg                 11.9.0.86                     0    nvidia
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.11          py310h5eee18b_0  
[conda] mkl_random                1.2.8           py310h1128e8f_0  
[conda] numpy                     1.26.4          py310h5f9d8c6_0  
[conda] numpy-base                1.26.4          py310hb5e798b_0  
[conda] nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
[conda] open-clip-torch           2.29.0                   pypi_0    pypi
[conda] pytorch-cuda              11.8                 h7e8668a_6    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pyzmq                     26.2.0                   pypi_0    pypi
[conda] torch                     2.5.1+cu124              pypi_0    pypi
[conda] torchao                   0.8.0.dev20241203+cu124          pypi_0    pypi
[conda] torchaudio                2.5.1+cu124              pypi_0    pypi
[conda] torchdiffeq               0.2.5                    pypi_0    pypi
[conda] torchmetrics              1.6.0                    pypi_0    pypi
[conda] torchsde                  0.2.6                    pypi_0    pypi
[conda] torchtyping               0.1.5                    pypi_0    pypi
[conda] torchvision               0.20.1                   pypi_0    pypi
[conda] transformers              4.46.3                   pypi_0    pypi
[conda] triton                    3.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-11    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

### Model Input Dumps

No response

### 🐛 Describe the bug

```python
import requests
import torch
from PIL import Image
from transformers import AutoTokenizer

from vllm import LLM, SamplingParams

# "Please describe each of these images separately."
question = "请分别描述这几张图片。"
image_url1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"

image_url2 = "https://pics1.baidu.com/feed/7acb0a46f21fbe094f59a18fb5ffe03d8644ad50.jpeg@f_auto?token=6ad879f34822f7617ef2834c83a2e017"


model_id = "/home/work/forrest/github/MiniCPM-V/saved_models/base"


tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.9,
    max_num_seqs=5,
    dtype="auto",
)

messages = [{"role": "user", "content": f"(<image>./</image>)\n{question}"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
stop_tokens = ["<|im_end|>", "<|endoftext|>"]
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    max_tokens=1000,
    temperature=0,
    best_of=1,
)

def test_mix_image_input():
    # Request 1: precomputed image embeddings with 9 slices.
    mm_data1 = {
        "image_embeds": torch.randn(9, 64, 3584, dtype=torch.bfloat16),
        "image_size_list": [
            Image.open(requests.get(image_url1, stream=True).raw).convert("RGB").size
        ],
    }
    # Request 2: embeddings with only 3 slices, so the batch mixes
    # image_embeds tensors of different shapes.
    mm_data2 = {
        "image_embeds": torch.randn(3, 64, 3584, dtype=torch.bfloat16),
        "image_size_list": [
            Image.open(requests.get(image_url2, stream=True).raw).convert("RGB").size
        ],
    }
    llm_inputs1 = {"prompt": prompt, "multi_modal_data": {"image": mm_data1}}
    llm_inputs2 = {"prompt": prompt, "multi_modal_data": {"image": mm_data2}}
    outputs = llm.generate(
        [llm_inputs1, llm_inputs2],
        sampling_params=sampling_params,
    )
    print(outputs[0].outputs[0].text)
    print(outputs[1].outputs[0].text)
    return outputs


if __name__ == "__main__":
    test_mix_image_input()
```
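Note that the failure is batch-specific: it surfaces once the two requests, whose `image_embeds` carry different slice counts, are submitted together in a single `generate` call.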

Running this code produces the following error:

```
INFO 12-30 21:36:20 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241230-213620.pkl.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 571, in forward
[rank0]:     vlm_embeddings, _ = self.get_embedding(input_ids, image_inputs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 439, in get_embedding
[rank0]:     vision_hidden_states = (image_inputs["data"].type(
[rank0]: AttributeError: 'list' object has no attribute 'type'

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/work/forrest/github/dl_exp/mllm/scripts/minicpmv/example_minicpmv_vllm_demo_multi_pr.py", line 67, in <module>
[rank0]:     test_mix_image_input()
[rank0]:   File "/home/work/forrest/github/dl_exp/mllm/scripts/minicpmv/example_minicpmv_vllm_demo_multi_pr.py", line 57, in test_mix_image_input
[rank0]:     outputs = llm.generate(
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/utils.py", line 1063, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 406, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 942, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1454, in step
[rank0]:     outputs = self.model_executor.execute_model(
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 125, in execute_model
[rank0]:     output = self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]:     raise type(err)(
[rank0]: AttributeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241230-213620.pkl): 'list' object has no attribute 'type'
Processed prompts:   0%|        | 0/2 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
```
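From the traceback, `get_embedding` in `minicpmv.py` seems to assume `image_inputs["data"]` is a single tensor, while a batch whose requests carry differently shaped embeddings apparently arrives as a list of per-request tensors. A minimal sketch of the kind of guard that would avoid the crash (hypothetical helper name; the actual fix landed in #11631):

```python
import torch

def cast_vision_hidden_states(data, dtype=torch.bfloat16):
    # Hypothetical guard: "data" is one tensor for a single request,
    # but a list of tensors when batched requests carry embeddings
    # with different slice counts (9 vs. 3 in the repro above).
    if isinstance(data, list):
        # Cast each per-request embedding tensor individually.
        return [t.type(dtype) for t in data]
    return data.type(dtype)
```

The failing line would then call this helper on `image_inputs["data"]` instead of invoking `.type(...)` directly on what may be a list.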

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.