[bug] Medusa example fails with vicuna 33B #2478

Open · SoundProvider opened this issue Nov 21, 2024 · 2 comments

SoundProvider commented Nov 21, 2024

Thank you for developing trt-llm. It's helping me a lot.
I'm trying to use Medusa with trt-llm, referencing this page.

It works fine with vicuna 7B and its Medusa heads, with no errors at all.

However, with vicuna 33B and its trained heads, the following error occurs when executing trtllm-build.
Converting the checkpoint with Medusa completed with the result shown in the attached image.

## Running script

```bash
CUDA_VISIBLE_DEVICES=${DEVICES} \
trtllm-build --checkpoint_dir /app/medusa_test/tensorrt/${TP_SIZE}-gpu \
             --gpt_attention_plugin float16 \
             --gemm_plugin float16 \
             --context_fmha enable \
             --output_dir /app/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
             --speculative_decoding_mode medusa \
             --max_batch_size ${BATCH_SIZE} \
             --max_input_len ${SEQ_LEN} \
             --max_seq_len ${SEQ_LEN} \
             --max_num_tokens ${SEQ_LEN} \
             --workers ${TP_SIZE}
```
```text
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'MedusaConfig.__init__.<locals>.GenericMedusaConfig'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 437, in parallel_build
    future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'MedusaConfig.__init__.<locals>.GenericMedusaConfig'
```
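
If I understand the traceback correctly, this is a general Python limitation rather than something specific to the checkpoint: `pickle` stores classes by reference (module plus qualified name), so a class defined inside a function or method (its `__qualname__` contains `<locals>`, like `GenericMedusaConfig` above) cannot be serialized when the build config is sent to worker processes. A minimal script that reproduces the same kind of failure — the names here are placeholders for illustration, not TRT-LLM code:

```python
import pickle


class ConfigLike:
    """Placeholder config object that defines a class inside __init__."""

    def __init__(self):
        # GenericConfig is a "local object": its __qualname__ contains '<locals>'.
        # pickle serializes classes by reference, so a class that is not
        # importable from a module's top level cannot be pickled.
        class GenericConfig:
            num_heads = 4

        self.generic = GenericConfig()


try:
    pickle.dumps(ConfigLike())
except Exception as err:
    # Expected to fail with something like:
    # AttributeError: Can't pickle local object 'ConfigLike.__init__.<locals>.GenericConfig'
    print(f"{type(err).__name__}: {err}")
```

If that is the root cause, the error would only surface when `parallel_build` actually pickles the config for worker processes, which the traceback suggests happens because `--workers ${TP_SIZE}` is greater than 1.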
hello-11 (Collaborator) commented

@SoundProvider, could you also show the command to convert the checkpoint?

SoundProvider (Author) commented

```bash
DEVICES=0,1,2,3
TP_SIZE=4
BATCH_SIZE=4


CUDA_VISIBLE_DEVICES=${DEVICES} \
python /app/tensorrt_llm/examples/medusa/convert_checkpoint.py \
                            --model_dir /app/models/vicuna-33b-v1.3 \
                            --medusa_model_dir /app/models/medusa-vicuna-33b-v1.3 \
                            --output_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
                            --dtype float16 \
                            --num_medusa_heads 4 \
                            --tp_size ${TP_SIZE}


CUDA_VISIBLE_DEVICES=${DEVICES} \
trtllm-build --checkpoint_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
             --gpt_attention_plugin float16 \
             --gemm_plugin float16 \
             --context_fmha enable \
             --output_dir /app/models/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
             --speculative_decoding_mode medusa \
             --max_batch_size ${BATCH_SIZE} \
             --workers ${TP_SIZE}
```

@hello-11 I use the Medusa example here.
