Adding distil-whisper model support to TensorRT-LLM #1061
Closed
What this PR Does:
This PR adds support for converting Hugging Face distil-whisper model weights into PyTorch `.pt` files, which can then be used to build TensorRT-LLM engines.
How the PR Does it:
distil-whisper's architecture is quite similar to that of the openai-whisper models, except that it has fewer decoder layers. Despite this, it was previously not possible to use TensorRT-LLM with distil-whisper, because the two frameworks use different naming conventions for parameters and weights.
This PR addresses that issue by identifying the pattern differences between the two naming conventions. The script uses regex substitutions to convert model weights from the Hugging Face format to the openai-whisper format.
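To illustrate the idea, here is a minimal sketch of such regex-based renaming. The rules below are an assumed, illustrative subset; the actual script in this PR may use different patterns and cover more cases.

```python
import re

# Illustrative subset of the Hugging Face -> openai-whisper key renaming.
# Rules are applied in order, so the more specific "*_layer_norm" patterns
# must run before the bare "self_attn"/"encoder_attn" ones.
RULES = [
    (r"^model\.", ""),
    (r"\.layers\.", ".blocks."),
    (r"self_attn_layer_norm", "attn_ln"),
    (r"encoder_attn_layer_norm", "cross_attn_ln"),
    (r"final_layer_norm", "mlp_ln"),
    (r"self_attn", "attn"),
    (r"encoder_attn", "cross_attn"),
    (r"q_proj", "query"),
    (r"k_proj", "key"),
    (r"v_proj", "value"),
    (r"out_proj", "out"),
    (r"fc1", "mlp.0"),
    (r"fc2", "mlp.2"),
    (r"embed_tokens", "token_embedding"),
    (r"embed_positions\.weight", "positional_embedding"),
    (r"^encoder\.layer_norm", "encoder.ln_post"),
    (r"^decoder\.layer_norm", "decoder.ln"),
]

def rename_key(hf_key: str) -> str:
    """Map one Hugging Face parameter name to its openai-whisper equivalent."""
    for pattern, replacement in RULES:
        hf_key = re.sub(pattern, replacement, hf_key)
    return hf_key

# e.g. "model.decoder.layers.0.self_attn.q_proj.weight"
#   -> "decoder.blocks.0.attn.query.weight"
print(rename_key("model.decoder.layers.0.self_attn.q_proj.weight"))
```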
You can do so by running the script `convert_from_distil_whisper.py` as follows:
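(The flags below are illustrative assumptions; check the script's `--help` for the actual options.)

```bash
# Convert the Hugging Face checkpoint to an openai-whisper-style .pt file
# that the TensorRT-LLM engine build step can consume.
python3 convert_from_distil_whisper.py \
    --model_name distil-whisper/distil-large-v2 \
    --output_dir ./assets \
    --output_name distil-large-v2
```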
Results and figures:
This might be useful when one wants low-latency models for short audio clips. Using TensorRT-LLM with distil-whisper provides up to a 3x improvement in latency over Hugging Face.
For instance, for a mean audio length of $3.5$ seconds, we are able to get a mean latency of $0.086$ seconds on an A10G.
Distribution of wav sizes vs latency: