[Distil-Whisper] Add support for Distil-Whisper #1423
Comments
Linking for visibility: #1414 |
Hi @patrickvonplaten - congrats on the release! I believe I have successfully added initial support for the distilled models in the following PR: #1424 However, I'm worried that for optimal quality, AFAICT these models require an alternative decoding strategy with overlapping chunks for long-form transcriptions. This can take more time to implement and I am not sure yet how to fit it in the existing implementation. Could you point me to the reference implementation? I will give it a thought and see if I can come up with a solution in the following days. |
Hey @ggerganov, The implementation we're using in Transformers actually uses overlapping chunks. We overlap each chunk by 2.5 seconds. Essentially we follow the strategy described here: https://huggingface.co/blog/asr-chunking using a chunk length of 15 seconds and a chunk_stride of 2.5 seconds (default). It's all implemented here: https://github.com/huggingface/transformers/blob/ac5d4cf6de24b4f7fa92996e92d1d71dd5411a6a/src/transformers/pipelines/automatic_speech_recognition.py#L135 and the code to run inference for debugging should be this one: https://github.com/huggingface/distil-whisper/tree/main#long-form-transcription The other option is to just use OpenAI's codebase: https://github.com/openai/whisper with the distil-whisper checkpoints converted into the original format: https://huggingface.co/distil-whisper/distil-large-v2/blob/main/original-model.fp32.bin Does this help? I'm also working on adding OAI's naively to Transformers for easier debugging but this might take until next week |
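For readers following the chunking discussion, here is a minimal sketch of how overlapping chunk boundaries could be computed with the numbers mentioned above (15-second chunks, 2.5 seconds of overlap). It is not the Transformers implementation: the function name `chunk_bounds` and its parameters are hypothetical, it assumes the 2.5 seconds is the total audio shared by two consecutive chunks (the linked pipeline may apply its stride differently, e.g. on both sides of a chunk), and it leaves out the step of merging the per-chunk transcriptions.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def chunk_bounds(
    n_samples: int,
    chunk_len_s: float = 15.0,   # chunk length quoted in the thread
    overlap_s: float = 2.5,      # ASSUMPTION: total overlap between neighbours
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks.

    Hypothetical helper for illustration only; it does not mirror any
    function in Transformers or whisper.cpp.
    """
    chunk = int(chunk_len_s * sample_rate)
    overlap = int(overlap_s * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk_len_s > 0 and 0 <= overlap_s < chunk_len_s")
    if n_samples <= 0:
        return []

    step = chunk - overlap  # how far each new chunk advances
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end >= n_samples:  # last chunk reached the end of the audio
            break
        start += step
    return bounds


if __name__ == "__main__":
    # 40 seconds of audio at 16 kHz
    for start, end in chunk_bounds(40 * SAMPLE_RATE):
        print(f"{start / SAMPLE_RATE:6.1f}s -> {end / SAMPLE_RATE:6.1f}s")
```

Each chunk would then be transcribed independently, and the text in the overlapping regions reconciled afterwards; that merging step is where most of the implementation effort lies and is not shown here.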
Thanks for the links. Will probably look into chunking after I make the |
i would like to weigh in from the "end user peanut gallery" that i believe the full implementation of the chunking for of course everyone would rather transcribe locally for privacy and cost reasons. you have the power to make this practical. everyone will have their own private transcriptionist. we don't need another 10x to make this a UX inflection, just another 5x will seriously change the game. thank you for the important work that you do! |
I haven't managed to run the conversion scripts myself (see #1711). Is there any chance you could release additional versions, using the GGUF format with the recent quantization options? |
Any chance of this supporting https://huggingface.co/Aspik101/distil-whisper-large-v3-pl ? |
I'd love to see this as well. The distil models run so much faster but unfortunately for anything longer than 10-20 seconds, it starts cutting out words/phrases. I tested against a distil model using regular Whisper here https://huggingface.co/spaces/distil-whisper/whisper-vs-distil-whisper with the same audio file and it works nearly flawlessly. But for some reason using it through whisper.cpp creates a large number of errors and words that are cut off or misspelled (I'm assuming it's because it's chunking oddly). Would love to see this fixed. |
@patrickvonplaten with the latest release of Distilled V3, my understanding is that the distilled model is no longer exclusively tied to the chunked algorithm. So maybe this ticket could be closed? I suppose it mainly remained open to address the chunking? |
Hey,
We've recently released two Distil-Whisper checkpoints:
On GPU, we achieve speed-ups of up to 6x compared to the teacher models at relatively minimal degradation in performance.
More information here: https://twitter.com/sanchitgandhi99/status/1719409022246220184
Using your conversion scripts, we've already converted the checkpoints to .cpp format see:
We'd love to collaborate on supporting the checkpoints in this repository, as we're really excited about the potential speed-ups that can be achieved with optimized C++ code.
It looks like some changes to whisper.cpp will be necessary for such a change (e.g. we should probably define a new model type here?). @ggerganov would you be interested in adding Distil-Whisper?