Large model requires 4-10x time to process long file. Suggestions to improve time? #1747
-
Hey there! Thanks a lot for this transcription/translation model; it is proving very valuable for our business by helping us transcribe our important meetings.

What is the issue?
The large model (
Both of which are not in use outside of said transcriptions.

What is the expected behaviour?
The large model should perform at the same time-to-transcription ratio as documented.

Discussion
I hypothesize that the difference from the documented speeds is related to the hardware used; after all, neural networks are typically run on GPUs, not CPUs. Could this be the case? Alternatively, if not, is there some parameter or execution context that must be established for the program to access the machine's full capability (like a flag to use the GPU, or a permission the process needs in order to use the GPU)? Additionally, would chunking the video into multiple smaller audio files improve speed without degrading the quality of the output? Thanks a lot for your help!
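As a minimal sketch of the "flag to use the GPU" idea: with the openai-whisper package there is no special permission to grant, but you can check whether PyTorch can see a CUDA GPU and pass the device explicitly. The `pick_device` helper below is a hypothetical name of my own, not part of Whisper's API.

```python
def pick_device():
    """Return "cuda" if a CUDA-capable GPU is visible to PyTorch, else "cpu"."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # PyTorch not installed; Whisper itself would not run in this case.
        return "cpu"

device = pick_device()
print(f"Whisper would run on: {device}")

# Usage sketch (assumes the openai-whisper package is installed):
#   import whisper
#   model = whisper.load_model("large", device=device)
#   result = model.transcribe("meeting_audio.mp3")
```

If `pick_device()` reports "cpu" on a machine that has a GPU, the likely culprit is a CPU-only PyTorch build rather than anything Whisper-specific.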
Replies: 1 comment 5 replies
-
The documentation doesn't give time-to-transcription ratios; it gives only relative speed. Thus, however fast the large model happens to run on your particular hardware, the medium model will be ~2x faster than that, the small model ~6x faster, and so on. Of course the large model will run faster on faster hardware, but the medium model will still run ~2x faster than it on that same hardware.

How fast does Whisper run on particular hardware? There are so many hardware choices that it would be prohibitively expensive to buy and test them all, but individual users have posted discussions here reporting how fast Whisper runs on their hardware. You can search this discussion board or the broader internet for Whisper benchmarks on different hardware. If you search the discussion board, you can also find discussions about speeding up Whisper on the same hardware.
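The relative-speed arithmetic above can be sketched as a small calculation: given how long the large model takes on *your* hardware, you can estimate the other models' runtimes. The ~2x and ~6x factors are the approximate relative speeds quoted in the reply (matching the model table in Whisper's README); the function name is my own.

```python
# Approximate speed of each model relative to the large model (large = 1.0).
RELATIVE_SPEED = {"large": 1.0, "medium": 2.0, "small": 6.0}

def estimated_runtime(large_minutes, model):
    """Estimated wall-clock minutes for `model`, given the large model's time."""
    return large_minutes / RELATIVE_SPEED[model]

# Example: if the large model takes 120 minutes on your machine,
# the medium model should take roughly 60 and the small model roughly 20.
for name in RELATIVE_SPEED:
    print(f"{name}: ~{estimated_runtime(120, name):.0f} min")
```

The point is that these estimates are anchored to a measurement on your own hardware, not to any absolute figure in the documentation.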