For this model, I added a change that splits transcription into 1-minute sections, addressing the issue where longer audio files would return an error due to their length. The first section covers the first minute (0:00 - 1:00), the second covers the following minute (1:00 - 2:00), and so on. Below is information about the Whisper ASR Webservice as a whole.
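The change itself isn't reproduced here, so the following is only a minimal sketch of the per-minute chunking idea, assuming the pydub and openai-whisper packages (plus ffmpeg on PATH); the file name, model size, and helper name are illustrative, not the actual patch.

```python
# Hedged sketch of per-minute chunked transcription.
# Assumes: pip install pydub openai-whisper, with ffmpeg available on PATH.
# File names and the model size are placeholders, not webservice code.
import whisper
from pydub import AudioSegment

CHUNK_MS = 60_000  # one section = 1 minute of audio

def transcribe_in_sections(path: str, model_name: str = "base") -> list[str]:
    model = whisper.load_model(model_name)
    audio = AudioSegment.from_file(path)
    sections = []
    for start in range(0, len(audio), CHUNK_MS):
        # Slice one minute of audio (pydub slices are in milliseconds).
        chunk = audio[start:start + CHUNK_MS]
        chunk_path = f"/tmp/section_{start // CHUNK_MS}.wav"
        chunk.export(chunk_path, format="wav")
        sections.append(model.transcribe(chunk_path)["text"])
    return sections

if __name__ == "__main__":
    for i, text in enumerate(transcribe_in_sections("input.mp3")):
        print(f"Section {i} ({i}:00 - {i + 1}:00): {text}")
```

Exporting each chunk to a temporary WAV keeps every transcription call to roughly one minute of audio, which sidesteps the length errors mentioned above.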
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification. For more details, see github.com/openai/whisper
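As a small illustration of those multitask capabilities, the hedged sketch below uses the upstream openai-whisper Python package directly (not the webservice); the file name and model size are placeholders.

```python
# Hedged sketch of Whisper's multitask API via the openai-whisper package.
# "audio.mp3" is a placeholder input file.
import whisper

model = whisper.load_model("base")

# Language identification on the first 30 seconds of audio.
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print("Detected language:", max(probs, key=probs.get))

# Speech translation: transcribe any supported language into English.
print(model.transcribe("audio.mp3", task="translate")["text"])
```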
The current release (v1.4.1) supports the standard Whisper models, selected via the ASR_MODEL and ASR_ENGINE environment variables shown in the Docker commands below.
Run the webservice with Docker (CPU):

```sh
docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
```
Or with GPU support:

```sh
docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
```
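Once a container is up, the service can be queried over HTTP. Below is a minimal sketch using Python's requests library, assuming the /asr endpoint and audio_file form field from the upstream API; parameter names may vary between releases, and the input file is a placeholder.

```python
# Hedged example: send an audio file to a locally running webservice.
# Endpoint and field names follow the upstream docs; verify for your version.
import requests

with open("input.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:9000/asr",
        params={"task": "transcribe", "output": "txt"},
        files={"audio_file": f},
    )
resp.raise_for_status()
print(resp.text)  # plain-text transcription
```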
For more information, explore the project documentation.