Whisper Forced Alignment

An alignment decoder for Whisper.

Forced alignment takes an audio file and a text string as input. The model then estimates how likely it is that the speech in the audio matches the given text.
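To make the idea of "likelihood of a given text" concrete, here is a minimal sketch of how such a score could be computed. It is not this repository's implementation: `forced_avg_logprob` and the toy probability tables are hypothetical, and a real decoder would condition each step on the audio and on the previously forced tokens.

```python
import math

def forced_avg_logprob(step_probs, target_tokens):
    """Average log-probability of a fixed (forced) token sequence.

    step_probs[i] maps candidate tokens to their probability at step i;
    target_tokens[i] is the token forced at that step.
    """
    if not target_tokens:
        raise ValueError("target_tokens must not be empty")
    if len(step_probs) != len(target_tokens):
        raise ValueError("need one probability table per target token")
    total = sum(math.log(probs[tok]) for probs, tok in zip(step_probs, target_tokens))
    return total / len(target_tokens)

# Toy distributions standing in for a real model's output (made-up values).
step_probs = [
    {"hello": 0.9, "yellow": 0.1},
    {"world": 0.8, "word": 0.2},
]
matching = forced_avg_logprob(step_probs, ["hello", "world"])
mismatched = forced_avg_logprob(step_probs, ["yellow", "word"])
```

In this toy setup the text that agrees with the model's preferred tokens gets a score closer to zero than the mismatched text.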

Setup

The Python and PyTorch requirements are the same as Whisper's. Install directly from the repository:

pip install git+https://github.com/Warpawn/Whisper-Forced-Alignment

Python usage

To run forced alignment, call the decode function with the alignment_text option set. MODEL_TYPE_WHISP is the name of a Whisper model (for example, large), audio_file is the path to the audio file, and current_line is the text to score against the audio. Setting alignment_text is what enables forced alignment.

import whisper

# Example values; replace with your own.
MODEL_TYPE_WHISP = "large"       # name of a Whisper model
audio_file = "audio.wav"         # path to the audio file
current_line = "text to align"   # text expected in the audio

model = whisper.load_model(MODEL_TYPE_WHISP)

# Load the audio and pad or trim it to Whisper's 30-second window.
audio = whisper.load_audio(audio_file)
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Setting alignment_text enables forced alignment.
options = whisper.DecodingOptions(language="en", alignment_text=current_line)
result = whisper.decode(model, mel, options)

print(result.avg_logprob)
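One possible use of the score is to compare several candidate texts against the same audio and keep the best one. The sketch below assumes that a higher avg_logprob means a closer match. `score_with_whisper` and `best_match` are hypothetical helpers, not part of this package, and the stand-in scores are made up so the example runs without a model.

```python
def score_with_whisper(model, mel, text):
    """Score one candidate using the decode call from the usage example."""
    import whisper  # imported lazily; requires this repository's package
    options = whisper.DecodingOptions(language="en", alignment_text=text)
    return whisper.decode(model, mel, options).avg_logprob

def best_match(candidates, score):
    """Return (text, score) for the highest-scoring candidate."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scored = [(text, score(text)) for text in candidates]
    return max(scored, key=lambda pair: pair[1])

# Stand-in scorer with made-up values so the sketch runs without a model.
fake_scores = {"the cat sat": -0.21, "the cat sad": -1.37}
best_text, best_score = best_match(list(fake_scores), fake_scores.get)
```

With a loaded model and mel spectrogram, the same helper could be called as best_match(lines, lambda t: score_with_whisper(model, mel, t)).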

