
Verify alphabet in pb* and tflite models #4

Open
JRMeyer opened this issue Jun 24, 2021 · 2 comments

Comments

@JRMeyer
Member

JRMeyer commented Jun 24, 2021

The alphabet files shipped with the Jaco models are inconsistent with what the models output at runtime. For example, the Jaco Spanish model can produce accented vowels, but its alphabet file does not include them. The correct alphabet should be confirmed and uploaded to the model zoo so that it can be used for language model generation.
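One way to surface this mismatch is to run a model over some audio and check its transcripts against the alphabet file. The following is a minimal Python sketch, not part of Coqui STT. It assumes the DeepSpeech/Coqui-style alphabet format (one label per line, lines starting with `#` are comments, and a line containing `\#` stands for a literal `#`). `parse_alphabet`, `out_of_alphabet` and the sample strings are hypothetical. Text is normalised to Unicode NFC so that an accented vowel written as a base letter plus a combining accent is compared as a single character.

```python
import unicodedata
from collections import Counter


def parse_alphabet(text: str) -> list[str]:
    """Parse alphabet file contents (assumed format: one label per line,
    '#' lines are comments, a line '\\#' is a literal '#', empty lines skipped)."""
    labels = []
    for line in text.splitlines():
        if line == "\\#":
            labels.append("#")
        elif line == "" or line.startswith("#"):
            continue
        else:
            labels.append(line)
    return labels


def out_of_alphabet(transcripts: list[str], labels: list[str]) -> Counter:
    """Count characters in the transcripts that are missing from the alphabet."""
    allowed = {unicodedata.normalize("NFC", label) for label in labels}
    missing = Counter()
    for text in transcripts:
        # NFC turns 'e' + combining acute accent into a single precomposed 'é'
        for ch in unicodedata.normalize("NFC", text):
            if ch not in allowed:
                missing[ch] += 1
    return missing


if __name__ == "__main__":
    # Hypothetical Spanish alphabet without accented vowels
    alphabet_text = "# Spanish alphabet\n \na\nb\nc\nd\ne\ni\nl\nm\nn\no\nq\nr\ns\nt\nu\nñ\n"
    labels = parse_alphabet(alphabet_text)
    # Hypothetical transcripts, standing in for real model output
    transcripts = ["qué tal", "más o menos", "canción"]
    for ch, n in out_of_alphabet(transcripts, labels).most_common():
        print(f"{ch!r} (U+{ord(ch):04X}) appears {n} time(s) but is not in the alphabet")
```

With the sample data, the script reports `é`, `á` and `ó` as characters that the transcripts contain but the alphabet lacks. In practice, the alphabet text would be read with `open(path, encoding="utf-8").read()`.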

To confirm which alphabet each model was trained with, TFModelState::init and TFLiteModelState::init can be modified to print the alphabet they load, here: https://github.com/coqui-ai/STT/blob/653ce25a7ce5bd6cbb564416d847d8afcd5c5e8c/native_client/tfmodelstate.cc#L120

@JRMeyer JRMeyer self-assigned this Jun 24, 2021
@mariano-balto

This may be the cause of the problem we are seeing in a Dockerized ARM environment when using the Jaco Spanish models with the Python 3.9 bindings.

coqui-ai/STT#2284

@zuazo

zuazo commented Jan 27, 2023

The correct alphabet files seem to be the following: https://gitlab.com/Jaco-Assistant/Scribosermo/-/tree/deepspeech/data

As you said, the Spanish alphabet includes accented vowels. The same is true of the alphabets for other languages, such as French and Polish.
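To check the alphabet currently in the zoo against the Scribosermo files, a minimal Python sketch like the following can diff two alphabets. It assumes the DeepSpeech/Coqui-style format (one label per line, `#` comment lines, `\#` for a literal `#`). It also treats label order as significant, on the assumption that, as in DeepSpeech-derived models, a label's position in the alphabet is the output index it is decoded from. `parse_alphabet`, `compare_alphabets` and the inline alphabet contents are hypothetical.

```python
def parse_alphabet(text: str) -> list[str]:
    """Parse alphabet file contents (assumed format: one label per line,
    '#' lines are comments, a line '\\#' is a literal '#', empty lines skipped)."""
    labels = []
    for line in text.splitlines():
        if line == "\\#":
            labels.append("#")
        elif line and not line.startswith("#"):
            labels.append(line)
    return labels


def compare_alphabets(a: list[str], b: list[str]) -> dict:
    """Report labels present in only one alphabet, and whether both are identical."""
    set_a, set_b = set(a), set(b)
    return {
        "only_in_a": [label for label in a if label not in set_b],
        "only_in_b": [label for label in b if label not in set_a],
        # Assumption: label order fixes output indices, so the same labels
        # in a different order are not interchangeable
        "identical": a == b,
    }


if __name__ == "__main__":
    # Hypothetical contents; in practice read each file with
    # open(path, encoding="utf-8").read()
    zoo_alphabet = parse_alphabet(" \na\nb\nc\ne\no\nñ\n")
    scribosermo_alphabet = parse_alphabet(" \na\nb\nc\ne\no\nñ\ná\né\nó\n")
    report = compare_alphabets(zoo_alphabet, scribosermo_alphabet)
    print("only in zoo alphabet:", report["only_in_a"])
    print("only in Scribosermo alphabet:", report["only_in_b"])
    print("identical:", report["identical"])
```

If `identical` is false even though both lists of extra labels are empty, the two files contain the same characters in a different order, which matters if the order defines the model's output indices.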

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants