>>> jabyrd3
[December 10, 2017, 3:01am]
Hi!
My hardware doesn't support AVX2 (this machine is on Ivy Bridge), so I
spent the day building native-client from source, including TensorFlow.
The process was a bit arduous, but I ended up getting everything
working. When I run the binary with the pretrained model, I now get
some output without anything throwing an error.
The problem is that the output is laughably bad. It's not even
close. I think I'm doing something wrong.
For example, one input I've been using is a .wav (single channel, 16-bit)
of me saying 'testing testing, 123'. The output from the pretrained
model is just 'oo'.
Another file, much clearer, in the same format, says: 'hi, im amy, one of the
available high quality text to speech voices, select download now to
install my voice'.
A smattering of outputs I got from DeepSpeech on that file while messing
around with .wav encoding parameters:
'har omm one omhumho wommen'
'har o awi won o veembo hoa homten ta'
'a am won a vemhomable han wontontun'
I think I'm encoding the files incorrectly, or something. I'm not sure
what else to do to get clean output.
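A minimal sketch of how the WAV parameters could be checked and converted to
what the pretrained model is assumed to expect (16 kHz, mono, 16-bit PCM);
sox must be installed, and the file names below are placeholders:

```python
import subprocess
import wave

def check_wav(path):
    # Print the parameters the model cares about so a mismatch is easy to spot.
    with wave.open(path, "rb") as w:
        print(f"{path}: {w.getframerate()} Hz, "
              f"{w.getnchannels()} channel(s), "
              f"{w.getsampwidth() * 8}-bit")

def convert_to_16k_mono(src, dst):
    # Resample and downmix with sox to 16 kHz, mono, 16-bit signed PCM.
    subprocess.run(
        ["sox", src, "-r", "16000", "-c", "1", "-b", "16", dst],
        check=True,
    )

check_wav("testing123.wav")  # placeholder file name
convert_to_16k_mono("testing123.wav", "testing123_16k.wav")
check_wav("testing123_16k.wav")
```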
[This is an archived DeepSpeech discussion thread from discourse.mozilla.org/t/built-native-client-from-scratch-accuracy-issues]