>>> jabyrd3
[December 10, 2017, 3:01am]
Hi!
My hardware doesn't support AVX2 (this machine is on Ivy Bridge), so I
spent the day building native-client from source, including TensorFlow.
The process was a bit arduous, but I ended up getting everything
working. When I run the binary with the pretrained model, I now get
some output without anything throwing an error.
The problem is that the output is laughably bad. It's not even
close. I think I'm doing something wrong.
For example, one input I've been using is a .wav (single channel, 16-bit)
of me saying 'testing testing, 123'. The output from the pretrained
model is just 'oo'.
Another file, much clearer, in the same format, says: 'hi, im amy, one of the
available high quality text to speech voices, select download now to
install my voice'.
A smattering of outputs I got from DeepSpeech on that file while messing
around with .wav encoding parameters:
'har omm one omhumho wommen'
'har o awi won o veembo hoa homten ta'
'a am won a vemhomable han wontontun'
I think I'm encoding the files incorrectly, or something. I'm not sure
what else to do to get clean output.
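A minimal sketch of how the WAV parameters could be checked and converted to
what the pretrained model is assumed to expect (16 kHz, mono, 16-bit PCM);
sox must be installed, and the file names below are placeholders:

```python
import subprocess
import wave

def check_wav(path):
    # Print the parameters the model cares about so a mismatch is easy to spot.
    with wave.open(path, "rb") as w:
        print(f"{path}: {w.getframerate()} Hz, "
              f"{w.getnchannels()} channel(s), "
              f"{w.getsampwidth() * 8}-bit")

def convert_to_16k_mono(src, dst):
    # Resample and downmix with sox to 16 kHz, mono, 16-bit signed PCM.
    subprocess.run(
        ["sox", src, "-r", "16000", "-c", "1", "-b", "16", dst],
        check=True,
    )

check_wav("testing123.wav")  # placeholder file name
convert_to_16k_mono("testing123.wav", "testing123_16k.wav")
check_wav("testing123_16k.wav")
```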
[This is an archived DeepSpeech discussion thread from discourse.mozilla.org/t/built-native-client-from-scratch-accuracy-issues]