Streaming api on mac os x #624

JRMeyer · 2021-03-08T03:10:09Z

JRMeyer
Mar 8, 2021
Maintainer

>>> erdoc
[May 3, 2019, 8:37pm]

Trying to replicate Reuben's small Python program
(https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/) in
the macOS Mojave terminal.

libsox is installed and I have access to the microphone from the
terminal, inside a Python 3.7.0 virtual environment.

The program runs without errors, but the output is only 'Transcription:
... BLANK'. It appears model.finishStream(sctx) doesn't output anything.

I have ensured the mic is working by changing the rec parameters from
-q -V0 to -S -V3. The paths to the model, LM and trie files are all
correct (DeepSpeech works when called from the command line and supplied
an audio file argument).

Lastly this is the console output:

> python test.py --model models/output_graph.pbmm --alphabet models/alphabet.text --lm models/lm.binary --trie models/trie

> Initializing model...
> TensorFlow: v1.12.0-10-ge232881c5a
> DeepSpeech: v0.4.1-0-g0e40db6
> 2019-05-03 15:14:11.615995: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
> You can start speaking now. Press Control-C to stop recording.
> rec: SoX v
> rec WARN formats: can't set sample rate 16000; using 44100
> rec WARN formats: can't set 1 channels; using 2
>
> Input File : 'default' (coreaudio)
> Channels : 2
> Sample Rate : 44100
> Precision : 32-bit
> Sample Encoding: 32-bit Signed Integer PCM
> Endian Type : little
> Reverse Nibbles: no
> Reverse Bits : no
>
> Output File : '-' (raw)
> Channels : 1
> Sample Rate : 16000
> Precision : 16-bit
> Sample Encoding: 16-bit Signed Integer PCM
> Endian Type : little
> Reverse Nibbles: no
> Reverse Bits : no
> Comment : 'Processed by SoX'
>
> rec INFO sox: effects chain: input 44100Hz 2 channels
> rec INFO sox: effects chain: gain 44100Hz 2 channels
> rec INFO sox: effects chain: channels 44100Hz 1 channels
> rec INFO sox: effects chain: rate 16000Hz 1 channels
> rec INFO sox: effects chain: dither 16000Hz 1 channels
> rec INFO sox: effects chain: output 16000Hz 1 channels
> In:0.00% 00:00:08.61 [00:00:00.00] Out:137k [ | ] Clip:0 ^C
> Aborted.
> Transcription:

thank you.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/streaming-api-on-mac-os-x]

JRMeyer · 2021-03-08T03:10:12Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[May 4, 2019, 1:52pm]

> rec WARN formats: can't set sample rate 16000; using 44100
> rec WARN formats: can't set 1 channels; using 2

Please dump the WAV PCM that you feed to streaming and try it with
prebuilt tools. We won't be able to check if there's a bug in your code
if you don't share code.

[Archived Post]

JRMeyer · 2021-03-08T03:10:14Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> reuben
[May 4, 2019, 2:15pm]

That doesn't look like a bug in the client. The sox warnings mean it
can't set the recording sample rate and channel count directly on the
hardware/driver; in that case it does the conversion in software, so
you get the proper output in the end, as the table says.
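
As a rough sanity check on that table, the progress line in the log can be compared against the output format. This is only a sketch: it assumes the "Out:137k" figure counts output samples, and the constant names are made up for illustration.

```python
# Values copied from the rec log earlier in this thread.
SAMPLE_RATE_HZ = 16000    # "Sample Rate : 16000" under Output File
BYTES_PER_SAMPLE = 2      # "Precision : 16-bit"
CHANNELS = 1              # "Channels : 1" under Output File
RECORDED_SECONDS = 8.61   # "00:00:08.61" on the progress line

# Samples and bytes a 16 kHz mono 16-bit stream of that length should contain.
expected_samples = round(RECORDED_SECONDS * SAMPLE_RATE_HZ)
expected_bytes = expected_samples * BYTES_PER_SAMPLE * CHANNELS

print(expected_samples)  # 137760, in line with "Out:137k"
print(expected_bytes)    # 275520
```

If the assumption about "Out" holds, the numbers agree with rec emitting 16 kHz mono despite the warnings.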

If you record into a WAV file instead of using the streaming API and
then feed that WAV file to our deepspeech client, does it work?

[Archived Post]

JRMeyer · 2021-03-08T03:10:17Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> erdoc
[May 6, 2019, 8:01pm]

Yes, it works using a regular WAV file; the problem only occurs when
streaming. And I used the prebuilt tools.

[Archived Post]

JRMeyer · 2021-03-08T03:10:20Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[May 7, 2019, 11:03am]

> it is only when streaming it has this problem

We would really need to see your code to help you. Could you also share
the recording?
Have you dumped the bits you send to the streaming API as a WAV file and
checked that this works as well? Your wording above is unclear about
that.
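
One way to dump those bits, sketched with Python's standard wave module. The helper name dump_pcm_to_wav is hypothetical, and the format constants assume the stream really is 16 kHz, mono, 16-bit little-endian PCM, as requested from rec in the blog post's script.

```python
import wave

SAMPLE_RATE = 16000   # assumed to match `-r 16k` in the rec command
SAMPLE_WIDTH = 2      # bytes per sample, assumed to match `-b 16`
CHANNELS = 1          # assumed to match `-c 1`


def dump_pcm_to_wav(chunks, path):
    """Write an iterable of raw PCM byte chunks to a WAV file.

    Returns the number of frames written. (Hypothetical helper, not part
    of DeepSpeech.)
    """
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)
        return wav.getnframes()
```

In the microphone script, each block returned by subproc.stdout.read(512) could be appended to a list before it is passed to feedAudioContent, and the list handed to this helper once recording stops. The resulting file can then be given to the deepspeech client with --audio to compare against the streaming result.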

[Archived Post]

JRMeyer · 2021-03-08T03:10:22Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> erdoc
[May 7, 2019, 3:33pm]

I don't have any custom code; I am using the prebuilt DeepSpeech tools
for everything. The code I am using is exactly the one Reuben discussed
last year. Here it is:

import argparse
import deepspeech as ds
import numpy as np
import shlex
import subprocess
import sys

parser = argparse.ArgumentParser(description='DeepSpeech speech-to-text from microphone')
parser.add_argument('--model', required=True,
                    help='Path to the model (protocol buffer binary file)')
parser.add_argument('--alphabet', required=True,
                    help='Path to the configuration file specifying the alphabet used by the network')
parser.add_argument('--lm', nargs='?',
                    help='Path to the language model binary file')
parser.add_argument('--trie', nargs='?',
                    help='Path to the language model trie file created with native_client/generate_trie')
args = parser.parse_args()

LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.25
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 512

print('Initializing model...')

model = ds.Model(args.model, N_FEATURES, N_CONTEXT, args.alphabet, BEAM_WIDTH)
if args.lm and args.trie:
    model.enableDecoderWithLM(args.alphabet,
                              args.lm,
                              args.trie,
                              LM_WEIGHT,
                              VALID_WORD_COUNT_WEIGHT)
sctx = model.setupStream()

subproc = subprocess.Popen(shlex.split('rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2'),
                           stdout=subprocess.PIPE,
                           bufsize=0)
print('You can start speaking now. Press Control-C to stop recording.')

try:
    while True:
        data = subproc.stdout.read(512)
        model.feedAudioContent(sctx, np.frombuffer(data, np.int16))
except KeyboardInterrupt:
    print('Transcription:', model.finishStream(sctx))
    subproc.terminate()
    subproc.wait()

I didn't change anything. As for the WAV file, I am not that good with
Linux, so I don't know how to dump the WAV file and pipe it to
DeepSpeech at the same time. I did a simple command-line rec into a WAV
file, and passing that WAV file with the --audio arg worked.

I just want to help anyone else who ever has this issue find out whether
it is the code or my platform that has a problem. As I said, I use
Mac OS X Sierra.

[Archived Post]

JRMeyer · 2021-03-08T03:10:25Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> reuben
[May 7, 2019, 3:39pm]

This is weird. Just to be sure, did you double check that you're passing
the same parameters to both the microphone script and the client that
takes a WAV file? E.g. are you passing the same LM/trie files to both,
same model, etc. Dumb question, but just trying to make sure.

Also, could you try updating the LM_WEIGHT and VALID_WORD_COUNT_WEIGHT
values in that script to 0.75 and 1.85 respectively? They were updated
in the time between the blog post and v0.4.1. Does that change the
output?
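
For reference, that change would amount to editing two constants near the top of the microphone script. The names are the ones used in the script and the values are the ones suggested here; whether this changes the transcription is exactly what remains to be tested.

```python
# Decoder weights suggested in this reply for v0.4.1
# (the script from the blog post uses 1.50 and 2.25).
LM_WEIGHT = 0.75
VALID_WORD_COUNT_WEIGHT = 1.85
```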

[Archived Post]
