module request: piper #866

KiaraGrouwstra · 2023-11-06T13:22:57Z

piper is 'a fast, local neural text to speech system' (samples here).
it would be nice to have speechd support this as well.

cross-post: rhasspy/piper#265

/cc @Elleo who has done some work integrating these thru pied.

csukuangfj · 2023-11-06T14:49:36Z

I suggest that you also have a look at
https://github.com/k2-fsa/sherpa-onnx

It is implemented in C++ and has various APIs for different languages, e.g., Python/C/Go/C#/Swift/Kotlin, etc.

You can find Android APKs for it at
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

You can also try it in our huggingface space without installing anything.
https://huggingface.co/spaces/k2-fsa/text-to-speech

By the way, it supports models from piper as well.

csukuangfj · 2023-11-06T14:51:32Z

Also cc @Elleo . You may find sherpa-onnx interesting.
It supports both speech-to-text and text-to-speech.

Elleo · 2023-11-06T15:31:21Z

Just for a little context on what I'm doing, Pied can currently configure speech dispatcher to work with Piper through the sd_generic module, but my long term plan is to create a piper speech dispatcher module that can be kept loaded to further reduce latency and add support for speed/pitch/etc. changes

@csukuangfj Thanks, that's interesting, I'll check it out!

coderalpha · 2023-11-08T14:08:50Z

I'm trying to integrate Piper through the sd_generic module. But I get the error:
speechd: Error: Module reported error in request from speechd (code 3xx): 300-Opening sound device failed. Reason: Cannot open plugin server. error: file not found.

I can't find any information on this error. Any help will be appreciated!

I added the module in speechd.conf:
AddModule "piper" "sd_generic" "piper.conf"'
DefaultVoiceType "FEMALE1"
DefaultModule "piper"
DefaultLanguage "en"
AudioOutputMethod "libao"

And created the piper.conf file in the /etc/speech-dispatcher/modules directory:
AddVoice "en" "FEMALE1" "en_US-amy-medium.onnx"
DefaultVoice "en_US-amy-medium.onnx"
GenericExecuteSynth "echo \'$DATA\' | /home/dev/Apps/piper/piper --model /home/dev/Apps/piper/models/en_US-amy-medium.onnx --output_raw | paplay"

sthibaul · 2023-11-08T14:17:35Z

As mentioned in the issue template, add Debug 1 to the speechd config file and the module config file, and get the corresponding log files, so we get to know what exactly went wrong.

sthibaul · 2023-11-08T14:20:37Z

(of course, the issue template knows better, that's why we write documentation, so we don't have to rely on our memory: it's LogLevel 5 in the speechd config, and indeed Debug 1 in the module config)

coderalpha · 2023-11-08T15:06:19Z

I set the LogLevel to 5 and Debug to 1 and attached the files
speechd.zip

coderalpha · 2023-11-09T07:28:13Z

I've obviously been trying everything to get this going and in the process I've changed a lot of the config and it might not be optimal. Since it seems to be an issue with the loading of the sound plugin, I changed the AudioOutputMethod back to "pulse". I'm running Ubuntu 22.04 and according to the output of inxi, Pulse is running:

System:
Host: dev Kernel: 6.2.0-36-generic x86_64 bits: 64 Desktop: N/A
Distro: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Machine:
Type: Desktop Mobo: ASUSTeK model: WS X299 SAGE/10G v: Rev 1.xx
serial: <superuser required> UEFI: American Megatrends v: 3601
date: 09/24/2021
Audio:
Device-1: Intel 200 Series PCH HD Audio driver: snd_hda_intel
Device-2: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
driver: snd_hda_intel
Device-3: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
driver: snd_hda_intel
Sound Server-1: ALSA v: k6.2.0-36-generic running: yes
Sound Server-2: PulseAudio v: 15.99.1 running: yes
Sound Server-3: PipeWire v: 0.3.48 running: yes

Then I get the following error:
speechd: Error: Module reported error in request from speechd (code 3xx): 300-Opening sound device failed. Reason: Couldn't open pulse plugin.

I noticed that there is an Ubuntu package speech-dispatcher-audio-plugins, and it is installed, and contains the following:
/usr/lib/x86_64-linux-gnu/speech-dispatcher
/usr/lib/x86_64-linux-gnu/speech-dispatcher/spd_alsa.so
/usr/lib/x86_64-linux-gnu/speech-dispatcher/spd_libao.so
/usr/lib/x86_64-linux-gnu/speech-dispatcher/spd_oss.so
/usr/lib/x86_64-linux-gnu/speech-dispatcher/spd_pulse.so

So, the Pulse plugin is installed.

sthibaul · 2023-11-11T22:46:27Z

Since it is using the generic module, piper.conf is just passing audio to paplay (though it should rather be $PLAY_COMMAND so it works automatically with pulse, ao, alsa, etc.), so there is no need for an audio plugin.

Reason: Cannot open plugin server. error: file not found : that happens with other generic modules actually: the server is just trying to make audio go through it, and notices the error and falls back to making the module open audio itself. The warning is indeed confusing, I have now fixed it.

Your speechd.log seems to be showing various attempts, I can't see how to know what corresponds to what configuration you used.

Actually, at the end of your speechd.log there doesn't seem to be any issue?

coderalpha · 2023-11-13T08:36:43Z

Yes, there aren't any issues logged in speech-dispatcher.log or piper.log, but it isn't working. There is no sound played. If I run the command directly, i.e.
echo "hello" | /home/dev/Apps/piper/piper --model /home/dev/Apps/piper/models/en_US-amy-medium.onnx --output_raw

it works.

sthibaul · 2023-11-13T08:44:08Z

Your command is missing the paplay part?

Also, in your speech-dispatcher.log I don't see any speech attempt, how do you actually test it?

coderalpha · 2023-11-13T08:48:56Z

Yes, the full command I run on the command-line is:
"echo 'hello' | ./piper --output-raw --model models/en_US-amy-medium.onnx | aplay -r 22050 -f S16_LE -t raw"

Through speech-dispatcher, I test it on the command-line with spd-say "hello"

sthibaul · 2023-11-13T08:53:59Z

Yes, the full command I run on the command-line is:

You are using aplay here, not paplay, you need to test exactly the same way as you described in the .conf file...

Through speech-dispatcher, I test it on the command-line with spd-say "hello"

Then please provide the logs that correspond to this test. The logs you uploaded didn't contain anything about that.

murlakatamenka · 2023-11-13T09:58:27Z

"echo 'hello' | ./piper --output-raw --model models/en_US-amy-medium.onnx | aplay -r 22050 -f S16_LE -t raw"

is there a specific reason for ./piper? I would suggest using just piper or absolute path like /usr/bin/piper.

coderalpha · 2023-11-13T10:25:18Z

I've changed the configuration to the simplest case to avoid confusion. I selected alsa for the audio output.

I can use the following command and it works:
echo "hello" | /home/dev/Apps/piper/piper --model /home/dev/Apps/piper/models/en_US-amy-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -
According to the log file everything looks good to me, yet no sound.
speechd.zip

sthibaul · 2023-11-13T10:34:00Z

There is still a difference: /home/dev vs /home/ws2.

And your speech-dispatcher.log still doesn't show any attempt to speak anything. No client ever connects to it within the 5s daemon timeout:

[Mon Nov 13 12:05:59 2023 : 716872] speechd:    Currently no clients connected, enabling shutdown timer.
[Mon Nov 13 12:05:59 2023 : 716898] speechd:    speak_queue Playback thread starting.......
[Mon Nov 13 12:06:04 2023 : 875778] speechd: Terminating...

Again: how exactly do you test?

coderalpha · 2023-11-13T11:03:54Z

Again: spd-say "hello"

sthibaul · 2023-11-13T11:26:21Z

But that does not show up at all in the logs... Are you sure you have only one installation of speech-dispatcher, as in: is spd-say actually connecting to the speech-dispatcher daemon that you are starting? Does it work with other speech syntheses?

coderalpha · 2023-11-13T12:40:24Z

I just tried to get going from scratch on a different computer and now I have the issue where speech-dispatcher doesn't want to start.

sudo systemctl restart speech-dispatcher
Job for speech-dispatcher.service failed because the control process exited with error code.
See "systemctl status speech-dispatcher.service" and "journalctl -xeu speech-dispatcher.service" for details.

In the log file, it is the same issue:
Reply from output module: |300-Opening sound device failed. Reason: Cannot open plugin server. error: file not found.
300 MODULE ERROR
|
[Mon Nov 13 14:23:37 2023 : 432859] speechd: Error: Module reported error in request from speechd (code 3xx): 300-Opening sound device failed. Reason: Cannot open plugin server. error: file not found.
And using the command produces output from piper:
echo "hello" | ~/Apps/piper/piper --output-raw --model ~/Apps/piper/models/en_US-amy-medium.onnx | aplay -r 22050 -f S16_LE -t raw
spd-say works but it isn't using piper.

inxi -SMA
System:
Host: GCS-WS5 Kernel: 6.2.0-36-generic x86_64 bits: 64 Desktop: N/A
Distro: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Machine:
Type: Laptop System: Dell product: Precision 5570 v: N/A
serial: <superuser required>
Mobo: Dell model: 03M8N5 v: A00 serial: <superuser required>
UEFI: Dell v: 1.18.0 date: 09/12/2023
Audio:
Device-1: Intel Alder Lake PCH-P High Definition Audio driver: snd_hda_intel
Sound Server-1: ALSA v: k6.2.0-36-generic running: yes
Sound Server-2: PulseAudio v: 15.99.1 running: yes
Sound Server-3: PipeWire v: 0.3.48 running: yes

It seems that spd-say is not using speech-dispatcher as the voice is different from the piper voice. This is a standard Ubuntu install

sthibaul · 2023-11-13T13:28:34Z

Reason: Cannot open plugin server. error: file not found.

As I already mentioned, this is just a harmless warning. What's important is after that. That's why one should always put the whole log in the bug report.

using the command produces output from piper:
echo "hello" | ~/Apps/piper/piper --output-raw --model ~/Apps/piper/models/en_US-amy-medium.onnx | aplay -r 22050 -f S16_LE -t raw

Does that work as root? You are starting speech-dispatcher from systemd, but that assumes that you can emit audio from root-started speech-dispatcher. Nowadays what usually happens is rather that you don't start speech-dispatcher from systemd, but let it get auto-started from the spd-say call.

spd-say works but it isn't using piper.

You can use spd-say -O to get the list of modules, and spd-say -o yourmodule foo to select which module you want speech to go through.

coderalpha · 2023-11-13T13:55:43Z

I can run the command with sudo and get audio output:
sudo echo "hello" | ~/Apps/piper/piper --output-raw --model ~/Apps/piper/models/en_US-amy-medium.onnx | aplay -r 22050 -f S16_LE -t raw

According to spd-say:
send text-to-speech output request to speech-dispatcher

But it is not using the speech-dispatcher that I've configured! If I run spd-say -O -L, I get:
OUTPUT MODULES espeak-ng
with LOTS of voices. In my speechd.conf I commented out the module "espeak-ng". It seems speech-dispatcher is getting a different config.

It doesn't seem like there is any logic to how this operates...

It seems I have to abandon this, but I'm working on a Qt application, and QtTextToSpeech integrates with speech-dispatcher.

sthibaul · 2023-11-13T14:08:41Z

I can run the command with sudo and get audio output:

sudo only applies to the first command of your pipeline. It's just before aplay that you want to put sudo so as to properly test audio as root.
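The point about sudo covering only the first command can be seen with a small stand-in experiment. This is only a sketch: `env MARK=1` is used here in place of sudo (an assumption made so the demo needs no root access), because it likewise affects only the one command it prefixes.

```shell
# Make sure the marker is not already set in the calling environment.
unset MARK

# 'env MARK=1' stands in for sudo: it wraps only the command it prefixes,
# not the later stages of the pipeline.
pipeline_output=$(
  env MARK=1 sh -c 'echo "first stage: MARK=${MARK:-unset}"' |
    sh -c 'cat; echo "last stage: MARK=${MARK:-unset}"'
)
printf '%s\n' "$pipeline_output"

# To test playback as root, the prefix therefore belongs on the last stage
# (paths and aplay options as used earlier in this thread):
#   echo "hello" | ~/Apps/piper/piper --output-raw --model ~/Apps/piper/models/en_US-amy-medium.onnx | sudo aplay -r 22050 -f S16_LE -t raw
```

Only the first stage sees the marker; the last stage runs in the caller's normal environment, just as aplay keeps running as the normal user when sudo is placed before echo.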

But it is not using the speech-dispatcher that I've configured!

Maybe check whether you might have different log files in /var/log, in /run/user/*/log

It doesn't seem like there is any logic to how this operates...

There is, it's just that with today's desktops, things have become more involved, as system-wide daemons are now frowned upon, and thus daemons are rather started in user sessions.

coderalpha · 2023-11-13T14:16:07Z

Using sudo before aplay results in:
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
aplay: main:831: audio open error: Device or resource busy
[2023-11-13 16:14:47.232] [piper] [info] Loaded voice in 0.192067876 second(s)
[2023-11-13 16:14:47.232] [piper] [info] Initialized piper

sthibaul · 2023-11-13T14:20:41Z

Using sudo before aplay results in:

So that explains why using a system-wide speech-dispatcher won't work. And thus why you want to just let the speechd auto-start trigger in your desktop session (as is the default), and see logs in /run/user/*/log

coderalpha · 2023-11-13T14:24:36Z

Can you point me to the documentation to do this? All the explanations I've seen show the configuration I've applied.

How do I undo the changes I've made? Do I just remove the references in speechd.conf to piper?

sthibaul · 2023-11-13T14:35:20Z

Can you point me to the documentation to do this?

It's already the default. Your spd-say call is probably already doing that, and you are just not opening the log files corresponding to that. Again, normally they end up in something like /run/user/*/log.

All the explanations I've seen show the configuration I've applied.

Yes, that's the problem with documentation when people don't take the time to update them. Help is welcome.

How do I undo the changes I've made? Do I just remove the references in speechd.conf to piper?

You probably don't need to do anything, and just make sure to open the logs that actually correspond to the instance that is auto-started.
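One way to locate the logs of the auto-started instance is to search the per-user runtime directory instead of guessing the exact sub-path. The sketch below assumes only that the log files end in `.log` and sit under a path containing `speech-dispatcher`; the function name `find_speechd_logs` is made up for this example.

```shell
# List speech-dispatcher log files below a given directory.
# Defaults to the per-user runtime directory; the exact sub-path is not assumed.
find_speechd_logs() (
  root=${1:-${XDG_RUNTIME_DIR:-/run/user/$(id -u)}}
  find "$root" -type f -name '*.log' -path '*speech-dispatcher*' 2>/dev/null
)

# Search the current user's runtime directory.
find_speechd_logs || echo "could not search the runtime directory"
```

If nothing is printed, the daemon may not have been auto-started yet in this session; running spd-say once and searching again is a reasonable next check.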

coderalpha · 2023-11-13T14:43:27Z

There is no log directory in the /run/user/1000 directory.

So where do I configure the piper module if the way I did it is incorrect?

Elleo · 2023-11-22T12:24:13Z

@andresmessina1701 The first version of Pied is now publicly released, that can automatically set everything up for you: https://pied.mikeasoft.com/

murlakatamenka · 2023-11-22T12:40:55Z

@Elleo it is only available as a snap, right?

it has various options if you compile it yourself (flatpak, appimage), see the repo:

https://github.com/Elleo/pied

Elleo · 2023-11-22T12:41:01Z

@Elleo it is only available as a snap, right?

Currently, yes; I am working on making it available via flatpak and appimage (and probably eventually as a deb too), but there are still some issues that need work with those packages.

Elleo · 2023-11-22T12:50:08Z

@Elleo great to hear, and thank you for making the process easier with a GUI application, helps a lot!

You're welcome!

carlocastoldi · 2023-12-22T17:22:21Z

For anyone wondering, this is the module for piper that I wrote. It can handle multiple languages and maps [-100, 100] speed (=RATE) values to [0.1, 3] for sox to handle.
However, it does not handle the volume: I can't lower it, only have it muted, normal or boosted (which is useless for me).

# /etc/speech-dispatcher/modules/piper-generic.conf
Debug "1"

GenericCmdDependency "piper-tts"
GenericCmdDependency "sox"
GenericCmdDependency "jq"
GenericCmdDependency "bc"
GenericExecuteSynth \
"printf %s \'\$DATA\' \
| /opt/piper-tts/piper --model /opt/piper-tts/voices/\$VOICE.onnx --output_raw \
| sox -v 1 -r \$(jq .audio.sample_rate < /opt/piper-tts/voices/\$VOICE.onnx.json) -c 1 -b 16 -e signed-integer -t raw - -t wav - tempo \$(echo \"0.000055*\$RATE*\$RATE+0.0145*\$RATE+1\" | bc) pitch \$PITCH norm \
| \$PLAY_COMMAND"
# not using $VOLUME

AddVoice "en-us" "MALE1"    "en_US-ryan-medium"         # "en_US-ryan-high"
AddVoice "en-us" "MALE2"    "en_US-lessac-medium"       # "en_US-lessac-high"
AddVoice "en-gb" "FEMALE1"  "en_GB-jenny_dioco-medium"
AddVoice "en-us" "FEMALE2"  "en_US-amy-medium"
AddVoice "it"    "MALE1"    "it_IT-riccardo-x_low"

DefaultVoice "it_IT-riccardo-x_low"

I found that using high quality models takes some time. I have a better experience with medium!

⚠️NOTE⚠️
I bumped my head hard for hours on why speechd couldn't open any sound device with any generic module. Similarly to @coderalpha, I kept running speech-dispatcher as a service through systemd. I have no idea why I had in mind that that was the "correct way" of running it.
So yeah... just as mentioned above, I would recommend forgetting about the systemd service altogether.
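The RATE-to-tempo mapping in the config above can be checked outside speech-dispatcher. The sketch below evaluates the same polynomial with awk instead of bc (a substitution made here for portability, not what the config uses); the function name `rate_to_tempo` is hypothetical.

```shell
# Same polynomial as in the sox "tempo" argument of the config above,
# evaluated with awk rather than bc.
rate_to_tempo() {
  awk -v r="$1" 'BEGIN { printf "%.2f\n", 0.000055*r*r + 0.0145*r + 1 }'
}

# Print the tempo for the two ends and the middle of the RATE range.
for rate in -100 0 100; do
  printf 'RATE=%s -> tempo %s\n' "$rate" "$(rate_to_tempo "$rate")"
done
```

The quadratic term is what lets a symmetric [-100, 100] input land on the asymmetric [0.1, 3] range while keeping RATE=0 at normal speed.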

sthibaul · 2023-12-23T18:42:33Z

This looks nice :)

@carlocastoldi could you try to add

VoiceFileDependency /opt/piper-tts/voices/$VOICE.onnx

to check that this correctly makes the voice list shown by spd-say -o piper-generic -L match what is available in /opt/piper-tts/voices?

tkapias · 2023-12-25T20:19:36Z

My user module config for speechd works fine, I am sharing it below.
But I don't understand how to adapt the Rate/Pitch formula, maybe someone will have an idea.

I am on Debian Testing.
I installed the binary/amd64 version in ~/.local/opt/piper/ and made a symbolic link in ~/.local/bin/.
I downloaded the voice files to ~/.local/share/piper/voices/
I always manually kill speechd processes after config modifications.
- I used spd-conf to create a default config for my user.
- I updated ~/.config/speech-dispatcher/speechd.conf:

Timeout 30                                                                   
LogLevel  2                                                                  
LogDir  "default"                                                            
                                                                             
DefaultVolume 100                                                            
DefaultVoiceType "MALE1"                                                     
DefaultLanguage "en"                                                         
DefaultPunctuationMode "some"                                                
                                                                             
SymbolsPreproc "char"
SymbolsPreprocFile "gender-neutral.dic"
SymbolsPreprocFile "font-variants.dic" 
SymbolsPreprocFile "symbols.dic"   
SymbolsPreprocFile "emojis.dic"    
SymbolsPreprocFile "orca.dic"
SymbolsPreprocFile "orca-chars.dic"

DefaultCapLetRecognition  "none"
DefaultSpelling  Off
                                                                             
AudioOutputMethod "pulse"            
AudioPulseDevice "default"            
AudioPulseMinLength 10 

AddModule "piper"                   "sd_generic"   "piper.conf"

DefaultModule piper                                                          
                                                                             
LanguageDefaultModule "en"  "piper"
LanguageDefaultModule "fr"  "piper"
                                      
Include "clients/*.conf"

I created the new module using an existing module on the "sd_generic" model (listed in speechd.conf).
- I updated the new module ~/.config/speech-dispatcher/modules/piper.conf:

Debug 0

GenericExecuteSynth "printf %s \'$DATA\' | piper --length_scale 1 --sentence_silence 0 --model ~/.local/share/piper/voices/$VOICE --output-raw | aplay -r 22050 -f S16_LE -t raw -"
# only use medium quality voices to respect the 22050 rate for aplay in the command above.

GenericCmdDependency "piper"
GenericCmdDependency "aplay"
GenericCmdDependency "printf"
GenericSoundIconFolder "/usr/share/sounds/sound-icons/"

GenericPunctNone ""
GenericPunctSome "--punct=\"()<>[]{}\""
GenericPunctMost "--punct=\"()[]{};:\""
GenericPunctAll "--punct"

#GenericStripPunctChars  ""

GenericLanguage  "en" "en_US" "utf-8"
GenericLanguage  "fr" "fr_FR" "utf-8"

AddVoice        "en"    "MALE1"         "en_US-hfc_male-medium.onnx"
AddVoice        "en"    "FEMALE1"       "en_US-amy-medium.onnx"
AddVoice        "fr"    "MALE1"         "fr_FR-upmc-medium.onnx -s 1"
AddVoice        "fr"    "FEMALE1"       "fr_FR-upmc-medium.onnx"

DefaultVoice    "en_US-amy-medium.onnx"

#GenericRateForceInteger 1
#GenericRateAdd 1
#GenericRateMultiply 100

In the config above, --length_scale 1 could be replaced with --length_scale $RATE to manage the voice rate. But I don't know how to apply the formula explained in the speechd documentation.

tkapias · 2023-12-25T21:10:07Z

OK, I'm still not sure how the formula works: if you put 0 in GenericRateAdd the output becomes a float, and with 1 it becomes an integer, which is not the purpose given in the doc.

But, it works with bc.

Here is my piper module with the rate parameter working:

(I don't use pitch modifications, but piper has 2 noise parameters if someone wants to set them.)

Debug 0

GenericExecuteSynth "printf %s \'$DATA\' | piper --length_scale \`echo \'($RATE * -0.01) + 1\' \| bc\` --sentence_silence 0 --model ~/.local/share/piper/voices/$VOICE --output-raw | aplay -r 22050 -f S16_LE -t raw -"
# only use medium quality voices to respect the 22050 rate for aplay in the command above.

GenericCmdDependency "piper"
GenericCmdDependency "aplay"
GenericCmdDependency "printf"
GenericCmdDependency "bc"
GenericSoundIconFolder "/usr/share/sounds/sound-icons/"

GenericPunctNone ""
GenericPunctSome "--punct=\"()<>[]{}\""
GenericPunctMost "--punct=\"()[]{};:\""
GenericPunctAll "--punct"

#GenericStripPunctChars  ""

GenericLanguage  "en" "en_US" "utf-8"
GenericLanguage  "fr" "fr_FR" "utf-8"

AddVoice        "en"    "MALE1"         "en_US-hfc_male-medium.onnx"
AddVoice        "en"    "FEMALE1"       "en_US-amy-medium.onnx"
AddVoice        "fr"    "MALE1"         "fr_FR-upmc-medium.onnx -s 1"
AddVoice        "fr"    "FEMALE1"       "fr_FR-upmc-medium.onnx"

DefaultVoice    "en_US-amy-medium.onnx"

# for --length_scale $RATE (default: 1.0)
#GenericRateAdd num
#GenericRateMultiply num
# for --noise_scale $PITCH (default: 0.667)
#GenericPitchAdd num
#GenericPitchMultiply num
# for --noise_w $PITCH_RANGE (default: 0.8)
#GenericPitchRangeAdd num
#GenericPitchRangeMultiply num
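The length_scale expression in the GenericExecuteSynth line above is a linear map from RATE, and it can be tried in isolation. This is a sketch using awk in place of bc (an assumption for portability); `rate_to_length_scale` is a made-up helper name.

```shell
# Same expression as in the config above: (RATE * -0.01) + 1.
# A larger length_scale means slower speech, hence the negative slope.
rate_to_length_scale() {
  awk -v r="$1" 'BEGIN { printf "%.2f\n", (r * -0.01) + 1 }'
}

# Print the length_scale for a slow, a normal and a fast RATE.
for rate in -100 0 50; do
  printf 'RATE=%s -> length_scale %s\n' "$rate" "$(rate_to_length_scale "$rate")"
done
```

Note that with this expression RATE=100 evaluates to a length_scale of exactly 0, so it may be worth avoiding the very top of the range.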

omega3 · 2024-01-28T18:25:07Z

Could you please give some description to non-technical users like me what to change in config to replace:

AddVoice "en" "MALE1"

DefaultVoiceType "MALE1"

What are these values, and how can I replace them to choose a different voice? How do I find the equivalent of "MALE1" for, say,
libritts_r medium 8699(1)
or jenny_dioco
or en_GB-northern_english_male-medium.onnx?

I typed piper --help and don't see any --list-voices command. I downloaded them from hugging face and so far applied from command line pointing to onnx file.

For example in Plasma Okular there is an option to change voice (this also sometimes means language). But with proposed configuration I don't know how to make other voices available for speech dispatcher.

tkapias · 2024-01-29T03:40:44Z

Put your .onnx and .onnx.json voice files in ~/.local/share/piper/voices/.
Each .onnx file needs a .onnx.json file in the same folder.
You need a custom main configuration file in ~/.config/speech-dispatcher/speechd.conf, where you would define some general defaults, like DefaultVoiceType or DefaultModule.
And you need to create a configuration file for Piper in ~/.config/speech-dispatcher/modules/piper.conf, where you define how to call Piper and available options, like AddVoice.

Check my 2 previous comments (1, 2), you should be able to use it by modifying only those lines: LanguageDefaultModule, GenericLanguage, AddVoice, DefaultVoice.
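Since every .onnx model needs its .onnx.json next to it, a quick check of the voices folder can catch a missing download before editing AddVoice lines. The following is a minimal sketch; the function name `missing_voice_configs` is invented here, and the default directory is the one used in this comment.

```shell
# Print every .onnx model in a directory that lacks its .onnx.json companion.
missing_voice_configs() (
  dir=${1:-$HOME/.local/share/piper/voices}
  for model in "$dir"/*.onnx; do
    [ -e "$model" ] || continue          # the glob matched nothing
    [ -e "$model.json" ] || printf '%s\n' "${model##*/}"
  done
)

# Check the default voices folder; no output means every model has its JSON file.
missing_voice_configs
```

Each file name it prints is a model whose JSON file still has to be placed in the same folder.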

RoyalOughtness · 2024-03-02T22:31:45Z

FYI, I found this app which does it all for you 😄

https://github.com/Elleo/pied

omega3 · 2024-03-03T17:36:34Z

which does it all for you

Unfortunately not all. It changes config files every time voice is changed, so there is no way to set speed or other values and keep it.

And with Pied only one voice is available at a time as an option in programs like Calibre or Okular.

Piper still needs good speech-dispatcher support, like Festival or espeak have.

Elleo · 2024-03-04T10:27:05Z

Unfortunately not all. It changes config files every time voice is changed, so there is no way to set speed or other values and keep it.

Just as a side-note, if you have sox installed then Pied 0.2 now supports speech-dispatcher's dynamic rate and pitch settings at runtime

sthibaul · 2024-03-05T00:32:09Z

It changes config files every time voice is changed

Which is really not the way speech-dispatcher works. The piper module should just expose all the voices that are available, just like e.g. espeak-ng-mbrola-generic.conf does.

KAGEYAM4 · 2024-05-13T08:05:57Z

Can someone share a working config? Both the config I got from https://aur.archlinux.org/cgit/aur.git/tree/piper-generic.conf?h=piper-voices-common and the config generated by Pied had a long pause (2-3 seconds) between sentences.

Found this black magic, GenericDelimiters "˨", from ken107/read-aloud#375 (comment), which fixed it. But now, after fixing that, I realise paragraphs also have a 2-3 second pause.

Edit: I asked in the read-aloud repo, and they said:

I'm guessing the 2-3 second pause you're experiencing is the time it takes to synthesize the next sentence. Our implementation deals with that by pre-synthesizing the next sentence while the current sentence is being spoken. Your tool will need to support this 'prefetching' strategy.

Any idea on how to do prefetch?
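As a rough illustration of the prefetching idea quoted above, the sketch below synthesizes each sentence in the background while the previous one is playing. It is not something the sd_generic module is known to do; `synth` and `play` are hypothetical stand-ins (defined here so the sketch runs anywhere) for the piper and aplay commands used in this thread.

```shell
# Hypothetical stand-ins: in practice these would wrap piper and aplay.
synth() { printf 'audio(%s)' "$1" > "$2"; }   # "synthesize" a sentence to a file
play()  { cat "$1"; }                          # "play" a synthesized file

speak_prefetched() (
  tmp=$(mktemp -d)
  prev=
  i=0
  for sentence in "$@"; do
    synth "$sentence" "$tmp/$i.raw" &   # start synthesizing this sentence
    synth_pid=$!
    if [ -n "$prev" ]; then
      play "$prev"                      # meanwhile, play the previous one
    fi
    wait "$synth_pid"                   # make sure synthesis has finished
    prev="$tmp/$i.raw"
    i=$((i + 1))
  done
  if [ -n "$prev" ]; then play "$prev"; fi   # play the final sentence
  rm -rf "$tmp"
)

speak_prefetched "First sentence." "Second sentence."
echo
```

With real commands, the gap between sentences would shrink to whatever synthesis time is left over once the previous sentence has finished playing.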

tkapias · 2024-05-13T10:46:14Z

@KAGEYAM4, I tried adding your black magic from ken107/read-aloud#375(comment-1937517761), but it just added more pauses everywhere.

Just try my config from a few comments above; even on a 15-year-old machine that I use for tests, I get less than 200-400ms of delay at start and between paragraphs.

KAGEYAM4 · 2024-05-13T13:23:32Z

@KAGEYAM4, I tried adding your black magic from ken107/read-aloud#375(comment-1937517761), but it just added more pauses everywhere.

Just try my config from a few comments above, even on a 15 years old machine that I use for tests, I have less than 200-400ms at start and between paragraphs.

I used your config and it's a lot better. Thanks a lot.

By the way, does the following error matter? It seems the Arch repo doesn't provide these files:

[Mon May 13 18:51:06 2024 : 962640] speechd: Failed to open file '/usr/share/speech-dispatcher/locale/en/gender-neutral.dic': No such file or directory
[Mon May 13 18:51:06 2024 : 962691] speechd: Failed to load symbols 'gender-neutral.dic' for locale 'en'
[Mon May 13 18:51:06 2024 : 962763] speechd: Failed to open file '/usr/share/speech-dispatcher/locale/en/font-variants.dic': No such file or directory
[Mon May 13 18:51:06 2024 : 962777] speechd: Failed to load symbols 'font-variants.dic' for locale 'en'
[Mon May 13 18:51:06 2024 : 962877] speechd: Loading NUL byte entry is not yet supported
[Mon May 13 18:51:06 2024 : 962889] speechd: Invalid line in file /usr/share/speech-dispatcher/locale/en/symbols.dic: \0	blank	char	# null
[Mon May 13 18:51:06 2024 : 964295] speechd: Loading NUL byte entry is not yet supported
[Mon May 13 18:51:06 2024 : 964317] speechd: Invalid line in file /usr/share/speech-dispatcher/locale/base/symbols.dic: \0	blank	char	# null
[Mon May 13 18:51:06 2024 : 982271] speechd: Failed to open file '/usr/share/speech-dispatcher/locale/base/emojis.dic': No such file or directory
[Mon May 13 18:51:07 2024 : 858] speechd: Failed to open file '/usr/share/speech-dispatcher/locale/en/orca.dic': No such file or directory
[Mon May 13 18:51:07 2024 : 889] speechd: Failed to load symbols 'orca.dic' for locale 'en'
[Mon May 13 18:51:07 2024 : 922] speechd: Failed to open file '/usr/share/speech-dispatcher/locale/en/orca-chars.dic': No such file or directory
[Mon May 13 18:51:07 2024 : 934] speechd: Failed to load symbols 'orca-chars.dic' for locale 'en'

tkapias · 2024-05-13T14:29:40Z

I don't have some of those files either; for example, gender-neutral.dic is provided only for French, Spanish and German by the Debian package. It's because those languages have (sadly) new rules to write in a gender-neutral fashion.

So I think that those errors should be only INFO or WARN level.

guest271314 · 2024-07-14T19:24:25Z

@tkapias This #866 (comment) is very useful. I'll probably create a gist using your work for instructions. Works on Chromium Version 128.0.6586.0 (Developer Build) (64-bit), does not work on Firefox Nightly 130.0a1, the piper voices are not loaded. Thanks for sharing.

tkapias · 2024-07-15T04:34:33Z

I recently had to reinstall a clean desktop and wrote a new note about piper installation for Debian testing, it works with Firefox 115.12.0esr.

Installation

# Keep the installation files and symlink to the latest version
mkdir -p piper piper/voices && cd piper
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz
tar xvf piper_linux_x86_64.tar.gz
mv piper piper-2023.11.14-2
sudo ln -s /home/tomasz/Forge/Logiciels/piper/piper-2023.11.14-2/piper /usr/local/bin/piper

# Download voices from https://huggingface.co/rhasspy/piper-voices/tree/main
# example with Male/Female for French/English
cd voices/
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/fr/fr_FR/upmc/medium/fr_FR-upmc-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/fr/fr_FR/upmc/medium/fr_FR-upmc-medium.onnx.json
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/hfc_male/medium/en_US-hfc_male-medium.onnx.json
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/hfc_male/medium/en_US-hfc_male-medium.onnx

# Copy voices to system
mkdir -p $HOME/.local/share/piper/voices
# (no sudo needed: the target is inside $HOME, and sudo would leave root-owned files there)
cp ./* "$HOME/.local/share/piper/voices/"

Configuration

sudo apt install libspeechd2 python3-speechd speech-dispatcher-audio-plugins speech-dispatcher-espeak-ng speech-dispatcher alsa-utils bc

mkdir -p $HOME/.config/speech-dispatcher/modules
cd $HOME/.config/speech-dispatcher/
touch speechd.conf
touch modules/piper.conf

# Paste their content into speechd.conf and modules/piper.conf from below
# then test output with spd-say
spd-say --language en 'Welcome to the world of speech synthesis!'

speechd.conf

Timeout 30
LogLevel  2
LogDir  "default"

DefaultVolume 100
DefaultVoiceType "MALE1"
DefaultLanguage "en"
DefaultPunctuationMode "some"

SymbolsPreproc "char"
SymbolsPreprocFile "gender-neutral.dic"
SymbolsPreprocFile "font-variants.dic"
SymbolsPreprocFile "symbols.dic"
SymbolsPreprocFile "emojis.dic"
SymbolsPreprocFile "orca.dic"
SymbolsPreprocFile "orca-chars.dic"

DefaultCapLetRecognition  "none"
DefaultSpelling  Off

AudioOutputMethod "pulse"
AudioPulseDevice "default"
AudioPulseMinLength 10

AddModule "piper"                   "sd_generic"   "piper.conf"

DefaultModule piper

LanguageDefaultModule "en"  "piper"
LanguageDefaultModule "fr"  "piper"

Include "clients/*.conf"

modules/piper.conf

Debug 0

GenericExecuteSynth "printf %s \'$DATA\' | piper --length_scale \`echo \'($RATE * -0.01) + 1\' \| bc\` --sentence_silence 0 --model ~/.local/share/piper/voices/$VOICE --output-raw | aplay -r 22050 -f S16_LE -t raw -"
# only use medium quality voices to respect the 22050 rate for aplay in the command above.

GenericCmdDependency "piper"
GenericCmdDependency "aplay"
GenericCmdDependency "printf"
GenericCmdDependency "bc"
GenericSoundIconFolder "/usr/share/sounds/sound-icons/"

GenericPunctNone ""
GenericPunctSome "--punct=\"()<>[]{}\""
GenericPunctMost "--punct=\"()[]{};:\""
GenericPunctAll "--punct"

GenericLanguage  "en" "en_US" "utf-8"
GenericLanguage  "fr" "fr_FR" "utf-8"

AddVoice        "en"    "MALE1"         "en_US-hfc_male-medium.onnx"
AddVoice        "en"    "FEMALE1"       "en_US-amy-medium.onnx"
AddVoice        "fr"    "MALE1"         "fr_FR-upmc-medium.onnx -s 1"
AddVoice        "fr"    "FEMALE1"       "fr_FR-upmc-medium.onnx"

DefaultVoice    "en_US-amy-medium.onnx"

guest271314 · 2024-07-15T04:43:55Z

@tkapias A gist to link to would be useful. This is what I came up with from your original work https://gist.github.com/guest271314/9f09ab899df11e344c568a7b93f544c3.

I'm using the full path to piper.

We can also pipe to /dev/audio

GenericExecuteSynth "printf %s \'$DATA\' | /home/xubuntu/bin/piper/piper --length_scale 1 --sentence_silence 0  --model ~/.local/share/piper/voices/$VOICE --output-raw > /dev/audio"

Symlinking the .onnx and .onnx.json files to ~/.local/share/piper/voices/ doesn't appear to work. I already had those files in a Web extension folder. Now I have the files in two locations on the machine until I figure out a more efficient solution.
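
One way to narrow down the symlink problem is to check that both the model and its `.onnx.json` config resolve to readable files in the directory Piper is given. Here is a small shell sketch, assuming (nothing in this thread confirms it) that a dangling or unreadable link is the cause. `check_voice` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: verify that a voice and its JSON config are readable.
# 'test -r' follows symlinks, so a dangling link is reported as missing.
check_voice() {
    dir=$1
    voice=$2
    status=0
    for f in "$dir/$voice" "$dir/$voice.json"; do
        if [ -r "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or unreadable: $f"
            status=1
        fi
    done
    return $status
}
```

It could be called as `check_voice ~/.local/share/piper/voices en_US-amy-medium.onnx`, run as the same user that speech-dispatcher runs as.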

I had to restart speech-dispatcher and Chromium a few times when reproducing from scratch.

What is different in your update that results in the code working on Firefox?

tkapias · 2024-07-15T05:07:24Z

@guest271314, if you're okay to maintain it, I will leave comments on your Gist when I have updates.

I will test /dev/audio, thanks.

I don't symlink the voices, only the binary, but I didn't know that it would fail.

About Firefox, I don't know why it makes any difference; the config files are pretty much the same.
I tested Firefox in the reader view with articles in French and English, and Firefox gave me the corresponding voices.
Maybe the ESR version of Firefox handles this better than Nightly.

guest271314 · 2024-07-15T05:14:49Z

> @guest271314, if you're okay to maintain it, I will leave comments on your Gist when I have updates.

Sure.

Ovi329 · 2024-07-30T10:12:15Z

My user module config for speechd works fine, I am sharing it below. But I don't understand how to adapt the Rate/Pitch formula, maybe someone will have an idea.

* I am on Debian Testing.

* I installed the binary/amd64 version in `~/.local/opt/piper/` and made a symbolic link in `~/.local/bin/`.

* I downloaded the voice files to `~/.local/share/piper/voices/`.

* I always manually kill speechd processes after config modifications.
  
  * I used `spd-conf` to create a default config for my user.
  * I updated `~/.config/speech-dispatcher/speechd.conf`:

Timeout 30                                                                   
LogLevel  2                                                                  
LogDir  "default"                                                            
                                                                             
DefaultVolume 100                                                            
DefaultVoiceType "MALE1"                                                     
DefaultLanguage "en"                                                         
DefaultPunctuationMode "some"                                                
                                                                             
SymbolsPreproc "char"
SymbolsPreprocFile "gender-neutral.dic"
SymbolsPreprocFile "font-variants.dic" 
SymbolsPreprocFile "symbols.dic"   
SymbolsPreprocFile "emojis.dic"    
SymbolsPreprocFile "orca.dic"
SymbolsPreprocFile "orca-chars.dic"

DefaultCapLetRecognition  "none"
DefaultSpelling  Off
                                                                             
AudioOutputMethod "pulse"            
AudioPulseDevice "default"            
AudioPulseMinLength 10 

AddModule "piper"                   "sd_generic"   "piper.conf"

DefaultModule piper                                                          
                                                                             
LanguageDefaultModule "en"  "piper"
LanguageDefaultModule "fr"  "piper"
                                      
Include "clients/*.conf"

* I created the new module by copying an existing module based on "sd_generic" (listed in speechd.conf).
  
  * I updated the new module `~/.config/speech-dispatcher/modules/piper.conf`:

Debug 0

GenericExecuteSynth "printf %s \'$DATA\' | piper --length_scale 1 --sentence_silence 0 --model ~/.local/share/piper/voices/$VOICE --output-raw | aplay -r 22050 -f S16_LE -t raw -"
# only use medium quality voices to respect the 22050 rate for aplay in the command above.

GenericCmdDependency "piper"
GenericCmdDependency "aplay"
GenericCmdDependency "printf"
GenericSoundIconFolder "/usr/share/sounds/sound-icons/"

GenericPunctNone ""
GenericPunctSome "--punct=\"()<>[]{}\""
GenericPunctMost "--punct=\"()[]{};:\""
GenericPunctAll "--punct"

#GenericStripPunctChars  ""

GenericLanguage  "en" "en_US" "utf-8"
GenericLanguage  "fr" "fr_FR" "utf-8"

AddVoice        "en"    "MALE1"         "en_US-hfc_male-medium.onnx"
AddVoice        "en"    "FEMALE1"       "en_US-amy-medium.onnx"
AddVoice        "fr"    "MALE1"         "fr_FR-upmc-medium.onnx -s 1"
AddVoice        "fr"    "FEMALE1"       "fr_FR-upmc-medium.onnx"

DefaultVoice    "en_US-amy-medium.onnx"

#GenericRateForceInteger 1
#GenericRateAdd 1
#GenericRateMultiply 100

* In the config above, `--length_scale 1` could be replaced with `--length_scale $RATE` to control the voice rate, but I don't know how to apply the [formula explained in the speechd documentation](https://htmlpreview.github.io/?https://github.com/brailcom/speechd/blob/master/doc/speech-dispatcher.html#Configuration-of-the-Generic-Output-Module).

Firstly, thank you for your guide. I've been trying to set up Piper with speechd and Foliate for several days, and your post was a lifesaver. But I noticed that the 2-3 s pause between sentences only occurs with the `*high.onnx` files. Is it possible to get these high-quality .onnx voices working without the pause, instead of falling back to the medium voices?

snovotill · 2024-10-20T15:21:01Z

The code below CORRECTLY implements both volume control and speech-rate pass-through from speech-dispatcher to Piper.
Therefore if you install the Read Aloud plugin into your browser, then the aforementioned controls will function.
It also handles voices with different sample rates automatically, namely 16000 Hz for low-quality voices versus 22050 Hz for medium and high.
This plays back via PipeWire since all other forms of playback are now obsolete, and aplay is very hacky.
I did not implement voice pitch but included a comment on how to do this in the first file below.

File /etc/speech-dispatcher/modules/piper.conf follows below:

Debug "0"

# A more correct way of passing sample rate to pw-play would be:
#   --rate \$(jq .audio.sample_rate < /opt/piper/\$VOICE.json)
#   GenericCmdDependency "jq"

# It is possible to implement $PITCH control by:
#   varying --rate to change voice pitch and tempo
#   varying --length_scale to restore correct tempo

# The variables below are single-quoted because they are actually just tokens which will be substituted with constants.
# Therefore it is not possible to use curly braces ${VAR} on them and therefore expr must be used instead!

# Executable configuration:
GenericExecuteSynth "printf %s \'$DATA\' | /opt/piper/piper --sentence_silence 0.1 --model \'/opt/piper/$VOICE\' --length_scale $(echo \'scale=2; 1.33/($RATE + .66)\' | bc) --output_raw | pw-play --volume $(echo \'scale=2; $VOLUME/100\' | bc) --rate $( [ \"$(expr substr \'$VOICE\' $(expr length \'$VOICE\' - 7) 3)\" = low ] && echo 16000 || echo 22050 ) --channel-map LE -"
#GenericExecuteSynth "echo \'$DATA\' | /opt/piper/piper --sentence_silence 0.1 --model \'/opt/piper/$VOICE\' --length_scale $(echo \'scale=2; 1.33/($RATE + .66)\' | bc) --output_raw | aplay -r $( [ \"$(expr substr \'$VOICE\' $(expr length \'$VOICE\' - 7) 3)\" = low ] && echo 16000 || echo 22050 ) -f S16_LE -t raw -"
GenericCmdDependency "/opt/piper/piper"
GenericCmdDependency "pw-play"
GenericCmdDependency "bc"
#GenericCmdDependency "aplay"

# Prevent speech-dispatcher from cutting text into chunks:
#GenericDelimiters "|"  # "|" never occurs
#GenericMaxChunkLength 99999

GenericRateAdd 1
GenericPitchAdd 1
GenericVolumeAdd 1
GenericRateMultiply 1
GenericPitchMultiply 1000

# Sound effect wav files:
GenericSoundIconFolder "/usr/share/sounds/sound-icons/"

# Ensure all characters will be interpreted and spoken:
GenericLanguage "en" "en-US" "utf-8"
GenericLanguage "en-gb" "en-GB" "utf-8"
GenericLanguage "en-us" "en-US" "utf-8"

# Voice file configuration:
VoiceFileDependency "/opt/piper/$VOICE"
AddVoice "en-GB" "MALE1" "en_GB-northern_english_male-medium.onnx"
AddVoice "en-GB" "FEMALE1" "en_GB-southern_english_female-low.onnx"
AddVoice "en_us" "MALE2" "en_US-hfc_male-medium.onnx"
AddVoice "en_us" "FEMALE2" "en_US-hfc_female-medium.onnx"
DefaultVoice "en_GB-northern_english_male-medium.onnx"
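
To see what the `bc` expression for `--length_scale` in `GenericExecuteSynth` evaluates to, the arithmetic can be run on its own. The sketch below assumes the conversion described in the speech-dispatcher documentation for the generic module, `$RATE = speechd_rate * GenericRateMultiply / 100 + GenericRateAdd`. The function names are illustrative, and awk replaces bc, so results are rounded rather than truncated to two decimals:

```shell
#!/bin/sh
# Assumed conversion (per the generic-module documentation):
#   $RATE = speechd_rate * GenericRateMultiply / 100 + GenericRateAdd
# With GenericRateMultiply 1 and GenericRateAdd 1 this yields 0..2.
generic_rate() {
    awk -v r="$1" -v mul=1 -v add=1 'BEGIN { printf "%.2f\n", r * mul / 100 + add }'
}

# Same expression as in GenericExecuteSynth: 1.33 / ($RATE + .66)
length_scale() {
    awk -v rate="$1" 'BEGIN { printf "%.2f\n", 1.33 / (rate + 0.66) }'
}

# Walk through the slowest, default and fastest speech-dispatcher rates.
for speechd_rate in -100 0 100; do
    rate=$(generic_rate "$speechd_rate")
    printf 'speechd rate %4d -> RATE=%s -> --length_scale %s\n' \
        "$speechd_rate" "$rate" "$(length_scale "$rate")"
done
```

Under these assumptions, a speech-dispatcher rate of 0 corresponds to a length scale of about 0.80, which is somewhat faster than `--length_scale 1`. The extremes land near 2.0 (slowest) and 0.5 (fastest).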

File /etc/speech-dispatcher/speechd.conf follows below:

LogLevel 3

LogDir "default"

DefaultRate 0

#DefaultVolume 0
DefaultVolume 66

DefaultLanguage en

#DefaultPunctuationMode "none"

SymbolsPreproc "char"

SymbolsPreprocFile "gender-neutral.dic"
SymbolsPreprocFile "font-variants.dic"
SymbolsPreprocFile "symbols.dic"
SymbolsPreprocFile "emojis.dic"
SymbolsPreprocFile "orca.dic"
SymbolsPreprocFile "orca-chars.dic"

#DefaultCapLetRecognition  "none"

#DefaultSpelling Off

AudioOutputMethod pulse

AddModule "espeak-ng" "sd_espeak-ng" "espeak-ng.conf"

#AddModule "espeak-ng-mbrola-generic" "sd_generic" "espeak-ng-mbrola-generic.conf"
AddModule "piper" "sd_generic" "piper.conf"

# Use Piper when language is not specified:
#DefaultModule espeak-ng
DefaultModule piper

# Use Piper for these specific languages:
LanguageDefaultModule "en" "piper"
LanguageDefaultModule "en-GB" "piper"
LanguageDefaultModule "en-US" "piper"

Include "clients/*.conf"

The Piper binary as well as voice files are all lumped together in the /opt/piper/ directory.
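
The sample-rate selection done by the `expr substr` pipeline in piper.conf, and the pitch idea from its comments, can be written out more readably. This is only a sketch. It assumes, as the configuration does, that voices whose name ends in `low.onnx` use 16000 Hz and all others 22050 Hz. Both function names are made up, and the pitch factor (1.0 = unchanged) is a hypothetical input, not the `$PITCH` value speech-dispatcher passes:

```shell
#!/bin/sh
# Hypothetical helper: pick the playback rate from the voice file name.
# Matches the same names as the 'expr substr' test (three characters before ".onnx").
voice_sample_rate() {
    case $1 in
        *low.onnx) echo 16000 ;;
        *)         echo 22050 ;;
    esac
}

# Hypothetical helper: playing at native_rate * p raises pitch and tempo by p;
# multiplying length_scale by p brings the tempo back.
# Prints "<playback rate> <length_scale>".
pitch_params() {
    awk -v sr="$1" -v ls="$2" -v p="$3" \
        'BEGIN { printf "%.0f %.2f\n", sr * p, ls * p }'
}

voice_sample_rate "en_GB-southern_english_female-low.onnx"   # 16000
voice_sample_rate "en_GB-northern_english_male-medium.onnx"  # 22050
pitch_params 22050 0.80 1.1                                  # 24255 0.88
```

The two numbers printed by `pitch_params` would go to the player's `--rate` and to Piper's `--length_scale` respectively.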

guest271314 · 2024-10-20T15:28:22Z

https://github.com/guest271314/native-messaging-piper provides complete control over the entire process, without having to fiddle with Speech Dispatcher and the socket connection between it and the browser.

If you want to change pitch, speed, etc., just use Web Audio API audio nodes.

KiaraGrouwstra mentioned this issue Nov 6, 2023: integrate piper with speechd rhasspy/piper#265 (Open)

sthibaul added the enhancement and help wanted labels Nov 6, 2023

johnfactotum mentioned this issue Nov 18, 2023: Flatpak TTS does not work johnfactotum/foliate#1126 (Closed)

tkapias mentioned this issue Dec 26, 2023: Add support for Speech Dispatcher (speechd) and other engines wustho/epy#100 (Open)

KAGEYAM4 mentioned this issue May 24, 2024: Long pauses between sentences Elleo/pied#9 (Open)

guest271314 mentioned this issue Jul 14, 2024: Generating speech locally in the web browser rhasspy/piper#352 (Open)
