[text-to-speech] Spaces in text are encoded as + #635
Actually, it seems the 400 Bad Request error occurs only when a semicolon is followed by a Unicode character. curl also receives a Bad Request using this:
However, a POST request succeeds:
Semicolons are reserved characters in URLs, and escaping them works as expected (with both the java-sdk and curl). I'm not sure why the presence of a semicolon causes the backend to decode the rest of the text as ASCII. I suppose switching to POST requests would be a suitable workaround, or modifying …
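To make the ASCII explanation concrete, here is a small local sketch. It is not the backend's code, which this issue does not show: it takes the UTF-8 bytes of é and tries to decode them with a strict US-ASCII decoder, the kind of decoder the `'ascii' codec can't decode byte 0xc3` error points to. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class AsciiDecodeDemo {
    // Strict decode: a decoder obtained from newDecoder() reports malformed
    // input by default, so any byte above 0x7F makes decode() throw.
    static boolean decodesAsAscii(byte[] bytes) {
        try {
            StandardCharsets.US_ASCII.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The \u00e9 escape (e with acute accent) keeps the example
        // independent of the source file's encoding.
        String accented = "\u00e9";
        byte[] utf8 = accented.getBytes(StandardCharsets.UTF_8);

        // The two UTF-8 bytes for U+00E9.
        System.out.printf("UTF-8 bytes: %02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF); // C3 A9

        // Decoding with the matching charset round-trips cleanly.
        System.out.println("UTF-8 round trip ok: "
                + accented.equals(new String(utf8, StandardCharsets.UTF_8)));     // true

        // Decoding the same bytes as ASCII fails on the first byte, 0xC3.
        System.out.println("Decodes as ASCII: " + decodesAsAscii(utf8));            // false
    }
}
```

A strict decoder fails on the first byte above 0x7F, which is consistent with only non-ASCII text triggering the error; plain ASCII input such as "Hi Bob" decodes fine.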
Similar issue reported to IBM customer support: The IBM Java Bluemix SDK version 3.7.1 includes a jar file called “text-to-speech-3.7.1.jar” that ultimately makes the following encode call: `URLEncoder.encode(content, "UTF-8")`. This call is what adds the plus signs to the text.
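The difference between form encoding and percent-encoding can be seen with `URLEncoder` alone. The sketch below assumes JDK 8 (hence the `String` charset overload). `percentEncode` is a hypothetical helper, not part of the SDK, and is only one common way to get `%20` instead of `+` for spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PlusVersusPercent {
    // What the SDK call does: form encoding, where a space becomes '+'.
    static String formEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this cannot happen in practice.
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Hypothetical helper: percent-encoding for a query value. URLEncoder has
    // already escaped any literal '+' as %2B, so each remaining '+' came from a
    // space and can safely become %20. (URLEncoder still leaves '*' as-is and
    // escapes '~'; both are harmless in a query value.)
    static String percentEncode(String text) {
        return formEncode(text).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("Hello Bob"));    // Hello+Bob
        System.out.println(percentEncode("Hello Bob")); // Hello%20Bob
        System.out.println(percentEncode("a+b; c"));    // a%2Bb%3B%20c
    }
}
```

This only helps if `percentEncode` is the sole encoding step: passing its output to a client that encodes the query value again would escape each `%` as `%25`.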
Steps to reproduce:
TSS.synthesize("Hello Bob", Voice.EN_LISA);
Expected behavior
Audio for "Hello Bob"
Actual behavior
Audio for "Hello+Bob"
JDK version: OpenJDK 1.8.0_121
java-sdk version: 3.7.1
Since commit 7d9bbd7 (made to resolve #602), the text is URL-encoded before being passed to okhttp.
Unfortunately, `RequestUtils.encode` calls `URLEncoder.encode()`, which performs form encoding instead of percent-encoding. okhttp then applies proper percent-encoding on top of that, so a request to synthesize "Hi Bob" becomes:

Issue #602 seems to be caused by the `é` character being encoded as UTF-8 by okhttp (`0xC3 0xA9`) but decoded as ASCII by the backend, hence the `BadRequestException: 'ascii' codec can't decode byte 0xc3` error.
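The double encoding described above can be reproduced without a network call. This sketch assumes, as the issue describes, that the SDK form-encodes first and the HTTP client then encodes the result again. `URLEncoder` stands in for the client's encoder (assumption: the client escapes `+` as `%2B`, which is what `URLEncoder` does for the input "Hi+Bob"), and `URLDecoder` stands in for the server's single decode. Class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoubleEncodingDemo {
    // Form-encodes with UTF-8; every JVM supports UTF-8, so the checked
    // exception is rethrown as unchecked.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // A single decode, standing in for what the server does with the query value.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Step 1: the SDK form-encodes the text, so the space becomes '+'.
        String sdkOutput = formEncode("Hi Bob");
        // Step 2: the client encodes again; the '+' is escaped as %2B.
        String onTheWire = formEncode(sdkOutput);
        // Step 3: the server decodes once and is left with a literal plus sign.
        String serverSees = decodeOnce(onTheWire);
        System.out.println(sdkOutput + " -> " + onTheWire + " -> " + serverSees);
        // Hi+Bob -> Hi%2BBob -> Hi+Bob
    }
}
```

Encoding twice but decoding once leaves exactly one layer of encoding in the text the server synthesizes, which is why the audio says "Hello+Bob".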