Preprocessors do not have access to the user_id and sentence #9

Closed
kdavis-mozilla opened this issue Dec 11, 2018 · 0 comments
When the community gets access to the alpha version of the data set, there may be sentences of the form

"I am in room 2049."

This may be validly spoken in several ways:

"I am in room two thousand forty nine."
"I am in room twenty forty nine."
"I am in room two zero four nine."

With its current API, a preprocessor only has access to the sentence:

"I am in room 2049."

and be tasked with, among other things, converting the digits into words.

Here the text alone is insufficient to decide which of these readings is correct, so the current preprocessor API can't handle this case. Only by listening to the audio can one tell which reading was actually spoken.

Each audio clip is uniquely identified by the pairing of the user_id of the person who spoke the sentence and the sentence itself. So if it is passed the user_id and sentence pairing, a preprocessor can know that the sentence corresponds to a particular audio clip.
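To make the ambiguity concrete, here is a minimal sketch that enumerates the three readings listed above for a four-digit string. The helper names (`two_digits`, `cardinal`, `candidate_readings`) are made up for this illustration and are not part of the project. All three outputs are valid English, and nothing in the text says which one the speaker used.

```python
# Hypothetical helpers, for illustration only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def cardinal(n):
    """Spell out an integer in the range 0..9999 as a cardinal number."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def candidate_readings(digits):
    """Return three plausible spoken forms of a four-digit string."""
    return [
        cardinal(int(digits)),                                # whole number
        f"{two_digits(int(digits[:2]))} {two_digits(int(digits[2:]))}",  # pairs
        " ".join(ONES[int(d)] for d in digits),               # digit by digit
    ]


for reading in candidate_readings("2049"):
    print(f"I am in room {reading}.")
```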

Then the creator of the preprocessor can actually listen to that clip and correctly convert the sentence

"I am in room 2049."

to the words actually spoken in the clip, which could be any of:

"I am in room two thousand forty nine."
"I am in room twenty forty nine."
"I am in room two zero four nine."
...

So the locale-specific preprocessor API should change to accept the pair of parameters user_id and sentence.
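A rough sketch of what such a preprocessor could look like, under several assumptions: the function name `en`, the `CORRECTIONS` table, the example user_id, and the convention of returning `None` for an unresolved sentence are all invented here for illustration and are not taken from the project's code. The idea is only that the (user_id, sentence) pair identifies one clip, so a transcription verified by listening to that clip can be looked up by that pair.

```python
import re

# Hypothetical table of transcriptions verified by listening to the clip,
# keyed by the (user_id, sentence) pair that identifies the clip.
CORRECTIONS = {
    ("user-123", "I am in room 2049."): "I am in room twenty forty nine.",
}


def en(user_id, sentence):
    """Sketch of an English preprocessor that receives user_id and sentence.

    Returns the cleaned sentence, or None (an assumed convention) when the
    sentence contains digits and no verified transcription exists.
    """
    key = (user_id, sentence)
    if key in CORRECTIONS:
        # Someone listened to this exact clip and recorded what was said.
        return CORRECTIONS[key]
    if re.search(r"\d", sentence):
        # Digits cannot be converted reliably from the text alone.
        return None
    return sentence


print(en("user-123", "I am in room 2049."))
print(en("user-456", "I am in room 2049."))
```

The same sentence spoken by a different user maps to a different clip, so it is left unresolved until that clip has been checked as well.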

@kdavis-mozilla kdavis-mozilla self-assigned this Dec 11, 2018
kdavis-mozilla added a commit that referenced this issue Dec 12, 2018
Fixed #9 (Preprocessors do not have access to the user_id and sentence)