When the community gets access to the alpha version of the data set, there may be sentences of the form:
"I am in room 2049."
This may be validly spoken in several ways:
"I am in room two thousand forty nine" "I am in room twenty forty nine." "I am in room two zero four nine."
A preprocessor, under the current API, would only have access to the sentence:
"I am in room 2049."
and be tasked with, among other things, converting the digits into words.
In this case, the text alone is insufficient to determine the correct way to convert the digits to words, so the current preprocessor API can't handle it. Only by listening to the audio can one correctly disambiguate these readings.
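To make the ambiguity concrete, here is a minimal sketch (the function name and word tables are invented for illustration, and hardcoded to cover just this one example) that enumerates the plausible readings of the digit string:

```python
def candidate_verbalizations(digits: str) -> list[str]:
    """Enumerate plausible spoken readings of a 4-digit number like '2049'."""
    ones = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"]
    tens = {"2": "twenty", "4": "forty"}  # just enough for this example
    d = digits
    return [
        # cardinal reading: "two thousand forty nine"
        f"{ones[int(d[0])]} thousand {tens[d[2]]} {ones[int(d[3])]}",
        # pairwise reading: "twenty forty nine"
        f"{tens[d[0]]} {tens[d[2]]} {ones[int(d[3])]}",
        # digit-by-digit reading: "two zero four nine"
        " ".join(ones[int(c)] for c in d),
    ]

print(candidate_verbalizations("2049"))
# ['two thousand forty nine', 'twenty forty nine', 'two zero four nine']
```

Nothing in the sentence itself tells the preprocessor which of these three outputs matches what the speaker actually said.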
Now each audio clip is uniquely identified by the pair of the `user_id` of the speaker and the `sentence` itself. So if passed the `user_id` and `sentence` pair, a preprocessor can know which particular audio clip the sentence corresponds to.
Then the creator of the preprocessor can actually listen to that clip and correctly convert the sentence
"I am in room 2049."
to the actual audio spoken in the clip:
"I am in room two thousand forty nine" "I am in room twenty forty nine." "I am in room two zero four nine."
...
So the locale-specific preprocessor API should change to accept the pair of parameters `user_id` and `sentence`.
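Here is a minimal sketch of what that could look like (the function shape, the `user_42` id, and the lookup table are all hypothetical, for illustration only); the point is just that the `(user_id, sentence)` pair gives the preprocessor a handle on the specific clip:

```python
# Readings confirmed by listening to the clip identified by (user_id, sentence).
VERIFIED_READINGS: dict[tuple[str, str], str] = {
    ("user_42", "I am in room 2049."): "I am in room twenty forty nine.",
}

def preprocess(user_id: str, sentence: str) -> str:
    """Locale-specific preprocessing keyed by the (user_id, sentence) pair."""
    # Prefer a reading verified against the actual audio clip.
    verified = VERIFIED_READINGS.get((user_id, sentence))
    if verified is not None:
        return verified
    # Otherwise fall back to text-only normalization, which stays
    # ambiguous for digit strings like "2049".
    return sentence

print(preprocess("user_42", "I am in room 2049."))
# I am in room twenty forty nine.
```

With the current single-parameter API, the `VERIFIED_READINGS` lookup above would be impossible to key, since two clips with the same text could not be told apart.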