Extending ctc decoder to output word and character level confidence #1671

JRMeyer · 2021-03-08T08:27:12Z

JRMeyer
Mar 8, 2021
Maintainer

>>> KieranGill
[December 11, 2020, 8:08pm]

I want to clarify I understand the DeepSpeech source correctly. My
objective is to modify the source in order to get word and character
level confidence.

The candidate transcript responses only seem to return with confidence
at the transcript level. This decode function seems to only aggregate
scores instead of including word/character level confidence. However, I
am having a little trouble understanding the scope of a prefix. Is a
prefix meant to be synonymous with a candidate transcript, or is its
scope supposed to be just a few words?

Also, the LM's scorer function scores max_order words at a time,
correct? So in order to get the word-level confidence when using an LM,
max_order would have to be set to 1, right?

[This is an archived TTS discussion thread from discourse.mozilla.org/t/extending-ctc-decoder-to-output-word-and-character-level-confidence]

JRMeyer · 2021-03-08T08:27:14Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> hawa
[December 14, 2020, 12:57pm]

max_order is a parameter set when training the LM. In most Western
languages, words are separated by white space, and max_order is the
n-gram order: each word is conditioned on up to max_order - 1 preceding
words when calculating the conditional probability
P(W_k | W_{k-1}, W_{k-2}, ..., W_{k-max_order+1}). E.g. a bi-gram model
is an LM with max_order of 2, a tri-gram model has max_order of 3, and
so on.
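
To make the role of max_order concrete, here is a minimal Python sketch. It is not DeepSpeech or KenLM code, and `ngram_contexts` is a hypothetical helper. It only lists which preceding words each word would be conditioned on for a given order.

```python
def ngram_contexts(words, max_order):
    """Pair each word with the (at most max_order - 1) words preceding it."""
    pairs = []
    for k, word in enumerate(words):
        # An n-gram model of order max_order sees max_order - 1 words of history.
        start = max(0, k - (max_order - 1))
        pairs.append((tuple(words[start:k]), word))
    return pairs


# Tri-gram example: each word is conditioned on up to two preceding words.
for context, word in ngram_contexts("the quick brown fox".split(), max_order=3):
    print(f"P({word} | {' '.join(context) or '<s>'})")
```

With max_order=1 every context is empty, i.e. a uni-gram model that scores each word independently of its neighbours.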

The prefixes are the collection of all candidate outputs at a given time
step. They are ranked by probability (summed over all alignments that
collapse to the same output) and pruned at the end of each round, so
only the best beam_size prefixes are kept.
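
The ranking and pruning can be sketched in Python as a textbook CTC prefix beam search. This is a simplified illustration under stated assumptions: linear-domain probabilities, no language model, and made-up toy inputs. `ctc_prefix_beam_search` is a hypothetical name, not the DeepSpeech decoder.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, alphabet, beam_size, blank=0):
    """probs[t][i] is the probability of symbol i at time step t."""
    # Each prefix maps to (p_blank, p_non_blank): the summed probability of all
    # alignments that collapse to this prefix and end in blank / non-blank.
    beams = {(): (1.0, 0.0)}
    for step in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(step):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                char = alphabet[idx]
                extended = prefix + (char,)
                if prefix and prefix[-1] == char:
                    # A repeated symbol collapses unless a blank separates it.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[extended][1] += p_b * p
                else:
                    next_beams[extended][1] += (p_b + p_nb) * p
        # Pruning: keep only the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    return [("".join(prefix), pb + pnb) for prefix, (pb, pnb) in beams.items()]


# Toy input: two time steps over the symbols (blank, "a", "b").
toy_probs = [[0.1, 0.6, 0.3], [0.2, 0.5, 0.3]]
print(ctc_prefix_beam_search(toy_probs, ["_", "a", "b"], beam_size=3))
```

Note that the score attached to each prefix is a single number for the whole prefix; nothing in this loop keeps a separate value per character or per word.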

And a word-level confidence is already included in the prefix scoring
if you're using your model with a language model (the scorer); that's
what lm_alpha and lm_beta are for.

[Archived Post]

JRMeyer · 2021-03-08T08:27:17Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> KieranGill
[December 15, 2020, 2:28am]

Thank you for the clarification!

I'm still a little confused when you say:

> And a word-level confidence is already included in the prefix
> scoring if you're using your model with a language model (the
> scorer); that's what lm_alpha and lm_beta are for.

It seems that in this function, lm_alpha/beta are used to weight the
final prefix-level score, but do not produce word-level scores, correct?

score = ext_scorer_->get_log_cond_prob(ngram, bos) * ext_scorer_->alpha;
score += ext_scorer_->beta;
scores[prefix] += score;
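
Restated as a small Python sketch, the arithmetic in the three lines above is one weighted LM term per completed word, added into a running prefix total. The sketch additionally keeps each term in a list, which the decoder snippet does not do. `lm_word_contributions`, `toy_log_cond_prob`, and the probabilities and alpha/beta values are all made up for illustration.

```python
import math


def lm_word_contributions(words, log_cond_prob, alpha, beta):
    """Return (total, per_word): the summed LM score and each word's own term."""
    per_word = []
    total = 0.0
    for k, word in enumerate(words):
        # Same arithmetic as the snippet: alpha * log P(word | history) + beta.
        term = alpha * log_cond_prob(tuple(words[:k]), word) + beta
        per_word.append((word, term))
        total += term  # corresponds to scores[prefix] += score
    return total, per_word


def toy_log_cond_prob(history, word):
    """Hypothetical stand-in for a language model; ignores the history."""
    table = {"jumped": 0.3, "slumped": 0.01}
    return math.log(table.get(word, 0.1))


total, per_word = lm_word_contributions(
    ["fox", "jumped"], toy_log_cond_prob, alpha=0.9, beta=1.2
)
print(total, per_word)
```

These per-word terms are weighted log-probabilities from the LM alone, so they are not yet confidences in the percentage sense; they also exclude the acoustic model's contribution.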

My question, however, is about extracting the word-level score for each
word in a candidate transcript. For example:

Transcript1: The quick brown fox jumped over the river.
Word-level confidence: [the, 99%], [quick, 89%], ... [jumped, 84%], ...

Transcript2: The quick brown fox slumped over the river.
Word-level confidence: [the, 99%], [quick, 89%], ... [slumped, 54%], ...

[Archived Post]

JRMeyer · 2021-03-08T08:27:20Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> hawa
[December 15, 2020, 8:10am]

Yes, the language model gives scores at the prefix level. It contributes
scores throughout the decoding process, which makes the beam search
results more accurate. The word confidence I mentioned above is the
uni-gram probability, which is not the same thing you're asking about:
when no higher-order n-gram is found, the model uses the probability of
the uni-gram. Without the smoothing in the language model, you can think
of it as the term frequency of that word, for ease of understanding.

How about making the decode function return the K best results, and
doing post-processing on those candidates to get the probability of
each word?
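
One possible shape for that post-processing, as a rough Python sketch. It assumes each candidate comes with a log-domain transcript score, turns those scores into posteriors with a softmax, and matches words naively by position; a real implementation would more likely align candidates by edit distance. `word_confidences` and the scores are hypothetical.

```python
import math


def word_confidences(candidates):
    """candidates: list of (transcript, log_score), best candidate first."""
    # Softmax over the K candidates (max subtracted for numerical stability).
    max_log = max(score for _, score in candidates)
    weights = [math.exp(score - max_log) for _, score in candidates]
    norm = sum(weights)
    posteriors = [w / norm for w in weights]

    tokenized = [transcript.split() for transcript, _ in candidates]
    result = []
    for i, word in enumerate(tokenized[0]):
        # Confidence = posterior mass of candidates agreeing on this word
        # at this position (naive alignment assumption).
        conf = sum(
            p
            for words, p in zip(tokenized, posteriors)
            if i < len(words) and words[i] == word
        )
        result.append((word, conf))
    return result


k_best = [
    ("the quick brown fox jumped over the river", math.log(0.6)),
    ("the quick brown fox slumped over the river", math.log(0.4)),
]
for word, conf in word_confidences(k_best):
    print(f"{word}: {conf:.2f}")
```

Words shared by every candidate get a confidence near 1, while words that differ between candidates split the posterior mass between them.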

I'm not sure if modifying the decoder is the right track; from my
understanding, I wouldn't do that.

[Archived Post]
