Hey guys!
From experience with your models, I know that with an external scorer enabled (an n-gram LM), the model will never predict anything that is not in the vocabulary of the LM. But I was wondering how this actually works, because I could not quite find that mechanism in the code.
`make_ngram` will convert a prefix into a list of words (STT/native_client/ctcdecode/scorer.cpp, lines 369 to 396 in bb75afb).
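If I read it right, the idea is roughly the following. This is a simplified sketch over a plain character string; as far as I can tell, the real `make_ngram` walks the decoder's prefix trie instead:

```cpp
#include <string>
#include <vector>

// Simplified sketch of the idea (not the actual scorer.cpp code):
// split the decoded character prefix at spaces and keep the last
// `order` words as n-gram context for the language model.
std::vector<std::string> make_ngram_sketch(const std::string& prefix,
                                           size_t order) {
  std::vector<std::string> words;
  std::string current;
  for (char c : prefix) {
    if (c == ' ') {
      if (!current.empty()) {
        words.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  // The trailing, possibly still incomplete, word is included too,
  // since it is the one the scorer is about to judge.
  if (!current.empty()) {
    words.push_back(current);
  }
  // Keep at most `order` words of n-gram context.
  if (words.size() > order) {
    words.erase(words.begin(), words.end() - order);
  }
  return words;
}
```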
The scorer will give a harsh `OOV_SCORE` of -1000.0 whenever a word is not in the language model vocabulary (STT/native_client/ctcdecode/scorer.cpp, lines 329 to 331 in bb75afb).
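So the check itself seems to be nothing more than a vocabulary lookup plus a constant penalty, something like this minimal sketch (a plain set stands in for the KenLM vocabulary lookup):

```cpp
#include <string>
#include <unordered_set>

// The -1000.0 is the OOV_SCORE cited above; everything else here is
// a stand-in for the actual KenLM-backed lookup in scorer.cpp.
constexpr double kOovScore = -1000.0;

double score_last_word(const std::unordered_set<std::string>& lm_vocab,
                       const std::string& word,
                       double lm_log_prob) {
  // An out-of-vocabulary word is not rejected outright; it just gets a
  // log score so low that in-vocabulary alternatives dominate the beam.
  if (lm_vocab.find(word) == lm_vocab.end()) {
    return kOovScore;
  }
  return lm_log_prob;
}
```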
But what happens if a character of a word is silent and cannot really be predicted by the acoustic model? That character would have a very low probability and is possibly pruned in the decoding step (e.g., the silent "b" in "doubt": if it is pruned, the hypothesis becomes "dout", which is out of vocabulary).
If the character is missing, how can the n-gram LM still reconstruct a full word and not always return OOV and discard the prefix?
Or is it the job of the acoustic model to also output characters that are silent?
Coqui with the LM scorer seems to behave like "find the closest valid hypothesis made of in-vocabulary words", but how is this behaviour enforced?
Replies: 1 comment

You got it, it's not properly enforced, just heavily downscored. If you have lax enough pruning parameters, the decoder will still explore low-probability AM labels, which will then get boosted by the LM scores of valid words.
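To make that concrete, here is a minimal sketch of the usual shallow-fusion combination, assuming the common total = AM + alpha·LM + beta·word_count form. `alpha` and `beta` correspond to Coqui's lm_alpha / lm_beta parameters; the function itself is illustrative, not the actual decoder code:

```cpp
// Illustrative per-hypothesis score in the usual shallow-fusion form.
double hypothesis_score(double am_log_prob,  // sum of the chosen label log-probs
                        double lm_log_prob,  // LM score of the word sequence
                        int word_count,      // number of completed words
                        double alpha,        // LM weight (Coqui: lm_alpha)
                        double beta) {       // word insertion bonus (Coqui: lm_beta)
  return am_log_prob + alpha * lm_log_prob + beta * word_count;
}
```

A silent character drags `am_log_prob` down, but if the beam width and pruning cutoffs are lax enough for the hypothesis to survive to the word boundary, a strong `lm_log_prob` for the completed in-vocabulary word can lift it back above its rivals; with aggressive pruning it is dropped before the LM ever sees the full word.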