Ability to get character/word level confidence scores #2021

Open
remyzerems opened this issue Nov 15, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

remyzerems commented Nov 15, 2021

When using *WithMetadata functions, it would be helpful to get access to the confidence score of each token of a given candidate transcript.

It could be made available in the TokenMetadata class as a public member confidence.
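As a rough illustration of how such a member could be used once exposed, here is a minimal sketch that rolls per-token confidences up into word-level scores. `TokenWithConfidence` and `WordConfidence` are hypothetical stand-ins, not the library's actual types, and the sketch assumes `confidence` would be a log-probability (so values for a word can simply be summed) and that words are separated by a single-space token.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a token record carrying the requested member.
// This is NOT the library's TokenMetadata definition.
struct TokenWithConfidence {
  std::string text;   // one token (character) of the transcript
  float confidence;   // assumed: log-probability of this token
};

// Hypothetical word-level result.
struct WordConfidence {
  std::string word;
  float log_prob;     // sum of the token log-probabilities in the word
};

// Group tokens into words on " " and accumulate their log-probabilities.
// The space token's own confidence is dropped in this sketch.
std::vector<WordConfidence> word_confidences(
    const std::vector<TokenWithConfidence>& tokens) {
  std::vector<WordConfidence> words;
  WordConfidence current{"", 0.0f};
  for (const auto& tok : tokens) {
    if (tok.text == " ") {
      if (!current.word.empty()) {
        words.push_back(current);
      }
      current = WordConfidence{"", 0.0f};
    } else {
      current.word += tok.text;
      current.log_prob += tok.confidence;
    }
  }
  if (!current.word.empty()) {
    words.push_back(current);
  }
  return words;
}
```

Summing is only meaningful if the per-token value is a log-probability; with a different scale (for example a 0 to 1 probability) a product or an average would be the natural choice instead.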

reuben added the enhancement label Nov 16, 2021
reuben changed the title from "Ability to get word confidence scores" to "Ability to get character/word level confidence scores" Nov 16, 2021
Collaborator

reuben commented Nov 16, 2021

The main decoder loop is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/ctc_beam_search_decoder.cpp

The trie data structure used to keep individual tokens is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/path_trie.h

The log_prob_c member contains the log-probability of the current character. At decode time, only the score member (the accumulated score from the beginning of the transcript up to the current node) is copied into the Output structure:

output.confidence = scores[prefixes_copy[i]];
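To get per-character values instead of only the accumulated score, one option would be to walk from the final node of a prefix back to the root and collect log_prob_c along the way. The following is a minimal sketch under assumptions: `Node` is a simplified hypothetical stand-in for the trie node (only log_prob_c and score are taken from the description above; `character`, `parent`, and the root sentinel with a null parent are assumptions), and `collect_token_confidences` is an illustrative helper, not an existing function.

```cpp
#include <algorithm>
#include <vector>

// Simplified, hypothetical stand-in for a prefix-trie node.
struct Node {
  int character;       // assumed: label index of this node's character
  float log_prob_c;    // log-probability of the current character
  float score;         // accumulated score from the start up to this node
  const Node* parent;  // assumed: nullptr for the root sentinel
};

struct TokenConfidence {
  int character;
  float log_prob;
};

// Walk from the last node of a prefix up to (but excluding) the root,
// then reverse so the result is in transcript order.
std::vector<TokenConfidence> collect_token_confidences(const Node* leaf) {
  std::vector<TokenConfidence> out;
  for (const Node* n = leaf; n != nullptr && n->parent != nullptr;
       n = n->parent) {
    out.push_back({n->character, n->log_prob_c});
  }
  std::reverse(out.begin(), out.end());
  return out;
}
```

The result is one entry per character on the path, which could then be stored next to the accumulated confidence.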

This Output structure is then converted into the public facing Metadata/CandidateTranscript/TokenMetadata here:

ModelState::decode_metadata(const DecoderState& state,

Basically, to do this one would have to write a bunch of boring code shuffling this data through the layers of the implementation so it can be used at the API level.
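The kind of plumbing this means could look roughly like the sketch below: a decoder-side result gains a per-token vector, and the conversion step copies each value onto the matching public token. All names here (`OutputSketch`, `token_log_probs`, `TokenSketch`, `to_token_metadata`) are hypothetical and chosen for illustration; the real Output and TokenMetadata definitions and the real decode_metadata logic are not reproduced.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoder-side result with an added per-token vector.
struct OutputSketch {
  double confidence;                    // accumulated score, as today
  std::vector<unsigned int> tokens;     // label indices
  std::vector<unsigned int> timesteps;  // one timestep per token
  std::vector<float> token_log_probs;   // hypothetical addition
};

// Hypothetical public-facing token record.
struct TokenSketch {
  std::string text;
  unsigned int timestep;
  float confidence;
};

// Copy the per-token values across the layer boundary, mapping label
// indices to text through a caller-supplied alphabet.
std::vector<TokenSketch> to_token_metadata(
    const OutputSketch& out, const std::vector<std::string>& alphabet) {
  if (out.tokens.size() != out.timesteps.size() ||
      out.tokens.size() != out.token_log_probs.size()) {
    throw std::invalid_argument("per-token vectors must have equal length");
  }
  std::vector<TokenSketch> result;
  result.reserve(out.tokens.size());
  for (std::size_t i = 0; i < out.tokens.size(); ++i) {
    result.push_back({alphabet.at(out.tokens[i]), out.timesteps[i],
                      out.token_log_probs[i]});
  }
  return result;
}
```

The length check is there because the three vectors are filled independently in this sketch; a mismatch would otherwise silently misalign confidences and tokens.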

Collaborator

reuben commented Nov 16, 2021

@juliandarley ^
