When you choose to Edit a transcript in the transcript editor, you may notice a Word Confidence slider above the transcript text pane, as shown in the figure below. The confidence slider appears for any transcript generated by the ASR (automatic speech recognition) transcription service, unless the transcript has been previously edited, as described later on this page.
What Is a Confidence Score?
In machine-generated transcriptions, each word is given a confidence score that reflects how certain the program is that it selected the correct word. Lower confidence often indicates speech that was garbled, difficult to understand, or quiet, or that the automated program struggled to interpret for other reasons. The scores are presented as percentages and thus range from 0 to 100.
Words are underlined based on the slider's configured confidence value. The value shown by default is an average confidence score across the entire transcript. Underlined words are those whose scores fall below the indicated confidence score.
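The underlining rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the editor's actual implementation; the function names and the sample words and scores are hypothetical.

```python
from statistics import mean

def default_threshold(scores):
    """Average confidence across the transcript (the slider's starting value)."""
    return mean(scores)

def underlined(words, threshold):
    """Return the words whose confidence score falls below the threshold."""
    return [text for text, score in words if score < threshold]

# Hypothetical (word, confidence) pairs for a short transcript.
words = [("the", 98), ("quick", 91), ("brown", 62), ("fax", 41), ("jumps", 88)]

threshold = default_threshold([score for _, score in words])  # 76
print(underlined(words, threshold))  # ['brown', 'fax']
```

With these sample values, only the two words scoring below the transcript average are underlined.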
Change the Confidence Score Applied
Click and drag the Confidence slider or type a different value in the box to change the confidence level used to underline words in the transcript. For example, if the speaker had a thick foreign accent, you may want to lower the confidence percentage to reduce the number of underlined entries. This lets you target the words most likely to have been incorrectly transcribed.
Lowering the confidence slider slightly, particularly for speakers with a strong accent, is a good way to identify terms that are repeatedly mistranscribed so that you can fix them all at once with the Search/Replace functionality. Then raise the slider to find the remaining words and review them individually.
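The idea behind this workflow, finding low-confidence terms that recur so they can be fixed in one Search/Replace pass, can be sketched as follows. This is a hypothetical illustration that assumes you have the words and scores available as pairs; the function name, the min_count parameter, and the sample data are not part of the product.

```python
from collections import Counter

def repeated_low_confidence_terms(words, threshold, min_count=2):
    """Count words scoring below the threshold and keep those that recur."""
    counts = Counter(text.lower() for text, score in words if score < threshold)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]

# Hypothetical (word, confidence) pairs; "eco" is a recurring mistranscription.
words = [("Eco", 35), ("video", 90), ("eco", 40), ("slider", 55), ("eco", 38)]

# A lowered threshold surfaces only the least certain words.
print(repeated_low_confidence_terms(words, threshold=50))  # [('eco', 3)]
```

Terms that show up repeatedly at a low threshold are good Search/Replace candidates; the one-off words that remain are the ones to review individually after raising the slider.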
Note: Word Confidence is not the same as accuracy. While we wish machines could provide 100% accurate transcripts, at this time, the best they can do is indicate how confident the algorithm is in its output. Word Confidence is a valuable tool for the initial review of potentially inaccurate words. Still, it should not be the only indicator and is not a substitute for reading the entire transcript and confirming its accuracy against the media's audio.
Do All Transcripts Have Confidence Scores?
All transcripts returned by the ASR transcription service include a confidence score for each word. These scores are contained in the NOTE CONF entries of the transcript file (see Defining sections of a WebVTT file in Editing Transcriptions Outside of EchoVideo for more information if you are interested). If you upload an automated transcript that you previously edited offline and the file still contains the NOTE CONF entries, those confidence scores are used in the Transcript Editor interface.
If a transcript file is uploaded that does not contain the NOTE CONF entries, the Word Confidence slider is grayed out, and the value shown in the box is 100%. Essentially, the editor assumes that if there is no confidence score, the transcript was generated by a human and is therefore correct.
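If you want to check a transcript file before uploading it, a rough Python sketch like the one below can tell you whether any NOTE CONF entries are present. It assumes each entry appears on a line that begins with the text NOTE CONF; it does not parse the scores, whose exact format is not described on this page. The function name and the returned dictionary are hypothetical.

```python
def slider_state(vtt_text):
    """Guess how the Word Confidence slider would appear for this file.

    Assumption: a confidence entry is a line starting with "NOTE CONF".
    """
    has_scores = any(
        line.startswith("NOTE CONF") for line in vtt_text.splitlines()
    )
    if has_scores:
        # The real default would be derived from the scores themselves.
        return {"enabled": True, "value": None}
    # No scores: slider is grayed out and the box shows 100%.
    return {"enabled": False, "value": 100}

# A hand-written file without confidence entries (hypothetical content).
human_vtt = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:03.000\nWelcome to the lecture.\n"
print(slider_state(human_vtt))  # {'enabled': False, 'value': 100}
```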
When a user makes edits, the confidence scores are cleared for the entire cue (a single line of words) in which any word has been edited. For example, suppose you edit a single word in a cue where multiple words are underlined for falling below the slider threshold. When you move to a different cue, all underlines in the edited cue are removed because its confidence scores have been cleared.
As indicated above, no confidence score implies 100% because human intervention is assumed. In this case, it is assumed that the entire cue was reviewed alongside the change, and thus the cue is 100% correct. Although the interface shows the scoring as removed for each edited cue, the scores are only removed from the transcript itself when you save the edits to a new version.
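The per-cue behavior described above can be modeled with a small Python sketch. It is a simplified illustration only: the Cue class and its methods are hypothetical, and the sketch clears the scores as soon as a word is edited, whereas the editor removes the underlines when you move to a different cue.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Cue:
    words: List[str]
    scores: Optional[List[int]]  # None once the cue has been edited

    def edit_word(self, index, new_text):
        """Change one word; this clears the scores for the whole cue."""
        self.words[index] = new_text
        self.scores = None

    def effective_scores(self):
        """A cue without scores is treated as 100% for every word."""
        if self.scores is None:
            return [100] * len(self.words)
        return list(self.scores)

    def underlined(self, threshold):
        """Words in this cue that fall below the slider threshold."""
        pairs = zip(self.words, self.effective_scores())
        return [word for word, score in pairs if score < threshold]

cue = Cue(words=["the", "brown", "fax"], scores=[98, 62, 41])
print(cue.underlined(76))  # ['brown', 'fax']

cue.edit_word(2, "fox")    # fix one word only
print(cue.underlined(76))  # [] because the whole cue now counts as 100%
```

Note that "brown" loses its underline even though it was never touched, which is why it is worth reviewing the whole cue whenever you edit any word in it.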