EchoVideo lets you add closed captions, transcripts, or both to any media!

Transcripts (also referred to as ASR, for automatic speech recognition) and closed captions are both text versions of the speech in video or audio media. Both are designed to let viewers read the spoken words as well as hear them. Because the two are similar, there has been some confusion about which is which, how to access them, and what they can do.

ASR transcriptions in EchoVideo do not override third-party closed captions.

Closed captions in EchoVideo work just like those in most other video players. The text of what is being spoken is shown on the screen, along with the video, in small chunks that correspond to the current video location.

Transcripts, like closed captions, provide the text of what is being spoken in the media, divided into small chunks that correspond to locations in the video. The difference is that the transcript panel shows the entire transcription, with the entry for the current location highlighted. Because the entire transcript is visible, it is also searchable within the player. Users can also click a specific location in the transcript to jump automatically to that location in the video.

The following figures both show the transcript button and transcript panel for a video, as well as the closed caption button and caption banner, and both show the same text for the current location in the video. The text is often the same, but can differ because closed captions are typically (and by design) more accurate. As noted later on this page, however, transcripts can be applied as closed captions to media, in which case the text in both would be identical.

The figure below shows the player with both transcripts and closed captions identified.

[Image: Media player with the transcript panel open and closed captions turned on, with the Transcript and CC buttons identified.]

The following figure shows the legacy classroom player with both transcripts and closed captions.

[Image: Classroom player with the transcript panel open and closed captions turned on, with the Transcript and CC buttons identified.]

Each of those chunks, in both closed captions and transcripts, is called a cue. Each cue has start and end timestamps that indicate where it occurs in the video, and the player uses those timestamps to decide when to display or highlight it. This is why closed captions (and transcripts) can get out of sync when the video is edited to remove sections from the middle or to trim the ends: the edits change the video's timeline, but the timestamps in the closed caption or transcript file are not updated to match.
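To make the timing problem concrete, the following Python sketch shifts the cues in a WebVTT file after a section is cut from the video. It is an illustration only, not an EchoVideo feature: the helper names (`to_seconds`, `to_timestamp`, `shift_after_cut`) and the cut times are hypothetical, and cues that overlap the removed section are simply left unchanged.

```python
import re

# A WebVTT cue timing line, e.g. "00:00:40.000 --> 00:00:43.500 align:start".
# The hours field is optional in WebVTT ("00:40.000" is also valid).
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})(.*)$"
)


def to_seconds(ts: str) -> float:
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds."""
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def to_timestamp(seconds: float) -> str:
    """Convert seconds back to an "HH:MM:SS.mmm" WebVTT timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def shift_after_cut(vtt: str, cut_start: float, cut_end: float) -> str:
    """Move cues that start at or after cut_end earlier by the length of the cut.

    Cues before the cut are kept as-is; cues overlapping the cut are also left
    unchanged here, although a real tool would need to trim or drop them.
    """
    removed = cut_end - cut_start
    out = []
    for line in vtt.splitlines():
        m = TIMING.match(line)
        if m:
            start, end = to_seconds(m.group(1)), to_seconds(m.group(2))
            if start >= cut_end:
                line = f"{to_timestamp(start - removed)} --> {to_timestamp(end - removed)}{m.group(3)}"
        out.append(line)
    return "\n".join(out)


sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the lecture.

00:00:40.000 --> 00:00:43.500
Let's look at the first example."""

# Suppose the editor removed the section from 10 s to 30 s.
print(shift_after_cut(sample, cut_start=10.0, cut_end=30.0))
```

Running it moves the second cue from 00:00:40.000 to 00:00:20.000, which is where that speech now occurs in the edited video. Without such an adjustment, that cue would appear 20 seconds late.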

Open and View Closed Captions

Closed captions, unlike transcripts, are configurable, so you can display them in the way that works best for you. You can adjust their size, location (top, middle, or bottom), and contrast (light text on a dark banner or dark text on a light banner).

Click the Settings icon, then select Subtitles / CC to view the configuration options shown below.

[Image: Media player with the Settings menu open and the Subtitles / CC option selected, showing the caption viewing options.]

After you configure the Subtitles / CC settings, click the CC button at the bottom of the playback screen to turn captions on or off.

[Image: Media player with the CC button identified and closed captions shown on the screen.]

For all areas except the legacy classroom (discussed below), the closed captioning configuration you set remains in effect for all playback. This means you should only have to configure it once. Then, when you click the CC button for any media, the closed captions will appear as you set them. You can always click the Settings icon to make a change.

Since configuration persists across all views in the newer player, you can configure your CC options separately from turning them on for viewing. If the media does not have closed captions applied, the CC button is grayed out and unclickable. However, the configuration options are still available.

Closed Captions in the Legacy Classroom

Configuring and turning on closed captions in the legacy classroom player differs slightly from the newer player described above. To begin with, your closed captioning preferences are not retained, so you need to configure them for each class you view if you do not want the default settings.

In the legacy classroom, all options are contained in the CC button of the player, as shown in the figure below.

[Image: Legacy classroom player with the CC button selected and the closed caption viewing options showing.]

First, click the CC button, then turn on closed captions using the CC toggle at the top of the pop-up box. After that, you can make changes to the caption banner, including text size, different contrast options, location on the screen, and alignment of the text.

Once turned on, closed captions appear until you click the CC button again and toggle them off. Remember, this configuration persists only in this classroom; you will have to turn on CC and configure it again for each other class you view.

If the CC button is grayed out and unclickable in the legacy classroom, this media does not have closed captions applied. Because configuration does not carry over to other media playback, there is no reason to present you with those options.

Open and View Transcripts

Transcripts, unlike closed captions, cannot be configured for viewing. When you choose to view the transcript for media, the full transcript opens in the transcript panel. The panel appears to the right of the player by default; if you select full-width mode (the button with two arrows), it moves below the player.

Click the Transcript button, as shown in the figures below. As the media plays, the transcript location corresponding to the playback location is highlighted.

The transcript button for the newer player is located with the other tools at the bottom of the playback panel, as shown in the figure below.

[Image: Media player with the transcript panel open and the Transcript button identified.]

The transcript button for the legacy classroom player is located with the other classroom tools in the top-right corner of the classroom window, as shown in the figure below.

[Image: Classroom player with the transcript panel open and the Transcript button identified.]

Notice that at the top of the Transcript panel, there is a search box. Entering text into the search box immediately searches for matching terms in the transcript. The number of matches changes as you type. The count of matching terms and previous / next arrow buttons appear below the search box and are identified in the figure below.

Matches in the transcript are underlined, so you can either use the previous / next buttons to move between them or scroll through the transcript to find them. A search term and two matches are shown in the figure below.

[Image: Media player with search text entered in the transcript panel, the results count and navigation arrows identified, and the matches underlined in the transcript.]
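The search and navigation behavior described above can be modeled roughly as follows. This Python sketch is hypothetical: `Cue`, `find_matches`, and `MatchNavigator` are illustrative names, and it assumes case-insensitive matching, which may not match EchoVideo's exact rules. Each match maps back to its cue, and the cue's start time is where playback would jump.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the media
    end: float
    text: str


def find_matches(cues: list[Cue], term: str) -> list[int]:
    """Return one cue index per occurrence of term (case-insensitive)."""
    needle = term.casefold()
    if not needle:
        return []
    matches: list[int] = []
    for i, cue in enumerate(cues):
        # str.count counts non-overlapping occurrences in the cue text.
        matches.extend([i] * cue.text.casefold().count(needle))
    return matches


class MatchNavigator:
    """Steps through matches like the previous / next arrows, wrapping at the ends."""

    def __init__(self, cues: list[Cue], term: str):
        self.cues = cues
        self.matches = find_matches(cues, term)
        self.pos = -1  # no match selected yet

    def count(self) -> int:
        return len(self.matches)

    def next(self) -> float | None:
        if not self.matches:
            return None
        self.pos = (self.pos + 1) % len(self.matches)
        return self._seek_time()

    def previous(self) -> float | None:
        if not self.matches:
            return None
        # From "nothing selected", previous goes to the last match.
        if self.pos < 0:
            self.pos = len(self.matches) - 1
        else:
            self.pos = (self.pos - 1) % len(self.matches)
        return self._seek_time()

    def _seek_time(self) -> float:
        # Selecting a match's cue moves playback to that cue's start timestamp.
        return self.cues[self.matches[self.pos]].start


cues = [
    Cue(0.0, 4.0, "Photosynthesis converts light energy."),
    Cue(4.0, 9.5, "Light reactions happen first."),
    Cue(9.5, 14.0, "Then the Calvin cycle runs."),
]
nav = MatchNavigator(cues, "light")
print(nav.count())  # 2
print(nav.next(), nav.next(), nav.next())  # 0.0 4.0 0.0
```

With the sample cues, searching for "light" finds two matches; pressing next twice moves playback to 0.0 and then 4.0 seconds, and a third press wraps back to the first match. The same start-time lookup is what clicking a cue in the transcript would use.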

Click any cue in the transcript, and the video playback location changes to match the location of that cue. This lets you jump directly to the part of the media you are looking for.

Transcripts can also be downloaded, which gives you a local copy of the speech transcription as a text file. This can be helpful as a study aid, allowing you to review the text offline or copy and paste portions of it into your notes.

Click the Download icon located below the transcript panel. The download options include, at a minimum, a Transcript tab, which lets you download the transcript as a VTT file or a TXT file.

In the legacy classroom, click the download icon located to the left of the search box.

A downloaded transcript can be opened in any text editor, although Windows Notepad may sometimes ignore line breaks. If you are using the text editors built into Windows, WordPad is a better choice.

In Interactive Media (videos with embedded polls), students who open the transcript see only the portion up to the first unanswered poll. The transcript panel still opens, but its text is gated by the poll questions, just like the media. Students also cannot download the transcript until they have responded to all of the embedded polls; the download link appears only after every poll has been answered.
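If you have only the VTT download and want plain text, or want a file that displays cleanly in Notepad, a short script can do the conversion. This Python sketch is an illustration under stated assumptions, not EchoVideo's TXT export: `vtt_to_text` and `save_for_windows` are hypothetical names, the script drops cue timings and markup tags, and it writes Windows-style (CRLF) line breaks on the assumption that the Notepad issue comes from LF-only line breaks, which older versions of Notepad do not display.

```python
import html
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start".
TIMING = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}[ \t]+-->[ \t]+")
# Inline cue markup such as <v Speaker>, <i>, </c>, or <00:00:02.000>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken text of a WebVTT document, one cue per line."""
    vtt = vtt.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n[ \t]*\n", vtt):
        lines = block.strip("\n").split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Lines after the timing line are the cue text; an optional
                # cue identifier before it is skipped. Entities such as &amp;
                # are decoded after tags are removed.
                text = html.unescape(TAG.sub("", " ".join(lines[i + 1:]))).strip()
                if text:
                    out.append(text)
                break
    return "\n".join(out)


def save_for_windows(text: str, path: str) -> None:
    # newline="\r\n" makes Python write CRLF line breaks, the line-ending
    # style that older versions of Notepad require.
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(text + "\n")


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.000\n<v Instructor>Welcome to the lecture.\n\n"
    "00:00:04.500 --> 00:00:07.000\nToday: cells &amp; energy."
)
print(vtt_to_text(sample))
```

For the sample, this prints the two cue texts on separate lines, with the speaker tag removed and `&amp;` decoded to `&`.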

Use Transcripts for Closed Captions

Closed captions are generally preferred when viewers want to read the speech text while watching the video, because caption placement can be customized to minimize disruption to the viewing experience. Transcripts open in a side panel, so the viewer has to shift focus back and forth between the media and the text.

Closed captions are also designed to be more accurate, and the accuracy level is determined by the contract agreed upon between your institution and the closed captions provider.

But what if your institution does not automatically provide closed captions?

Both closed captions and transcripts use WebVTT files. These files are in a standardized format, which makes them effectively interchangeable. EchoVideo allows you to download the transcript file for a video and then upload that same .VTT file as a closed caption file.
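Before uploading a transcript file as closed captions, you can sanity-check that it really is WebVTT. The following Python sketch checks two basics from the WebVTT specification: the file starts with the `WEBVTT` signature (optionally preceded by a byte-order mark), and it contains at least one cue timing line. `looks_like_webvtt` is a hypothetical helper, not EchoVideo's upload validation, which may apply different rules. A common mismatch it catches is an SRT file, which uses commas instead of periods in its timestamps.

```python
import re

# A full cue timing line such as "00:00:01.000 --> 00:00:04.000" (hours optional).
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}[ \t]+-->[ \t]+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}",
    re.MULTILINE,
)


def looks_like_webvtt(text: str) -> bool:
    """Rough check: WEBVTT signature plus at least one cue timing line."""
    if text.startswith("\ufeff"):  # optional UTF-8 byte-order mark
        text = text[1:]
    if not text.startswith("WEBVTT"):
        return False
    rest = text[len("WEBVTT"):]
    # The signature must be followed by the end of the file, a space, a tab, or a newline.
    if rest and rest[0] not in " \t\r\n":
        return False
    return TIMING.search(text) is not None


print(looks_like_webvtt("WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHello"))
print(looks_like_webvtt("1\n00:00:01,000 --> 00:00:02,000\nHello"))  # SRT
```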

Furthermore, the EchoVideo transcript editor includes an Apply to CC button that lets someone editing a transcript for accuracy save their changes and apply the improved transcript to the media as closed captions. The transcript does not have to be edited before it is applied as a captions file, but editing it first is often recommended so that the caption text is as accurate as possible.

Details About the Differences

This section provides more detailed information about transcripts and closed captions for those readers who want to know more.

What is the difference between transcripts and closed captions?

The quick answer is that transcripts are machine-generated and closed captions are human-generated. Historically, that was largely true. However, given the improved ability of machines to transcribe speech accurately and the cost of strictly human-generated captioning, it is no longer always the case.

That being said, closed captions are often more accurate than transcriptions because the purpose of closed captions is to provide reasonable and accurate accommodation for hearing-impaired users, which means that by definition, closed captions are supposed to be as accurate as possible. Institutions contract with closed captioning service companies to provide accurate (and sometimes very fast) closed caption tracks for their media. The level of accuracy and the turnaround time also determine the cost of the service.

In practice, many people without hearing impairments now use closed captions to aid learning and understanding, because reading and hearing the material together can improve comprehension and retention. Captions also let users review material in very noisy places, or in quiet places where playing the audio track is impractical.

So why bother with transcripts?

Transcription services are usually cheaper than closed captioning services. This is because closed captions, even if machine-generated, still usually require some level of human intervention to ensure that the accuracy promised by the captioning contract is met. Transcription services are essentially one-time deals. The audio of the media is submitted to the transcript provider, and the transcript is generated and returned for the media.

In addition, even though transcripts are entirely machine-generated, most transcription service providers claim accuracy levels of 95% or higher. That claimed accuracy may be language-specific, however, and it depends heavily on the quality of the audio and the clarity of the speaker.

EchoVideo includes transcription services as part of the EchoVideo package, meaning they are not billed separately (at least for a base set of hours) and may eliminate the need to contract with closed captioning providers. If the transcriptions meet the students' needs and the institution's legal requirements for accommodations, they may be sufficient.

EchoVideo also features a Transcript Editor that lets users review and manually edit transcripts. This can allow institutions to redirect money spent on closed captioning providers to student work-study programs, for example, by employing student workers as transcription editors.

Furthermore, EchoVideo provides language selection or automatic language detection for transcripts, so they can be generated in the primary language of the speech. Because transcriptions are provided in a single language only, this feature does not work for media with mixed-language speech tracks. Those tracks require human editing of the transcript to produce an entirely accurate transcription of the speech.

So why bother with closed captions?

Closed captioning providers offer a very specific service: they transcribe speech (and other audio cues) for media as quickly and accurately as their contract requires. The provider is responsible for getting the captions right within the agreed turnaround time, and institutions pay them for that service. If the accuracy of automated transcriptions is insufficient, using a closed captioning provider may be required. Alternatively, an institution can assign people to edit transcriptions to make them as accurate as possible.

Closed captions are also automatically applied to the media as a visual track that appears on the screen, in segments, along with the video. Both transcripts and closed captions are time-synced with the media, but transcripts, as shown above, appear as a single panel, while each closed caption appears along with the speech or other audio cue it represents. This helps reinforce learning because the user is not distracted by other text on the screen.

Transcripts can be added to media as a closed caption track, which takes the text and timing cues from the transcript and displays them with the media as closed captions. But again, the difference between the two often comes down to accuracy.

Furthermore, closed captioning providers are likely required to provide captions for mixed-language speech tracks in the language being spoken, making closed captioning a better option for institutions that often feature mixed-language lectures. Again, however, this feature may be contract-specific and is determined by the institution's requirements for the closed captioning provider.
