EchoVideo has partnered with Amazon Web Services (AWS) to provide transcription services for audio and video media in EchoVideo. ASR (Automatic Speech Recognition) uses computers to convert speech into text and synchronize that text with the media.

If your institution prefers manual transcription to the ASR service, you can still use the EchoVideo Transcript Editor to upload transcripts created outside of EchoVideo, edit them further, and, if appropriate, apply them as closed captions.

There are several institution-level ASR features that you can set, all available from the Institution Settings > Features page:

ASR Course Media - sends video / audio / interactive media for transcription when it is posted to a class in a section (if the media does not already have a transcription).
ASR All New Media - sends all video / audio media for transcription when it is added to EchoVideo and successfully processed. This includes user file uploads, EchoVideo capture appliance captures, and all universal capture uploads. It applies to all users, including media that students add.
ASR Language Settings - Allows you to select the primary language used in your media recordings, and instructs the transcription service to use that language for transcribing the speech.
Custom Dictionaries - EchoVideo supports custom transcription dictionaries, formatted for AWS Transcribe compatibility, that improve ASR accuracy for organization-specific terminology, acronyms, and proper nouns. Echo360 Support must configure these dictionaries on behalf of institutions.
Automatic Push to Closed Captions - Allows you to set a confidence score for your automated transcripts, and tell EchoVideo to automatically apply those ASR transcripts to the closed captioning track if the transcript meets or exceeds the level you set.
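The Automatic Push to Closed Captions rule comes down to a single "meets or exceeds" comparison. The Python sketch below is a hypothetical model, not EchoVideo's implementation. The class name, the field names, and the 0 to 100 confidence scale are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsrTranscript:
    media_id: str
    # Hypothetical: confidence score reported by the transcription service (0-100 scale assumed)
    confidence: float


def should_push_to_captions(transcript: AsrTranscript, threshold: Optional[float]) -> bool:
    """Return True if the ASR transcript should be applied to the closed captioning track.

    threshold is None when Automatic Push to Closed Captions is not configured.
    """
    if threshold is None:
        return False
    # The transcript must meet or exceed the configured level
    return transcript.confidence >= threshold
```

With a threshold of 90, a transcript scored at exactly 90 is pushed to captions. One scored at 89.9 stays a transcript only.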

If you choose not to enable ASR Course Media or ASR All New Media for your institution, you can still request transcripts for individual pieces of media. This lets you use the automated transcription service without transcribing media at scale.

These toggles and any references to ASR in this document or in the EchoVideo documentation refer to automatic, third-party-generated transcriptions of media.

Once media has an ASR transcription, it is not retranscribed unless the media has been edited (that is, its audio file has changed in any way). For example, if both toggles are turned on and a video is transcribed upon upload, it is not retranscribed when published; it retains the original transcription. This also applies to individual transcription requests: media that has an unedited automated transcript is not sent for re-transcription.

Turning off any of these toggles has no effect on existing transcriptions. Transcriptions remain with the media throughout its lifecycle, regardless of how they were generated.

Manual transcription and upload for video / audio media are always available, as is the Transcript Editor. You may also want to review the student-access features for transcripts: Allow Students to Edit Transcripts for Class-Published Media and / or Allow Students to Edit Transcripts for Library Media. Some institutions use these features to assign transcription and editing to students as work-study tasks.

The following sections provide more technical details about the transcription service.

What Is the Difference Between Transcriptions and Closed Captions?

For end users (e.g., instructors or students), see Transcriptions vs. Closed Captions for how each is presented in media players.

Transcription differs from captioning in several ways. Transcription is the process of converting speech to text. It does not include the sound effects and other non-speech elements that closed captions often include. Furthermore, an automatic speech recognition transcription service is unlikely to meet the accuracy levels required of closed captions for hearing-impaired individuals. This is particularly true for low-volume captures, captures where background noise interferes with the audio, and speakers whose accents cause word-recognition problems for the transcription service.

All that being said, transcriptions can be applied much faster than closed captions and typically cost less to generate. Transcriptions may provide a suitable solution for delivering both visual and audio content for lectures. In particular, transcriptions may serve as an interim visual-text measure during the period when closed captions are being generated and applied (sometimes a day or more, depending on the required accuracy levels).

Because both closed caption files and transcription files use the WebVTT standard, you can generate automatic transcriptions, edit them for accuracy, and then apply them to the media as closed captions. You can also tell EchoVideo to automatically push transcripts to the closed captioning track when the transcription service assigns them a high enough confidence score.
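Both file types share the WebVTT format, so a transcript is essentially a list of timed text cues. The Python sketch below shows that structure by building a minimal WebVTT document from (start, end, text) cues. The function names are hypothetical. Files exported by EchoVideo may also contain elements this sketch omits, such as cue identifiers or cue settings.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Tuple[float, float, str]]) -> str:
    """Build a minimal WebVTT document from (start, end, text) cues."""
    lines = ["WEBVTT", ""]  # required file header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

For example, `to_webvtt([(0.0, 2.5, "Welcome to the lecture.")])` produces a header line, a blank line, the timing line `00:00:00.000 --> 00:00:02.500`, and then the cue text.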

When Do Transcriptions Get Applied?

When a piece of video / audio media requires transcriptions, the request and application depend on the configuration of the feature toggles noted above. It can happen either when media is added to the system (ASR All New Media) or when each media item is posted to a class in a course (ASR Course Media).
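The two triggers described above can be summarized as a small decision function. This Python sketch is a simplified, hypothetical model, not EchoVideo's implementation. `MediaEvent`, `asr_triggered`, and the `already_transcribed` flag are illustrative names. The function covers only the two triggers named here; the edit and reprocessing cases described in the sections below are not modeled.

```python
from enum import Enum, auto


class MediaEvent(Enum):
    # Hypothetical event names used only for this illustration
    PROCESSED = auto()        # media added to EchoVideo and finished processing
    POSTED_TO_CLASS = auto()  # media published to a class in a course section


def asr_triggered(event: MediaEvent, asr_all_new_media: bool,
                  asr_course_media: bool, already_transcribed: bool) -> bool:
    """Return True if this event would send the media for ASR transcription."""
    if already_transcribed:
        # Media whose current audio already has a transcript is not resent
        return False
    if event is MediaEvent.PROCESSED:
        return asr_all_new_media
    if event is MediaEvent.POSTED_TO_CLASS:
        return asr_course_media
    return False
```

With both toggles on, a video is sent when it finishes processing. By the time it is posted to a class, `already_transcribed` is true, so it is not sent again.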

Note that transcripts take at least 30 minutes to be returned, even when an admin requests them for individual media. They take longer when there are many requests at once, and longer recordings take longer to transcribe.

ASR Course Media

Turning on the ASR Course Media toggle instructs EchoVideo to send media for transcription when it is posted to a class within a course. Capture schedules that auto-publish to a section, and ad hoc recordings published directly to the course, create and publish the media at the same time. Media uploaded to a user's library, and captures from schedules or ad hoc recordings that do not auto-publish to a section, are not sent for transcription until they are posted to an EchoVideo course.

For example, if an instructor creates an ad hoc capture and selects Library as the Publish-to location, that video will not be transcribed until it is published to a class within a course. If an instructor uploads a video directly to the Class List in a section, the video is sent for automatic transcription, and the results are displayed when transcription is complete.

You can turn this feature on for the whole institution, or set it so that Organizations, Departments, and individual course sections can turn it on or off as needed.

Also, keep in mind that the availability of a capture is not the same as its publication. You can publish a video while keeping it unavailable to students. Since the video is still published, it will trigger an automated transcription at that time (if ASR Course Media is turned on).

If a video is edited while it is published, it is sent for transcription again after the edits are complete and the user clicks Save. This means the video may be transcribed twice: once when it is initially posted to the class, and again after editing.

Taken together, these two points lead to a common case. Suppose your or your instructors' standard procedure is to generate a capture, auto-publish it to a section, and keep it unavailable to students while the instructor edits it. That video will be transcribed twice: once when it is initially published, and again after the edits are complete and saved.

ASR All New Media

Turning on the ASR All New Media toggle instructs EchoVideo to send media for transcription as soon as it has been added and has completed media processing. This applies to all video / audio media added to EchoVideo by any user, including students, in any way. The MP3 file created during media processing is sent for transcription as soon as it is available.

This also means that, as an Admin, you can reprocess captures to regenerate their MP3 files. If this toggle is enabled, the regenerated MP3 files are sent for transcription. This is one way to obtain transcriptions for older or previously unpublished media that may require them.

Transcription of all media also applies to edited media. For example, if an instructor edits a video that already has a transcript, clicks Save, and the edited video is processed, the newly generated MP3 file is then sent for transcription. This is because the audio file changed due to the edits and needs to be retranscribed. This applies even if the edits include removing a silent section of the video; the audio file itself has changed, so EchoVideo sends the new file for transcription. In the Transcript Editor, the video also has at least two transcript versions: the original and Version 1, which corresponds to the edited version.

If a Manual Transcription is applied to a capture / video before it is published, and then the video is published, it will not be sent for automatic transcription. In this case, the uploaded transcription is considered the original, and the automated one is an update. Reverting to the original would restore the transcription uploaded earlier. Although we do not expect this to be a common use case, we wanted to note it here for your reference.

How Long Does It Take for a Transcription to Appear?

It takes at least 30 minutes for a video to receive automatic transcription, longer for videos over an hour, and / or when the transcription service is processing a large number of requests at the same time.

Currently, transcriptions are not visible in media details playback. To check whether an item has a transcript, refer to the Transcript entry in the Details tab of the Media Details page. The entry reads Add if there is no transcript and Update if there is one.

Alternatively, as an Admin, select Edit Transcript from the chevron menu for any completed capture entry. The Transcript Editor opens and displays a message if the item does not have a transcript file.

In What Instances Are Videos Not Automatically Transcribed?

Transcriptions are not back-applied to existing captures. Captures and videos already in the system when an ASR toggle is turned on do not get transcripts automatically. Depending on which toggle is enabled, they must be published or republished (ASR Course Media) or reprocessed (ASR All New Media). Alternatively, send them individually to request transcripts.

If you remove and then republish a capture to a class to obtain transcriptions, you will remove student video-view data from the section analytics for that class/video. A better option is to have instructors create a holding class solely for temporarily publishing videos (or use an expired section). Publish the older, non-transcribed videos to that class, then remove them. Publishing triggers automatic transcription when the ASR Course Media toggle is on, and the video does not need to remain in the class for more than a few seconds. All currently published versions of the video will include the transcriptions.

Alternatively, as an Administrator, you can select Request Transcript for any piece of media on the Captures page.

Recordings longer than 8 hours are not eligible for ASR transcription; this is an Amazon restriction. You can manually transcribe these videos and upload the VTT file to apply it to the media. Alternatively, you can edit longer videos into shorter segments. The segments are then transcribed if the relevant ASR toggle is on, either upon saving after editing or when posted to a course.
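If you choose to split a long recording, the cut points have to keep every segment within the 8-hour limit. The Python sketch below plans equal-length segments under that limit. The actual editing happens in EchoVideo; the function names here are hypothetical. The sketch also assumes a recording of exactly 8 hours is still eligible, since the stated limit is "longer than 8 hours".

```python
import math
from typing import List, Tuple

MAX_ASR_SECONDS = 8 * 60 * 60  # 8-hour limit stated in this article


def asr_eligible(duration_seconds: float) -> bool:
    """Recordings longer than 8 hours are not eligible for ASR."""
    return duration_seconds <= MAX_ASR_SECONDS


def plan_segments(duration_seconds: float,
                  max_seconds: float = MAX_ASR_SECONDS) -> List[Tuple[float, float]]:
    """Split a recording into contiguous (start, end) ranges no longer than max_seconds."""
    count = max(1, math.ceil(duration_seconds / max_seconds))
    length = duration_seconds / count  # equal-length segments
    # Segment boundaries; the last one is pinned to the exact recording length
    bounds = [i * length for i in range(count)] + [duration_seconds]
    return list(zip(bounds[:-1], bounds[1:]))
```

For a 10-hour recording, `plan_segments(10 * 3600)` returns two 5-hour ranges, and each range is eligible for transcription on its own.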

Captures / videos that have already been auto-transcribed are not sent for automatic transcription again unless the file has been modified. The ASR service detects that the video has an automated transcription and compares the audio file byte-for-byte. As long as the audio file is identical to the one originally submitted (that is, the media has not been edited), the media is not resubmitted for transcription.
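A byte-for-byte comparison means that any change to the audio counts as an edit, including trimming silence. The Python sketch below shows what such a check involves. It is not EchoVideo's code, and the file paths and function names are hypothetical. A production system might instead store a hash of the submitted file rather than keeping the original for comparison.

```python
def same_audio(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> bool:
    """Compare two audio files byte-for-byte, reading them in chunks."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # contents or lengths differ
            if not chunk_a:
                return True   # both files ended at the same point with no difference


def needs_retranscription(submitted_audio: str, current_audio: str) -> bool:
    """Resubmit only if the current audio differs from the file originally transcribed."""
    return not same_audio(submitted_audio, current_audio)
```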

How Is the Automatic Transcription Service Paid For?

EchoVideo’s ASR offering is a paid service, and each customer / institution has an allocation of transcription hours included in their EchoVideo contract. Allocations are based on the annual contract period and are reset each year.

ASR usage is based on the number of hours of capture being transcribed. If your contractual allocation does not provide sufficient transcription coverage, you can pre-purchase additional hours by contacting your regional account team.

EchoVideo contract-allocated transcription hours do not roll over from year to year. However, any additional hours you purchase above the contract-granted ones that are not consumed will not expire or reset.
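The difference between contract hours and purchased hours is easiest to see with numbers. The Python sketch below models the rules described above: contract hours reset each year without rolling over, and purchased hours do not expire. The class and method names are hypothetical. This article does not say which pool is drawn down first, so the sketch assumes contract hours are used before purchased hours.

```python
from dataclasses import dataclass


@dataclass
class AsrAllocation:
    contract_hours: float   # granted by the annual contract; reset each year
    purchased_hours: float  # pre-purchased extra hours; do not expire or reset

    def consume(self, hours: float) -> None:
        """Deduct transcribed hours, raising if the allocation is exhausted."""
        if hours > self.contract_hours + self.purchased_hours:
            raise ValueError("ASR allocation exhausted")
        # Assumption: contract hours are consumed before purchased hours
        from_contract = min(hours, self.contract_hours)
        self.contract_hours -= from_contract
        self.purchased_hours -= hours - from_contract

    def renew(self, annual_contract_hours: float) -> None:
        """Start a new contract year: unused contract hours are lost, purchased hours carry over."""
        self.contract_hours = annual_contract_hours
```

Suppose an institution has 100 contract hours plus 20 purchased hours and transcribes 90 hours in the first year. Under this model, the remaining 10 contract hours lapse at renewal, while all 20 purchased hours remain available.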

If / When you reach your ASR Allocation limit, your EchoVideo account representative will notify you as soon as possible. At that point, you will be asked if you wish to purchase more hours to continue using the service. If you do not, the ASR service will be turned off for your institution.

Existing automated transcriptions will always remain with the media they were applied to; they are not removed, regardless of whether you continue to use the ASR service. The ASR Allocation simply determines whether you have transcription hours available in your account, and therefore whether or not more media can be sent for automatic transcription. Manual transcription and upload are always available.
