EchoVideo has partnered with Amazon Web Services to provide transcription services for audio and video media in EchoVideo. ASR (Automatic Speech Recognition) uses software to convert speech into text and synchronize that text with the video.

If your institution prefers to do manual transcriptions and not use the ASR service, you can still use the EchoVideo Transcript Editor to upload transcripts created outside of EchoVideo, then perform further edits and apply transcripts as closed captions if appropriate.

There are several institution-level ASR features that you can set, all available from the Institution Settings > Features page:

ASR Course Media - Sends all video / audio / interactive media for transcription when it is posted to a class in a section (if the media does not already have a transcription).
ASR All New Media - Sends all video / audio media for transcription when it is added to EchoVideo and has been successfully processed. This includes user file uploads, captures generated by EchoVideo capture appliances, and all universal capture uploads. It applies to all users, including students.
ASR Language Settings - Allows you to select the primary language used in your media recordings and instructs the transcription service to transcribe speech in that language.
Automatic Push to Closed Captions - Allows you to set a confidence score threshold for your automated transcripts and have EchoVideo automatically apply an ASR transcript to the closed captioning track if it meets or exceeds that threshold.
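The Automatic Push to Closed Captions rule amounts to a threshold comparison: push when the score meets or exceeds the configured level. The Python sketch below models only that documented rule. The `AsrTranscript` type, the function name, and the idea of a single transcript-level score on the same scale as the threshold are assumptions for illustration, not EchoVideo's implementation or API.

```python
from dataclasses import dataclass


@dataclass
class AsrTranscript:
    # Hypothetical: one confidence score for the whole transcript,
    # expressed on the same scale as the institution's threshold.
    confidence: float


def should_push_to_captions(transcript: AsrTranscript, push_enabled: bool, threshold: float) -> bool:
    """Model of the documented rule: push only if the score meets or exceeds the threshold."""
    if not push_enabled:
        return False
    return transcript.confidence >= threshold


# Usage with a hypothetical threshold of 90
print(should_push_to_captions(AsrTranscript(confidence=92.5), True, 90))  # True
print(should_push_to_captions(AsrTranscript(confidence=90.0), True, 90))  # True (meets the threshold)
print(should_push_to_captions(AsrTranscript(confidence=85.0), True, 90))  # False
```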

If you choose not to enable the ASR Course Media or ASR All New Media options for your institution, you can still request transcripts for individual pieces of media. This allows you to take advantage of the automated transcription service without transcribing media on a large scale.

These toggles and any references to ASR in this document or in the EchoVideo documentation refer to automatic, third-party-generated transcriptions of media.

Once the media has an ASR transcription, it is not resent for transcription as long as the media has not been edited (the audio file has not changed in any way). This means that if both toggles are turned on and a video was transcribed when it was uploaded, it will not be re-transcribed when it is published. It retains the original transcription. This also applies to individual media transcription requests; if the media has an automated transcript and it has not been edited, the media will not be sent for re-transcription.

Turning off any of these toggles has no effect on existing transcriptions. Transcriptions remain with the media throughout their lifecycle, regardless of how they were generated.

Manual transcription and upload for video / audio media are always available, as is the Transcript Editor. You may also want to refer to the student-access features for transcripts: Allow Students to Edit Transcripts for Class-Published Media and / or Allow Students to Edit Transcripts for Library Media. Some institutions assign media transcription and editing as a work-study task, allowing students to take on this work.

The following sections provide some of the more technical details about the transcription service.

What Is the Difference Between Transcriptions and Closed Captions?

For end users (e.g., instructors or students), see Transcriptions vs. Closed Captions for the difference in how each is presented in the media players.

Transcription differs from captioning in several ways. Transcription is the process of converting speech to text and does not include the sound effects and other non-speech elements that are often included in closed captions. Furthermore, an automatic speech recognition transcription service is unlikely to meet the accuracy levels required of closed captions for hearing-impaired individuals. This is particularly true for low-volume captures, recordings where background noise interferes with the audio, and possibly speakers whose accents are strong enough to cause word-recognition problems for the transcription service.

That said, transcriptions can be applied much faster than closed captions and typically cost less to generate. Transcriptions may provide a suitable way to deliver lecture content in both visual and audio form. In particular, transcriptions can serve as an interim text alternative while closed captions are being generated and applied (which can take a day or more, depending on the required accuracy level).

Because both closed caption files and transcription files use the WebVTT standard, you can generate automatic transcriptions, edit them for accuracy, and then apply them to the media as closed captions. You can also have EchoVideo automatically push transcripts to the closed captioning track when the transcription service assigns them a high enough confidence score.
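Since both file types use WebVTT, it can help to see what that format looks like when building or checking a transcript by hand. The Python sketch below writes timed text segments as a WebVTT document: a `WEBVTT` header, then one cue per segment with `HH:MM:SS.mmm --> HH:MM:SS.mmm` timings, each cue ending in a blank line. The segment data and function names are hypothetical, and this article does not state which optional WebVTT features the EchoVideo uploader accepts, so treat it as a minimal illustration of the standard only.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: List[Segment]) -> str:
    """Build a WebVTT document: the WEBVTT header, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


print(to_webvtt([(0.0, 2.5, "Welcome to today's lecture."),
                 (2.5, 6.0, "We'll start with a quick review.")]))
```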

When Do Transcriptions Get Applied?

When video / audio media is sent for transcription depends on which of the feature toggles noted above are enabled. It happens either when the media is added to the system (ASR All New Media) or when the media is posted to a class in a course (ASR Course Media).
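The mapping between toggles and trigger points can be sketched as follows. This Python model covers only the timing described above; the `MediaEvent` names and the function are hypothetical, and the separate check for an existing, unchanged automated transcript is not modelled here.

```python
from enum import Enum, auto


class MediaEvent(Enum):
    ADDED_AND_PROCESSED = auto()  # upload, appliance capture, or universal capture finished processing
    POSTED_TO_CLASS = auto()      # media posted (published) to a class in a course section


def triggers_transcription(event: MediaEvent, asr_all_new_media: bool, asr_course_media: bool) -> bool:
    """Return True if the event triggers a transcription request under the enabled toggles."""
    if event is MediaEvent.ADDED_AND_PROCESSED:
        return asr_all_new_media
    if event is MediaEvent.POSTED_TO_CLASS:
        return asr_course_media
    return False


# A library upload with only ASR Course Media on: no request at upload, one request when posted.
print(triggers_transcription(MediaEvent.ADDED_AND_PROCESSED, False, True))  # False
print(triggers_transcription(MediaEvent.POSTED_TO_CLASS, False, True))      # True
```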

Note that transcripts take at least 30 minutes to be returned, even when an admin requests them for an individual piece of media. They take longer when the service is handling many requests at once, and longer recordings take longer to transcribe.

ASR Course Media

Turning on the ASR Course Media toggle instructs EchoVideo to send media for transcription when it is posted to a class within a course. For capture schedules that auto-publish to a section, or ad hoc recordings that are published directly to the course, the creation of the media and the publishing occur together. Media uploaded to a user's library, and captures from schedules or ad hoc recordings that do not auto-publish to a section, are not sent for transcription until they are posted to an EchoVideo course.

For example, if an instructor generates an ad hoc capture but selects Library as the Publish-to location, that video will not be transcribed until it is published to a class in a course. If an instructor uploads a video directly to the Class List in a section, that video is sent for automatic transcription and will display the results when they are finished.

This feature can be turned on for the whole institution, or it can be set so that Organizations, Departments, and individual course sections can turn it on or off as needed.

Also keep in mind that making a capture available is not the same as publishing it. You can publish a video while keeping it unavailable to students. Because the video is still published, it will trigger an automated transcription at that time (if ASR Course Media is turned on).

If a video is edited while it is currently published, it will be sent for transcription again after the edits are complete and the user clicks Save. This means that the video may be transcribed twice (once on initial posting to the class, once after editing it while published).

Taken together, these two points mean that if your (or your instructors') standard procedure is to generate a capture, auto-publish it to a section, and keep it unavailable to students while the instructor edits the video, that video will be transcribed twice: once when it is initially published, and again after the edits are complete and saved.

ASR All New Media

Turning on the ASR All New Media toggle instructs EchoVideo to send media for transcription as soon as it is added and completes media processing. This applies to all video / audio media added to EchoVideo in any way by any user, including students. The MP3 file created during media processing is sent for transcription as soon as it is available.

This also means that, as an Admin, you can reprocess captures, which regenerates the MP3 file. If this feature toggle is turned on, the regenerated MP3 files are sent for transcription. This is one way to obtain transcriptions for older items or previously unpublished media that may require them.

Transcription of all media also applies to edited media. For example, if an instructor edits a video that already has transcriptions, then clicks Save, the edited video is processed, and the newly generated MP3 file is sent for transcription. This is because the audio file changed with the edits and needs to be re-transcribed. This applies even if the edits include removing a silent section of the video; the audio file itself has changed, so EchoVideo sends the new file for transcription. In the Transcript Editor, the video also has at least two versions of the transcript: the original and Version 1, which corresponds to the edited version of the video.

If a manual transcription is applied to a capture / video before it is published, and the video is then published, the video will still be sent for automatic transcription. In this case, the uploaded transcription is considered the original, and the automated one is an update. Reverting to the original restores the transcription that was originally uploaded. Although we do not expect this to be a common use case, we note it here for your reference.

How Long Does It Take for a Transcription to Appear?

It takes at least 30 minutes for a video to receive automatic transcriptions, longer for videos that are more than an hour in length and / or if the transcription service is processing a large number of requests at the time.

Currently, transcriptions are not visible in the media details playback. If you need to know whether or not an item has transcripts, refer to the Transcript entry in the Details tab of the Media Details page. The Transcript entry will read Add if there is no transcript, and will read Update if there is one.

Alternatively, as an Admin, select Edit Transcript from the chevron menu for any completed capture entry; the Transcript Editor opens and displays a message if the item does not have a transcript file.

In What Instances Are Videos Not Automatically Transcribed?

Transcriptions are not back-applied to existing captures. Captures and videos already in the system when an ASR toggle is turned on must be published / re-published or reprocessed (depending on which ASR toggle is enabled), or sent individually to request transcripts.

If you remove and then re-publish a capture to a class to obtain transcriptions, you will remove student video view data from the section analytics for that class / video. A better option is to have instructors create a holding class solely for temporarily publishing videos (or use an expired section). Publish the older, non-transcribed videos to the class, then remove them. The act of publishing will trigger automatic transcription if the ASR Course Media toggle is on; the video does not need to be left in the class for more than a few seconds. All currently published versions of the video will include the transcriptions.
Alternatively, as an Administrator, you can select Request Transcript for any piece of media on the Captures page.

Recordings longer than 4 hours are not eligible for ASR transcription; this is an Amazon restriction. You can manually transcribe these videos and upload the VTT file to apply it to the media. Alternatively, you can edit longer videos into shorter segments, which will then be transcribed if the relevant ASR toggle is enabled (either upon saving after editing or when posted to a course).

Captures / videos that have already been auto-transcribed are not sent for automatic transcription again as long as the file has not been modified. The ASR service detects that the video has an automated transcription and compares the audio file byte-for-byte. As long as the audio is the same as the file originally submitted (that is, it has not been edited), the media is not resubmitted for transcription.
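The two eligibility rules above (the 4-hour limit and the byte-for-byte comparison against previously transcribed audio) can be combined into one check. This is a minimal Python sketch under stated assumptions: the `Media` fields, including keeping a copy of the audio last submitted for ASR, are hypothetical simplifications for illustration, not how EchoVideo stores media.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ASR_HOURS = 4.0  # documented Amazon limit: recordings longer than 4 hours are not eligible


@dataclass
class Media:
    duration_hours: float
    audio: bytes                          # current audio track (e.g. the processed MP3)
    asr_submitted_audio: Optional[bytes]  # hypothetical: audio last sent for ASR, None if never auto-transcribed


def eligible_for_asr(media: Media) -> bool:
    """Return True if the media may be sent for automatic transcription."""
    if media.duration_hours > MAX_ASR_HOURS:
        return False
    if media.asr_submitted_audio is None:
        return True  # no automated transcript yet (a manual upload does not count)
    # Byte-for-byte comparison: any change to the audio makes the media eligible again
    return media.audio != media.asr_submitted_audio


print(eligible_for_asr(Media(1.0, b"lecture", b"lecture")))         # False: unchanged audio
print(eligible_for_asr(Media(1.0, b"lecture-edited", b"lecture")))  # True: audio was edited
print(eligible_for_asr(Media(4.5, b"long", None)))                  # False: over 4 hours
```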

How Is the Automatic Transcription Service Paid For?

EchoVideo’s ASR offering is a paid service, and each institution has an allocation of transcription hours included in its EchoVideo contract. Allocations are based on the annual contract period and are reset each year.

ASR usage is based on the number of hours of capture being transcribed. If your contractual allocation does not provide sufficient transcription coverage, you can pre-purchase additional hours by contacting your regional account team.

EchoVideo contract-allocated transcription hours do not roll over from year to year. However, any unused additional hours you purchase beyond your contract allocation do not expire or reset.
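The difference between the two kinds of hours shows up at the contract-year boundary. The Python sketch below models two balances: contract hours, which are reset to the allocation each year, and purchased hours, which carry forward. The class and method names are hypothetical, and the order in which hours are drawn down is an assumption: the sketch uses contract hours first, which this article does not state.

```python
from dataclasses import dataclass


@dataclass
class AsrBalance:
    contract_hours: float   # reset to the contract allocation each contract year
    purchased_hours: float  # additional pre-purchased hours; never expire

    def consume(self, hours: float) -> bool:
        """Deduct transcription hours; return False (no deduction) if the balance cannot cover them.

        ASSUMPTION: contract hours are used before purchased hours.
        """
        if hours > self.contract_hours + self.purchased_hours:
            return False
        from_contract = min(hours, self.contract_hours)
        self.contract_hours -= from_contract
        self.purchased_hours -= hours - from_contract
        return True

    def start_new_contract_year(self, allocation: float) -> None:
        # Unused contract hours do not roll over; purchased hours carry forward unchanged.
        self.contract_hours = allocation


balance = AsrBalance(contract_hours=100.0, purchased_hours=20.0)
balance.consume(110.0)               # 100 contract hours + 10 purchased hours
balance.start_new_contract_year(100.0)
print(balance)  # AsrBalance(contract_hours=100.0, purchased_hours=10.0)
```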

When you reach your ASR allocation limit, your EchoVideo account representative will notify you as soon as possible and ask whether you wish to purchase more hours to continue using the service. If you do not, the ASR service will be turned off for your institution.

Existing automated transcriptions will always remain with the media they have been applied to; they are not removed regardless of whether you continue to use the ASR service or not. The ASR Allocation simply determines whether you have transcription hours available in your account, and therefore whether or not more media can be sent for automatic transcription. Manual transcription and upload is always available.
