To improve the accuracy of EchoVideo’s automated transcripts, institutions can provide a custom vocabulary containing organization-specific terms, acronyms, product names, proper nouns, and technical terminology.
Custom vocabularies are used with AWS Transcribe to help improve recognition of words and phrases that may not be commonly recognized in standard transcription models.
For additional details, see the attached sample file or Creating a custom vocabulary using a table in the AWS Transcribe documentation.
File Requirements
Your custom vocabulary file must meet the following requirements:
- Saved as either:
  - A tab-separated text file (.txt), or
  - A comma-separated values file (.csv)
- UTF-8 encoded to ensure proper character representation and compatibility across systems and platforms
- Maximum file size of 50 KB
- Properly formatted according to the AWS Transcribe vocabulary requirements
- Includes one term or phrase per line or row, depending on the file format used
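The extension, size, and encoding requirements above can be checked before submitting a file. The following is a minimal sketch, not an official EchoVideo or AWS tool: the function name `check_vocabulary_file` is hypothetical, and it assumes that "50 KB" means 50 × 1024 bytes.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024  # assumption: "50 KB" is read as 50 * 1024 bytes
ALLOWED_SUFFIXES = {".txt", ".csv"}


def check_vocabulary_file(path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    path = Path(path)
    problems = []

    # The file must be a .txt (tab-separated) or .csv (comma-separated) file.
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {path.suffix!r}")

    data = path.read_bytes()

    # The file must not exceed the maximum size.
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_BYTES}")

    # The file must decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")

    return problems
```

This sketch does not check the column layout or the per-phrase formatting rules; it only covers the file-level requirements.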
Supported Files
EchoVideo recommends using one of the following file types when creating custom dictionary files.
- A tab-separated text file (.txt)
- A comma-separated values file (.csv)
| Header | Required | Description |
|---|---|---|
| Phrase | Yes | The word or phrase to detect |
| SoundsLike | No | Optional phonetic spelling |
| IPA | No | Optional IPA pronunciation |
| DisplayAs | Yes | How the term should appear in transcripts |
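If the term list is maintained elsewhere, the four-column layout in the table above can be generated with Python's standard `csv` module. This is a sketch under stated assumptions: `write_vocabulary` is a hypothetical helper, and it leaves the optional SoundsLike and IPA columns empty.

```python
import csv

HEADERS = ["Phrase", "SoundsLike", "IPA", "DisplayAs"]


def write_vocabulary(path, entries, delimiter="\t"):
    """Write (phrase, display_as) pairs as a UTF-8 table.

    Use delimiter="\t" for a .txt file or delimiter="," for a .csv file.
    The optional SoundsLike and IPA columns are left empty.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(HEADERS)
        for phrase, display_as in entries:
            writer.writerow([phrase, "", "", display_as])
```

For example, `write_vocabulary("vocab.txt", [("Echo Video", "EchoVideo")])` writes a header row followed by one tab-separated entry.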
Example .txt Format
Phrase[Tab]SoundsLike[Tab]IPA[Tab]DisplayAs
Echo-Video[Tab][Tab][Tab]EchoVideo
Echo Video[Tab][Tab][Tab]EchoVideo
echo video[Tab][Tab][Tab]EchoVideo
eco video[Tab][Tab][Tab]EchoVideo
Example .csv Format
Phrase,SoundsLike,IPA,DisplayAs
Echo-Video,,,EchoVideo
Echo Video,,,EchoVideo
echo video,,,EchoVideo
eco video,,,EchoVideo
Format Guidelines
Follow these guidelines when creating your custom dictionary file.
Add One Entry Per Line
Each term or phrase should appear on its own line to ensure clarity and proper organization.
Include Multiple Variations
AWS Transcribe may interpret speech differently depending on pronunciation, accent, or spacing. Including multiple variations can improve recognition accuracy.
Use Phonetic Spellings When Needed
Some words or proper nouns may benefit from phonetic spellings. Examples include:
- MOOC spelled as mook
- EchoVideo spelled as echo video
- Institution names
- Technical terminology
Acronyms
If an acronym should be pronounced as individual letters, separate the letters with spaces. For example, instead of writing UNMC as one word, write it as U N M C to guide proper pronunciation.
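The letter-spacing rule can be applied mechanically. The sketch below uses a hypothetical helper, `acronym_entry`, and assumes that the spaced form belongs in the Phrase column while the compact acronym belongs in DisplayAs; the guideline above does not state which column to use.

```python
def acronym_entry(acronym):
    """Return (phrase, display_as) for an acronym that is read letter by letter.

    Assumption: the spaced letters go in Phrase and the original
    acronym goes in DisplayAs.
    """
    letters = [ch for ch in acronym if ch.isalpha()]
    return " ".join(letters), acronym


phrase, display_as = acronym_entry("UNMC")
print(phrase, "->", display_as)
```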
Case Sensitivity
Terms are sensitive to capitalization, so it is essential to use the exact case you want to appear in the final transcript. For instance, if the correct form is Echo360, do not write it as echo360, as this will affect how the term is displayed and understood.
Phrase Formatting
The DisplayAs column supports symbols and special characters, such as Node.js or C#. Other columns support only characters available for the selected transcription language.
Entries in the Phrase column cannot begin with a period (.), an apostrophe ('), or a hyphen (-). Invalid examples include -biology and .lecture.
Entries cannot end with an apostrophe (') or a hyphen (-). Invalid examples include campus- or students'.
Avoid repeated punctuation in entries, including double hyphens (--), double periods (..), or double apostrophes (''). Invalid examples include co--author, wait..what, or professor''s.
Periods should be used only in acronyms where each letter is followed by a period, such as U.S.A. or P.H.D. Invalid examples include USA. and PH.Degree.
If a phrase includes numbers, spell them out instead of using digits in the Phrase column. For example, Math 101 should be written as Math one oh one, and Biology 2 should be written as Biology two.
Digits (0-9) can be used only in the DisplayAs column.
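The Phrase column rules above lend themselves to an automated check. The following sketch is one interpretation of those rules, not AWS's own validation logic: `phrase_problems` is a hypothetical function, and the regular expression treats a word with periods as valid only when every letter is followed by a period.

```python
import re

# One or more "letter + period" pairs, e.g. U.S.A. (an interpretation of the rule).
DOTTED_ACRONYM = re.compile(r"(?:[^\W\d_]\.)+")


def phrase_problems(phrase):
    """Return a list of rule violations for a Phrase column entry."""
    problems = []

    if phrase[:1] in {".", "'", "-"}:
        problems.append("starts with a period, apostrophe, or hyphen")
    if phrase[-1:] in {"'", "-"}:
        problems.append("ends with an apostrophe or hyphen")
    if any(pair in phrase for pair in ("--", "..", "''")):
        problems.append("contains repeated punctuation")
    if any(ch in "0123456789" for ch in phrase):
        problems.append("contains digits; spell numbers out")

    # Periods are accepted only inside dotted acronyms.
    for word in phrase.split():
        if "." in word and not DOTTED_ACRONYM.fullmatch(word):
            problems.append(f"period outside a dotted acronym: {word!r}")

    return problems


for candidate in ["Math one oh one", "U.S.A.", "Math 101", "campus-", "USA."]:
    print(candidate, "->", phrase_problems(candidate) or "ok")
```

These checks apply to the Phrase column only; DisplayAs values such as Node.js, C#, or Math 101 are allowed to contain symbols and digits.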
Phrase Length
Keep each phrase concise, with no more than 12 words. Long phrases may reduce transcription accuracy.
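A quick way to spot entries over the 12-word guideline is to count whitespace-separated words. This is a minimal sketch; `overlong_phrases` is a hypothetical helper, and splitting on whitespace is an assumption about how words are counted.

```python
MAX_WORDS = 12


def overlong_phrases(phrases, max_words=MAX_WORDS):
    """Return the phrases that contain more than max_words words."""
    return [phrase for phrase in phrases if len(phrase.split()) > max_words]
```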
Submit Your Custom Dictionary
Once your dictionary file is complete, submit the following to Echo360 Support:
- Your completed dictionary file
- The language the dictionary should apply to
- Your institution name or EchoVideo environment details