Closed Captioning Glossary
Section 508 refers to a US accessibility law, part of the Rehabilitation Act, which requires video and webcasts developed, procured, or used by federal agencies to be made accessible to the deaf and hard of hearing. Please see the official Section 508 web site for more details. It is not directly related to CEA-608 and CEA-708, which are technical standards.
CEA-608 refers to the technical standard for captioning standard-definition NTSC video. It is also commonly referred to as "Line 21 closed captioning".
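Line 21 data is sent as byte pairs, and each byte carries 7 data bits plus an odd-parity bit in the high position. The Python sketch below shows how a 7-bit pair gets its parity bits; the function names are hypothetical, and the character example assumes the 608 basic character set, which is largely ASCII. The hex output matches words commonly seen in .scc files, such as 9420.

```python
def with_odd_parity(value: int) -> int:
    """Set bit 7 if needed so the byte has an odd number of 1 bits."""
    if not 0 <= value <= 0x7F:
        raise ValueError("CEA-608 data bytes carry only 7 data bits")
    ones = bin(value).count("1")
    # Already odd: leave bit 7 clear. Even: set bit 7 to make it odd.
    return value if ones % 2 == 1 else value | 0x80


def encode_pair(first: int, second: int) -> str:
    """Return a 608 byte pair as a four-hex-digit word, as written in .scc files."""
    return f"{with_odd_parity(first):02x}{with_odd_parity(second):02x}"


# Resume Caption Loading on channel 1 is the 7-bit pair 0x14, 0x20.
print(encode_pair(0x14, 0x20))           # 9420
# The printable characters "H" and "I".
print(encode_pair(ord("H"), ord("I")))   # c849
```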
CEA-708 refers to the technical standard for captioning digital (ATSC) television, most commonly high-definition video. It is also commonly referred to as VANC data or "Line 9 closed captioning".
Active Format Description (AFD) is a standard for telling a receiving device how best to frame video. For example, without AFD, a 16:9 signal that contains a pillarboxed 4:3 picture would be letterboxed for display on a 4:3 TV, producing a small image surrounded by black bars on all sides. With AFD, the TV knows that it can crop the video so that the actual picture fills the whole screen.
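As a rough numeric check of this example, the Python sketch below computes how much of a 4:3 screen the actual picture covers with and without AFD cropping. It assumes simple fit-to-screen scaling and ignores overscan and pixel aspect ratio; `fit_fraction` is a hypothetical helper, not part of any AFD specification.

```python
from fractions import Fraction


def fit_fraction(outer: Fraction, inner: Fraction) -> Fraction:
    """Share of the outer area covered when content of aspect ratio `inner`
    is scaled to fit inside `outer` without cropping (letterbox or pillarbox)."""
    return min(outer / inner, inner / outer)


WIDE, STANDARD = Fraction(16, 9), Fraction(4, 3)

# Without AFD: the 16:9 frame is letterboxed on the 4:3 screen, and the 4:3
# picture inside that frame is already pillarboxed, so the two boxes multiply.
without_afd = fit_fraction(STANDARD, WIDE) * fit_fraction(WIDE, STANDARD)
# With AFD: the receiver crops to the 4:3 active picture and fills the screen.
with_afd = fit_fraction(STANDARD, STANDARD)

print(without_afd, float(without_afd))   # 9/16 0.5625
print(with_afd)                          # 1
```

In other words, without AFD the picture uses only about 56% of the screen area.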
ATSC (Advanced Television Systems Committee) is the digital television (DTV) standard used by broadcasters for HDTV and the digital broadcast of SD in the United States and Canada. ATSC supports closed captioning (608 and 708) in the metadata of the video signal.
Burn-in refers to a graphic, text, or image that is superimposed on video, and thus becomes part of the video itself. Closed captions are not burned in, since they can be turned on and off, unlike open captions and many subtitles which cannot.
A .cap file can refer to many different types of files, so be careful when using them. Several caption file formats use the .cap extension, including the popular Cheetah .cap format. A .cap file can also be a project file for CPC's CaptionMaker (PC), which must be opened in CaptionMaker before exporting to another caption format.
CaptionMaker is closed captioning software developed by CPC for the PC platform. CaptionMaker reads and writes all major captioning formats and supports many traditional workflows involving hardware encoders. In addition to broadcast SD video, CaptionMaker encodes captions for web formats like QuickTime, Flash, YouTube, and Windows Media, and also tapeless workflows like MPEG-2 Program Streams.
Closed captioning is text that appears over video and can be turned on and off using a decoder, which is built into most consumer television sets and cable boxes. Closed captions differ from subtitles in that they describe all relevant audio, such as sound effects and music, not just dialogue.
Codec stands for "coder-decoder"; a codec is a method of compressing video to strike a balance between file size and quality. Different codecs have different data rates, aspect ratios, and methods of carrying closed captions. Some examples of codecs are DV, MPEG-2, WMV, H.264, Uncompressed, and ProRes. To play a video, your computer needs the specific codec that the video uses; otherwise it will not play. Not all codecs are available for all operating systems, and some are not free to use.
A container format is a way to package video (usually together with audio, captions, and other metadata) so that it can be viewed in a video player, edited in a non-linear editor, or processed in some other way. Examples of container formats are QuickTime, AVI, and MXF.
A decoder is a device that allows closed captions to be displayed if they are present in the video signal, essentially turning closed captions into open captions. Typically a decoder is inside your TV or cable box, and you can turn captions on using your remote or a setting in the menus. There are also hardware and software decoders that let you preview captions on your computer or a master tape to ensure that they are present.
"Drift" is a term used to describe a specific type of behavior of closed captions. It can either mean they are appearing progressively later than they should, or progressively earlier. Most often, this occurs slowly over the duration of a program, resulting in a discrepancy of over three seconds by the end of an hour. Drift is most often caused by a drop-frame / non-drop discrepancy.
Drop-frame timecode refers to a method of counting timecode in 29.97 fps video. It does not mean that actual frames of video are dropped, so video quality is not affected. Since 29.97 fps is not exactly 30 fps, a drop-frame counter skips certain numbers so that the timecode reflects the real-time length of the program: frame numbers 00 and 01 are skipped at the start of every minute, except for every tenth minute (00, 10, 20, and so on).
The counterpart of drop-frame is non-drop, which does not skip numbers when counting timecode. To prevent drift, it is important to timestamp in the correct mode (or convert your captions' timecode using CPC software) when closed captioning a 29.97 fps program.
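Converting a caption's timestamp from one mode to the other means going through the underlying frame count. The Python sketch below assumes 29.97 fps, 0-based frame numbering, and the standard skip rule (labels 00 and 01 skipped at the start of each minute except every tenth). It is not CPC's conversion code, and the function names are hypothetical.

```python
import re

DF_LABEL = re.compile(r"(\d{2}):(\d{2}):(\d{2})[;:](\d{2})")


def df_to_frames(label: str) -> int:
    """Drop-frame label HH:MM:SS;FF -> 0-based frame count at 29.97 fps."""
    match = DF_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a timecode: {label!r}")
    hh, mm, ss, ff = map(int, match.groups())
    if mm > 59 or ss > 59 or ff > 29:
        raise ValueError(f"field out of range: {label}")
    if ss == 0 and ff < 2 and mm % 10 != 0:
        raise ValueError(f"{label} is a label that drop-frame skips")
    total_minutes = 60 * hh + mm
    # Two labels are skipped in every minute that is not a multiple of ten.
    skipped = 2 * (total_minutes - total_minutes // 10)
    return (3600 * hh + 60 * mm + ss) * 30 + ff - skipped


def frames_to_ndf(frames: int) -> str:
    """Frame count -> non-drop label HH:MM:SS:FF (30 labels per second)."""
    hh, rest = divmod(frames, 30 * 60 * 60)
    mm, rest = divmod(rest, 30 * 60)
    ss, ff = divmod(rest, 30)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def df_to_ndf(label: str) -> str:
    """Re-stamp a caption time from drop-frame to non-drop for the same frame."""
    return frames_to_ndf(df_to_frames(label))


print(df_to_ndf("00:01:06;22"))   # 00:01:06:20
print(df_to_ndf("01:00:00;00"))   # 00:59:56:12
```

The second line shows the roughly 3.6 second gap that builds up over an hour when the same frame is labeled in the two modes.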
DTV stands for "digital television" and is a general term for digital television around the world, used to distinguish it from analog television (such as NTSC). In the US, the DTV standard is ATSC.
An Elementary Stream is a data stream that contains either video or audio data, but not both. Elementary streams are usually associated with MPEG video files and given the extension .m2v for video, or .m2a or .ac3 for audio. Elementary MPEG-2 video streams can contain closed caption data. A Program or Transport Stream can be demultiplexed, or separated, into its component Elementary Streams.
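In ATSC-style MPEG-2 video, caption data usually travels in the picture user data. As a rough sketch, assuming the ATSC A/53 layout (a user_data start code, the identifier "GA94", type code 0x03, then a cc_data structure of three-byte caption packets), the Python below scans a raw .m2v buffer for caption triples. It does not reorder pictures from decode order to display order and ignores other caption user-data formats, so treat it as illustrative only; the function name is hypothetical.

```python
USER_DATA_START = b"\x00\x00\x01\xb2"   # MPEG-2 user_data_start_code
ATSC_CC_PREFIX = b"GA94\x03"            # ATSC identifier + cc_data type code


def extract_cc_triples(es: bytes):
    """Yield (cc_valid, cc_type, byte1, byte2) for each caption triple.

    cc_type 0 and 1 carry 608 field 1 / field 2 byte pairs (parity included);
    2 and 3 carry 708 (DTVCC) packet data.
    """
    pos = es.find(USER_DATA_START)
    while pos != -1:
        body = pos + len(USER_DATA_START)
        if es.startswith(ATSC_CC_PREFIX, body):
            flags_at = body + len(ATSC_CC_PREFIX)
            if flags_at < len(es) and es[flags_at] & 0x40:   # process_cc_data_flag
                cc_count = es[flags_at] & 0x1F
                first = flags_at + 2                         # skip the em_data byte
                for i in range(cc_count):
                    chunk = es[first + 3 * i:first + 3 * i + 3]
                    if len(chunk) < 3:
                        break
                    marker = chunk[0]
                    yield (marker >> 2) & 1, marker & 0x03, chunk[1], chunk[2]
        pos = es.find(USER_DATA_START, pos + 1)


# Synthetic buffer: a sequence header start code, then one user-data block
# carrying two field-1 pairs (9420, c849).
sample = (b"\x00\x00\x01\xb3xxxx" + USER_DATA_START + ATSC_CC_PREFIX
          + bytes([0x42, 0xFF, 0xFC, 0x94, 0x20, 0xFC, 0xC8, 0x49]))
print(list(extract_cc_triples(sample)))
# [(1, 0, 148, 32), (1, 0, 200, 73)]
```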
The term "encoder" typically refers to a hardware encoder, but can refer to software encoders as well. A hardware encoder is usually a rack-mounted device that accepts a video signal, marries it to closed captions, and then outputs a new closed captioned video signal, usually resulting in generation loss.
A software encoder, such as MacCaption, can add captions to video without a hardware encoder. You can simply encode captions to video files already present on your computer, or to file formats that will add captions as you output from your NLE with no generation loss.
High Definition is a television standard with either 720 or 1080 lines in the video signal. Closed captioning for HD is sometimes called Line 9 or VANC, and is codified under the 708 standard.
Ruby subtitle formatting is a popular style of formatting that lets viewers see an annotation of a word on screen, such as a pronunciation guide. Typically, the annotation is placed above the text in a smaller text size. This formatting is compatible with subtitle burn-in exports such as "Combined Subtitles" and with TTML-based OTT files such as iTunes .iTT.
Line 9 refers to the location of the VANC closed captioning data in an HD video signal. In the full raster, it is the 9th line from the top of the frame.
Line 21 refers to the location of the VBI closed captioning data in an NTSC 720x486 signal. It actually appears on lines 21 and 22 since line 22 is the second field of the closed captioning data.
Live captioning is the captioning process used for live webcasts or broadcasts to add captions to video on the fly. It requires several important tools. The first is a source of transcription, such as a stenographer or speech recognition software.
Note that getting speech recognition software to usable levels of accuracy still requires a person to operate it. The second item required for live captioning is a hardware encoder, which accepts the video signal and the closed caption data and combines them for output. Finally, you may need captioning software to tie these two things together (especially if you're using speech recognition software).
A live captioning text stream is an output of live captions as text to a URL specified in the Live Text Stream Options dialog. This is a way to stream live captions to an HTML web page separate from the live video stream. A Live Text Stream can be used on a web page in a closed local area network or over the public internet. Viewers can view the live text stream in any web browser, on a personal computer or a mobile phone.
MacCaption is closed captioning software developed by CPC for the Mac platform. MacCaption reads and writes all major captioning formats and supports the latest closed captioning workflows for Final Cut Pro. In addition to broadcast HD and SD video, MacCaption encodes captions for web formats like QuickTime, Flash, YouTube, and Windows Media, and also tapeless workflows like MPEG-2 Transport Streams, DVCPRO HD and XDCAM.
A .MCC is a MaCCaption closed captioning file, and the only file format that supports both 608 and 708 (SD and HD) closed captioning, unlike .SCC, which can only encode 608 (SD) closed captions. This comprehensive format is being used by several companies for integration into their closed captioning workflows.
MPEG-2 can refer not only to a video codec but also to a container format. MPEG-2 comes in three different file types: Elementary Streams, Program Streams, and Transport Streams. MPEG-2 files are becoming a more common form of video delivery because they allow a broadcaster to place them directly on a server instead of ingesting from tape. MacCaption can add captions to all three forms of MPEG-2 files.
Non-Drop timecode refers to a method of counting timecode in 29.97 fps video. It does not refer to actual frames of video being dropped that would affect video quality. Since 29.97 fps is not exactly 30 fps, when counting in non-drop, the timecode will get progressively further and further behind "real time."
For instance, after 2000 frames a drop-frame counter will display 00:01:06:22, while a non-drop counter will display 00:01:06:20, but the content and real-time length of the video will be the same. The drop-frame counter is slightly ahead because it goes straight from 00:00:59:29 to 00:01:00:02. To prevent drift, it is important to timestamp in the correct timecode mode (or convert your captions' timecode using CPC software) when closed captioning a 29.97 fps program.
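The 2000-frame example can be reproduced with a short Python sketch that formats a frame count in both modes. It assumes 0-based frame numbering at 29.97 fps and writes drop-frame labels with a ";" before the frame field, a common convention for telling the two modes apart. The function names are hypothetical.

```python
def frames_to_df(frames: int) -> str:
    """0-based frame count at 29.97 fps -> drop-frame label HH:MM:SS;FF."""
    per_10_min = 17982   # 10 * 60 * 30 frames minus 18 skipped labels
    per_min = 1798       # 60 * 30 frames minus 2 skipped labels
    tens, rem = divmod(frames, per_10_min)
    # Add back every label skipped so far, then count as if at 30 fps.
    skipped = 18 * tens
    if rem >= 2:
        skipped += 2 * ((rem - 2) // per_min)
    n = frames + skipped
    return f"{n // 108000:02d}:{n // 1800 % 60:02d}:{n // 30 % 60:02d};{n % 30:02d}"


def frames_to_ndf(frames: int) -> str:
    """0-based frame count -> non-drop label HH:MM:SS:FF (30 labels per second)."""
    f = frames
    return f"{f // 108000:02d}:{f // 1800 % 60:02d}:{f // 30 % 60:02d}:{f % 30:02d}"


for n in (1799, 1800, 2000):
    print(n, frames_to_df(n), frames_to_ndf(n))
# 1799 00:00:59;29 00:00:59:29
# 1800 00:01:00;02 00:01:00:00
# 2000 00:01:06;22 00:01:06:20
```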
A non-linear editor (NLE) is a piece of software that allows you to edit video by moving pieces of it around in a timeline with multiple layers of video. This is in contrast to linear editing, which forces you to lay down one piece of video after another on tape. Many NLEs support closed captioning for HD, SD, or both. Examples of non-linear editors are Avid, Final Cut Pro, Premiere Pro, and Sony Vegas.
NTSC (National Television System Committee) is the analog television standard for North America, Japan, and some other parts of the world. NTSC supports closed captioning (608 only) on Line 21 of the video signal.
OP-47 (Operational Practice number 47) is a solution for inserting a Teletext ancillary data channel into HD media. OP-47 was originally developed by Australian broadcasters and is now widely used. The data can be stored in the SMPTE 436M track of an MXF wrapper. OP-47 Teletext stores data in selectable pages within magazines (1-8), which viewers can select using menu controls on their TV monitor. Typically, the subtitle data is stored in magazine 8 page 01, or Teletext 801. Other samples may use Teletext 888, which is magazine 8 page 88. This follows the SMPTE RDD 8 specification.
Teletext supports the Latin, Hebrew, Arabic, Greek, and Cyrillic alphabets. However, it does not support Asian Unicode characters such as Chinese and Japanese. CaptionMaker can only insert Latin alphabet characters into OP-47.
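To make the page numbering concrete, here is a small Python sketch that splits a page number like 801 or 888 into magazine and page. It assumes the usual Teletext conventions that the two page digits are hex nibbles (everyday pages 00-99 just look decimal) and that magazine 8 is transmitted as 0 in the 3-bit magazine field. It does not read OP-47 data from an MXF file, and the function names are hypothetical.

```python
def parse_teletext_page(number: str) -> tuple[int, int]:
    """Split a Teletext page number such as "801" into (magazine, page)."""
    if len(number) != 3:
        raise ValueError("expected a three-digit page number like 801")
    magazine = int(number[0])
    if not 1 <= magazine <= 8:
        raise ValueError("magazine must be 1-8")
    page = int(number[1:], 16)   # two hex page digits: "01" -> 0x01, "88" -> 0x88
    return magazine, page


def magazine_bits(magazine: int) -> int:
    """3-bit magazine field as transmitted: magazine 8 is sent as 0."""
    return magazine % 8


for number in ("801", "888"):
    magazine, page = parse_teletext_page(number)
    print(number, magazine, f"{page:02X}", magazine_bits(magazine))
# 801 8 01 0
# 888 8 88 0
```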
Open captions are captions that do not need to be turned on; they are always visible. This is in contrast to closed captions, which must be turned on with a decoder. Open captions are part of the image itself and are also known as burned-in captions.
Paint-on captions appear on the screen from left to right, one character at a time. This mode of displaying captions is uncommon except as the first caption of many commercial spots to reduce lag.
Pop-on captions appear on the screen one caption at a time, usually two or three lines at once. This mode of displaying captions is typically used for pre-recorded television.
A Program Stream is a data stream that multiplexes, or combines, a single video stream and a single audio stream. It is usually given the extension .mpg and is used for files played on a PC, some DVD authoring systems, and some tapeless distribution.
Roll-up captions appear from the bottom of the screen one line at a time, usually with only three lines visible at a time. This mode of displaying captions is typically used by live television like news broadcasts.
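Roll-up behavior can be modeled as a fixed-depth queue. The Python sketch below is a toy model with a hypothetical `RollUpWindow` class; real CEA-608 roll-up (two, three, or four rows) also paints the bottom row character by character and scrolls on carriage-return commands, which this ignores.

```python
from collections import deque


class RollUpWindow:
    """Toy roll-up display: a fixed number of rows at the bottom of the screen.
    Each new line enters at the bottom; once the window is full, the oldest
    row scrolls off the top."""

    def __init__(self, rows: int = 3):
        self._rows = deque(maxlen=rows)

    def add_line(self, line: str) -> list[str]:
        self._rows.append(line)   # deque(maxlen=...) discards the oldest row
        return list(self._rows)   # rows in top-to-bottom display order


window = RollUpWindow(rows=3)
for line in ["GOOD EVENING.", "OUR TOP STORY", "IS THE WEATHER.", "A STORM IS COMING."]:
    shown = window.add_line(line)
print(shown)
# ['OUR TOP STORY', 'IS THE WEATHER.', 'A STORM IS COMING.']
```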
SCC stands for "Scenarist Closed Caption", a file type developed by Sonic. SCC files have become a popular standard for many different applications of closed captions. Some programs that use .scc files are Sonic Scenarist, DVD Studio Pro, Final Cut Pro, and Compressor.
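An .scc file is plain text: typically a `Scenarist_SCC V1.0` header, then lines pairing a timecode (with ";" before the frames for drop-frame) with space-separated four-hex-digit words, each one a CEA-608 byte pair including its odd-parity bits. Control codes are often sent twice for redundancy. As a sketch, not a complete SCC reader, the Python below parses one caption line; the helper names are hypothetical, and treating the 608 basic character set as ASCII is only an approximation.

```python
import re

SCC_LINE = re.compile(r"(\d{2}:\d{2}:\d{2}([:;])\d{2})\s+(.+)")


def has_odd_parity(byte: int) -> bool:
    return bin(byte).count("1") % 2 == 1


def parse_scc_line(line: str):
    """Parse one caption line of an .scc file into
    (timecode, is_drop_frame, seven_bit_pairs, printable_text)."""
    match = SCC_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an SCC caption line: {line!r}")
    timecode, separator, words = match.groups()
    pairs, text = [], []
    for word in words.split():
        raw = bytes.fromhex(word)
        if len(raw) != 2 or not all(has_odd_parity(b) for b in raw):
            raise ValueError(f"malformed or parity-failing word: {word!r}")
        first, second = raw[0] & 0x7F, raw[1] & 0x7F   # strip parity bits
        pairs.append((first, second))
        # First bytes 0x10-0x1F start control codes (e.g. 9420 = Resume Caption
        # Loading); 0x20 and up are characters, approximated here as ASCII.
        if first >= 0x20:
            text.extend(chr(b) for b in (first, second) if b >= 0x20)
    return timecode, separator == ";", pairs, "".join(text)


print(parse_scc_line("00:00:01;15\t9420 9420 c849 942f 942f"))
# ('00:00:01;15', True, [(20, 32), (20, 32), (72, 73), (20, 47), (20, 47)], 'HI')
```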
When using speech recognition software, a shadow speaker is a person who repeats everything said in a program into a microphone, so that the speech recognition software only has to interpret the shadow speaker's voice rather than the multiple voices in the program. After the software is trained (about 15 minutes), it can achieve accuracy rates of up to 90-95% in a clean audio environment.
Speech recognition software takes spoken word and translates it into text. State of the art speech recognition technology can only achieve 60-80% accuracy without the use of a shadow speaker. Software that uses a shadow speaker can achieve up to 90-95% accuracy, but is limited to recognizing one person's voice at a time and needs to be used in a clean audio environment.
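The glossary does not say how these accuracy percentages are measured. One common metric for speech recognition is word error rate (WER), with accuracy often reported as 1 - WER. The Python sketch below computes WER with a word-level edit distance; treat it as one plausible reading of "accuracy", not the method behind the figures above. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # reference word deleted
                curr[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),   # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


ref = "the storm will reach the coast tonight"
hyp = "the storm will reach a coast tonight"
wer = word_error_rate(ref, hyp)
print(f"WER {wer:.1%}, accuracy {1 - wer:.1%}")   # WER 14.3%, accuracy 85.7%
```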
Standard Definition is a television standard with (typically) 480 lines in the video signal (486 for NTSC). Closed captioning for SD is sometimes called Line 21 and is codified under the 608 standard.
A stenographer is a person who can transcribe audio in real time (like a court reporter). Stenographers can dial in to a hardware encoder remotely over a phone line so that closed captions can be added to a video signal for live broadcast. See also: Live Captioning.
Subtitling is text that appears on screen and normally gives information only about spoken dialogue. Subtitles are often burned into the image and cannot be turned off, although DVD and Blu-ray carry subtitles that viewers can turn on and off.
Transport Stream is a data stream that multiplexes, or combines, multiple video and audio streams together with other metadata. Usually given the extension .ts, .m2t, or .m2ts, and used for DTV broadcast, VOD, tapeless delivery, and other systems where multiple channels are mixed together.
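A transport stream carries everything in fixed-size 188-byte packets, each starting with the sync byte 0x47 and tagged with a 13-bit packet identifier (PID) that says which stream it belongs to. As a minimal sketch, assuming a plain .ts file (some .m2ts files use 192-byte packets with a 4-byte prefix, which this does not handle), the Python below counts packets per PID as a first look at what has been multiplexed together. The helper names are hypothetical.

```python
from collections import Counter

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def count_pids(ts: bytes) -> Counter:
    """Count packets per PID in a raw 188-byte-packet transport stream."""
    counts = Counter()
    for offset in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte {offset}")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]   # 13-bit packet identifier
        counts[pid] += 1
    return counts


def make_packet(pid: int) -> bytes:
    """Build a synthetic packet: header with the given PID, zero payload."""
    return bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10]) + bytes(184)


sample = make_packet(0x0000) + make_packet(0x0100) * 2 + make_packet(0x1FFF)
print(sorted(count_pids(sample).items()))   # [(0, 1), (256, 2), (8191, 1)]
```

In a real stream, PID 0 carries the Program Association Table and PID 0x1FFF is reserved for null (padding) packets.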
VANC stands for "Vertical ANCillary data space" and refers to the data stored on Line 9 in HD video (outside the display area) that holds the 708 closed captioning data while it travels over an HD-SDI signal or is recorded on an HD tape format. The caption data occupies only part of Line 9, toward the left, and VANC can also carry other information, such as V-chip data.
VBI stands for "Vertical Blanking Interval" and is the time between the last line or field drawn in a video frame and the first line or field of the next frame. It is usually measured in lines; in NTSC there are about 40 lines of VBI. Closed captioning data for NTSC video is stored on Line 21 of the VBI.
See Speech Recognition.