Closed Captioning FAQs
The glossary is updated frequently and is too long to include in this FAQ, so please refer to the glossary page.
Captioning is text that appears on a video, which contains dialogue and audio cues such as music or sound effects that occur off-screen. The purpose of captioning is to make video content accessible to those who are deaf or hard of hearing, and for other situations in which the audio cannot be heard due to noise or a need for silence.
Captions can be either open (always visible, aka "burned in") or closed, but closed is more common because it lets each viewer decide whether the captions are turned on or off. Closed captions are transmitted as a special coded signal sent along with the video picture, and require a decoder in your television or cable box to be seen (most TVs and cable boxes have one). This special signal is what broadcasters check for to ensure that your video is in accordance with the law. Captions typically appear as white, monospaced text on an opaque black background.
Subtitling is text that appears on a video and typically contains only a transcription (or translation) of the dialogue. Subtitle tracks which also contain non-verbal audio cues are called "SDH" (Subtitles for the Deaf and Hard of hearing) tracks.
Often, subtitles are burned into the video so they can't be turned off, although DVD and Blu-ray (and a few other formats) can contain multiple tracks of subtitles which can be selected and turned on or off by the player. Subtitles vary in appearance and have more fonts and colors available than captions. Subtitles are required by law in many European and Asian broadcast markets.
The FCC requires the majority of programming seen on broadcast TV in the United States to be closed captioned. The CRTC requires the same for Canadian broadcasts. Many other countries have their own requirements as well. Section 508 regulations contain stringent captioning requirements (including captioning for webcasts) for the Federal government and organizations that receive funding from the Federal government, which includes most academic institutions. The ADA requires that videos be accessible to the deaf and hard of hearing in public accommodations and other public venues, such as hotels and stadiums. Captioning is an ideal way to make video accessible.
Even if your program is not being broadcast or is otherwise exempt from the closed captioning requirements, keep in mind that over 10% of the population is deaf or hard of hearing. If you do not caption, you will not reach this audience. Closed captioning is also used by a large number of non-native language speakers to help them better understand the programs they watch. If you do not caption your videos, you are missing out on a huge portion of your potential market.
The first (and usually most time consuming) step is to get a transcript of your video, which contains all of the dialogue as text, as well as the non-verbal audio cues.
For post production captioning (not live), the next step is called time stamping or synchronization, in which each caption line or cell is synchronized to the appropriate in and out time to match up with the video. Modern software makes this step very easy, and it can often be finished faster than real time. There is no need to struggle with manually entering time codes by hand, or dragging in/out points around on a timeline.
Finally, the captions need to be encoded, or converted from text and formatting information into the special code used for broadcast. Encoding used to be exclusive to legacy tape-to-tape hardware encoding systems, but software encoders that work with your NLE software or video server have since become available. This encoded data must then be properly merged into the video signal or data so that it can be transmitted along with the video.
If you already have a shooting script, lecture notes, etc., these can be used to partially or completely eliminate the need to transcribe the video. If you don’t have a script at all, the fastest way to get a transcript (and the most popular option for live captioning) is to use a professional stenographer (like a court reporter) who has the specialized skill to operate a steno machine, allowing them to type much faster than on a regular keyboard. Because this is a special skill that takes years of training, stenographers tend to charge a lot of money. Another option used for live captioning is a shadow speaker, who listens to what is being said by all voices in the video and repeats them in his/her own voice, using a speech recognition system (see below) to turn the speech into text.
As of August 2011, there are no commercially available speech recognition systems which can simply take a finished video and transcribe all of the speech into text with enough accuracy for intelligible closed captioning. The automatic speech-to-text systems that do exist are not yet reliable or accurate enough. Universal speech-to-text is an extremely difficult problem despite many years of research.
Software is available which can recognize a single speaker’s speech with good accuracy, as long as that speaker first trains their voice into the system, dictates clearly, and there is no background noise or music. These systems are sometimes used for captioning live broadcasts in smaller markets, where hiring a stenographer would be too expensive. When the video contains multiple voices, a shadow speaker is used. This is a person who listens to everything that is said in the video, and repeats it in his/her own voice, like a language interpreter but without changing the language. This allows the speech recognition software to be tightly trained onto the shadow speaker’s voice patterns, enabling reasonably good accuracy (up to 90-95% with experience).
Closed caption data for broadcast TV consists of a complex, multi-threaded stream of control codes, commands, and text and timing information. The proper encoding of this stream, and insertion of the data into the proper location in the video signal, is part of what makes closed captioning data difficult to work with.
The main difference is that subtitles usually transcribe only the spoken dialogue, and are mainly aimed at people who are not hearing impaired but lack fluency in the spoken language. Closed captions are aimed at the deaf and hearing impaired, who need additional non-verbal audio cues (such as "[GUN SHOT]" or "[SPOOKY MUSIC]") to be transcribed in the text. Closed captions are also useful in situations where video is shown but the sound is muted or difficult to hear, such as a noisy bar, a convention floor, video signage & billboards, etc.
Subtitles which also contain these non-verbal cues are usually referred to as "SDH" (Subtitles for the Deaf and Hard of hearing) tracks, but for various reasons, closed captions are usually the preferred method of making video accessible for the deaf and hearing impaired. SDH subtitles tend to be used on video formats that do not support closed captioning, such as Blu-ray discs.
Another difference between closed captions and subtitles is that closed captions are transmitted as an encoded stream of commands, control codes, and text. Because the captions travel as text, they can be turned on and off at the viewer's discretion, and many TVs let you choose how the text is rendered on the screen (extra large, extra small, transparent, etc.). It also means the text can be decoded and edited or converted to other formats. Subtitles, on the other hand, are carried as bitmap images, so the font and size are pre-determined and cannot be changed by the display device, and because they are images rather than text, they cannot easily be decoded back into text form or converted to other formats.
For those delivering their video for broadcast, perhaps the most important difference between captions and subtitles is simply that closed captioning is required by the FCC/CRTC, and your video cannot be broadcast or distributed unless the closed captioning data is present. Subtitles (even SDH subtitles) do not satisfy these requirements.
Closed captioning has often been treated as an afterthought, not only by editors and production facilities, but also by hardware and software manufacturers. Different video systems had to invent different workarounds to support closed captioning, so there is a huge variety of workflows needed to cover all the various formats and systems out there, and many "gotchas" that can disrupt an otherwise smooth workflow. Also, most NLE programs and video conversion and compression programs were not designed to support closed captioning at all, so different "tricks" are needed to make it work. Often, any processing or conversion done to a video will strip out the closed captions. A huge amount of R&D has been necessary to find, support, and document all of the various workarounds and tricks that are specific to each format or system.
Fortunately, advances in end-user software workflows are slowly but steadily helping to reduce the complexity of closed captioning, such that it is now practical for most editors with modest NLE systems to do their own captioning, even in HD.
Closed captioning had traditionally been expensive because very expensive decks and legacy hardware encoders were needed, and because it was a linear process, this gear would be tied up for a considerable length of time when doing the captioning. Also, because closed captioning (especially for HD) required a lot of specialized video engineering knowledge, running a captioning system was previously seen as a high-end endeavor limited to the best equipped production facilities.
However, recent advances in software encoding for popular NLE systems have drastically reduced the cost of entry and resources needed to do closed captioning in-house.
The bulk of the closed captioning work occurs in the transcription step, in which the dialogue and audio cues of the program have to be entered as computer text. This step is very labor intensive. Having a script (shooting script, notes, etc.) can drastically cut down the time needed for this step. Doing it in-house may be time consuming but can also save a lot of money.
In the past, it was not feasible for most facilities to bring closed captioning in-house, due to the very high cost of the necessary hardware and software, as well as specialized training needed to run the system. However, recent advances in affordable and easy to use closed captioning software encoders (which eliminate the need for expensive hardware) have made it possible for anyone to do their own closed captioning, right from their NLE system.
Captioning in-house can save a lot of money if you have a large volume of video that needs to be captioned. It also saves time and money because you do not need to print to tape and ship it to a 3rd party service company, then wait for them to ship the tape back. Since you’re doing it all in-house, you have full control over quality and can inspect the results and make necessary changes immediately. There is no risk of losing your master or having the project be delayed due to shipping problems or damaged tapes.
If you need extremely fast turnaround or the ability to make last minute changes (such as editing and delivering a show same day), having captioning capabilities in-house can save the very high costs of hiring a real-time captioner. Another common scenario is when you need to deliver multiple versions of a program for different markets. Doing this in house can save a lot of money in tape stock and captioning fees.
Even if you use a service for the bulk of your captioning needs, having the ability to edit and convert closed captions in-house can be extremely useful. For example, if you need to make a last minute edit or correction, you will be able to fix up the captions without having to print a new un-captioned tape and wait several days turnaround (and pay fees) for the service company to make a new captioned master. If you need multiple versions of a program for different markets, you can have the service company do the bulk of the captioning work and make the small changes yourself, without incurring additional fees. It will also allow you to re-use and convert closed captions you have done on past projects. For example, if you want to take a captioned tape from your library and re-master it for DVD, Blu-ray, or the web, you can convert the closed captions on the master into captions or subtitles for any other delivery method.
To figure out the right workflow for successfully captioning your video, the most important factors are the format you need to deliver in and the kind of system you are using to create or convert to that format. If it is a tape format, then the specific deck and the hardware interface between the source and the deck come into play. If it is a file format, then not only are the specifics of the file format important, but so is the kind of system that will be used to read or play back the file on the receiving end. Since caption data is fragile, it usually does not survive format conversions or transcoding unless special care is taken, and many video systems and programs were not designed to read or preserve closed captions at all.
ATS works very well for videos such as (but not limited to) speeches, lectures, presentations, meetings, and training videos, which have clean audio. It does not work as well if the audio has already been mixed with music or sound effects, so it isn't recommended for captioning dramatic content, music videos, videos with lots of background noise, or low fidelity audio such as telephone or VoIP recordings.
DVD supports both NTSC closed captioning (CC1-CC4) and subtitles (32 tracks). In the case of closed captioning, the captions are usually decoded and displayed by the TV, so they are controlled using the TV’s menu or remote. Subtitles are controlled by the DVD player and can be turned on/off with the player’s remote or by commands in the DVD menus. DVD does not support Teletext, which is the PAL standard for closed captioning.
Blu-ray currently supports subtitles only. There is no specification for adding closed captions to Blu-ray discs. For this reason, most commercial Blu-ray discs contain both regular subtitle tracks (dialogue only), and also SDH (Subtitles for the deaf and hard of hearing) tracks, which contain both dialogue and non-verbal audio cues.
Closed captions or subtitles can be added to a DVD project in the authoring stage, using most professional DVD authoring programs such as DVD Studio Pro, Adobe Encore, Scenarist, etc. Most consumer DVD programs like iDVD or Toast do not support closed captioning or subtitles.
Professional DVD authoring programs like those listed above can import closed captions using a SCC (Scenarist Closed Captions) file, which can be created by a number of popular programs and caption service companies.
Some DVD authoring programs have subtitle creation capability built-in, although many users find it faster and easier to use software specifically designed for rapid caption/subtitle creation. In this case, the subtitling software will export either a text & time script for the DVD authoring program (such as .STL for DVD Studio Pro), or a time script with links to image files which comprise each subtitle (.SST + .TIFs for Scenarist, .XML + .PNGs for Blu-ray, etc.).
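For illustration, a text & time script of this kind is usually just timecodes plus text, with optional global style directives. The fragment below is a hypothetical, simplified example in the general shape of a Spruce STL script for DVD Studio Pro (exact directives and delimiters vary by authoring program):

```
$FontName = Arial
$FontSize = 24
00:00:05:12 , 00:00:08:00 , Hello, and welcome.
00:00:08:10 , 00:00:11:00 , Let's get started.
```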
Authoring a DVD is like baking a cake: once the disc is made, you cannot easily “unbake” it to change the recipe. The best approach is to go back to the source project that was used to author the DVD, add the caption/subtitle tracks, and burn a new DVD disc or image. Although it is technically possible to recover some (but not all) of the assets from a burned DVD and then re-author it, this is not recommended.
DVD stores closed captions differently than analog or DV video, using data packets instead of the "dots and dashes" in line 21. When your video file with captions is compressed to MPEG-2 for DVD, the caption data will not be preserved.
In most professional DVD authoring programs, the only way to add closed captions to a DVD is to use a SCC file. Fortunately, closed captioning software is available that can easily extract the line 21 caption data from your source video and convert it into a SCC file for DVD captions, or even a subtitles file for DVD subtitles.
Another possible alternative is to use a set-top box (STB) DVD recorder to record your video via analog connections. Most STB DVD recorders will internally convert the analog closed captioning signal into DVD closed captions when recording a DVD.
Many closed captioning programs, such as CPC's CaptionMaker and MacCaption series, can output SCC files. The captions can be created inside the closed captioning software, or they can be retrieved and decoded from a video source that has already been captioned.
Because SCC files contain encoded caption data, it is not practical to create or edit them by hand. The caption text is not in a human readable format, and the time codes do not reflect the actual display times of the captions, due to the buffering of the encoded data.
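To see why, consider what a single caption line looks like inside an SCC file: a timecode followed by pairs of hex bytes that mix control codes with parity-encoded characters. The Python sketch below (a deliberate simplification that skips control-code pairs and extended characters; the sample hex line is constructed for illustration) recovers the plain text from such a line:

```python
def decode_scc_line(line):
    """Decode the printable text from one line of an SCC file.

    Simplified sketch: CEA-608 bytes carry 7 data bits plus an
    odd-parity high bit; bytes below 0x20 begin control/command
    pairs (e.g. 0x14 0x20 = "Resume Caption Loading"), which a
    real decoder must interpret but which are simply skipped here.
    """
    timecode, hex_words = line.split("\t")
    text = []
    for word in hex_words.split():
        b1 = int(word[:2], 16) & 0x7F   # strip the parity bit
        b2 = int(word[2:4], 16) & 0x7F
        if b1 < 0x20:
            continue                    # control pair -- skip both bytes
        for ch in (b1, b2):
            if 0x20 <= ch <= 0x7E:      # printable basic character
                text.append(chr(ch))
    return timecode, "".join(text)

# A constructed example line: two command pairs, then "Hello."
tc, txt = decode_scc_line("00:00:05:12\t94ae 9420 c8e5 ecec efae")
```

A real decoder must also act on the control codes (pop-on/roll-up commands, positioning, mid-row style changes), which is why dedicated captioning software is used instead of hand editing.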
The first and most important step is to make sure that the closed caption decoder in your TV is turned on and set to the right caption channel (usually CC1).
When using a set-top DVD/Blu-ray player, the captions will not work if the player is connected by HDMI or high definition component (1080i/720p/480p). This is because HDMI does not support closed captions at all, and component only supports captions at standard definition (480i). Some DVD players now include closed caption decoders, and internally render the captions into open captions (burned into the image) before sending the video out via HDMI or component. If you have such a player, the DVD player menu or remote will have a button to turn the closed captions on or off, and the TV’s caption decoder is not used. Not all DVD players with HDMI/component have this capability, so you might have to change your DVD player or switch to the composite or S-video outputs instead.
In rare cases, some TVs do not support closed captioning on all of the inputs, so you can try a different input. Some display monitors or projectors which lack TV tuners do not have caption decoders.
Another common issue is that many software DVD players (for computers) do not support closed captioning, or do not support it 100% correctly. For example, most software DVD players cannot correctly decode roll-up or paint-on closed captions. Some software DVD players can play captions which appear at the very bottom or the top of the screen, but not captions that are in the center area. There continue to be many lingering issues with closed captions on many software DVD players, so it is recommended to check your disc on a real set-top DVD player.
The most common issue is a time code mismatch between the DVD project and the SCC file which contains the closed captions. Many NLE systems use a timeline that starts at 1 hour (01:00:00:00) instead of at zero, and this timing is sometimes preserved when you move into authoring the DVD.
There are actually three timecodes in a captioned DVD: The project’s track timecode, the MPEG video’s internal timecode, and the SCC captions timecodes. A mismatch between any of these 3 timecode systems can cause the captions to not work properly or at all. If your SCC file captions start at zero but your DVD track or MPEG file starts at 1 hour, then all of the captions will be 1 hour early. Conversely, if your SCC file captions start at 1 hour but the DVD track or MPEG-2 file start at zero, then all of your captions will be 1 hour late (and if your DVD is less than one hour, they’ll never appear at all).
For caption service companies, a good recommendation is to always send your client two SCC files: one that starts at 1 hour (01:00:00:00), and another that starts at zero hours (00:00:00:00). Usually, one of the two will work for the project, and this will help avoid needing additional troubleshooting steps.
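Shifting an SCC file's timecodes by a whole hour is mechanical, so if only one version is on hand, it can be re-timed rather than re-made. A minimal Python sketch (the function name is illustrative; it assumes standard HH:MM:SS:FF or drop-frame HH:MM:SS;FF timecodes):

```python
import re

def shift_scc_timecodes(scc_text, hours):
    """Shift every HH:MM:SS:FF (or drop-frame HH:MM:SS;FF) timecode
    in an SCC file body by a whole number of hours, wrapping at 24."""
    def bump(match):
        h = (int(match.group(1)) + hours) % 24
        return f"{h:02d}{match.group(2)}"
    return re.sub(r"\b(\d{2})(:\d{2}:\d{2}[:;]\d{2})", bump, scc_text)

# Re-time a zero-based file so its captions start at 1 hour:
shifted = shift_scc_timecodes("00:00:05:12\t9420 94ae", 1)
```

The hex caption words are untouched because only colon-separated timecode patterns are matched.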
A common reason for this is that the captions/subtitles were made in Drop-frame (DF) time code, but the DVD or the MPEG-2 file used in the DVD were authored in Non-drop frame (NDF) time code, or vice versa. This will cause the captions or subtitles to be in sync at the beginning, but will slowly drift out of sync at the rate of about 1 second per 20 minutes of video (about 3 seconds per 1 hour). The captions/subtitles track can be fixed by using the “Convert Time Code” feature in many popular captioning/subtitling software programs, and converting DF to NDF time code or vice versa.
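The drift follows directly from the drop-frame counting rules: at 29.97 fps, drop-frame timecode skips two frame numbers at the start of every minute except minutes divisible by 10, which adds up to 108 frame numbers (roughly 3.5 seconds) per hour. The Python sketch below (illustrative helper names) shows the frame count each scheme assigns to the same timecode label:

```python
def ndf_frames(h, m, s, f):
    """Frame count implied by a non-drop-frame label at a nominal 30 fps."""
    return (h * 3600 + m * 60 + s) * 30 + f

def df_frames(h, m, s, f):
    """Frame count implied by a drop-frame label: two frame numbers are
    skipped at the start of each minute, except minutes divisible by 10."""
    minutes = 60 * h + m
    return ndf_frames(h, m, s, f) - 2 * (minutes - minutes // 10)

# At the 1-hour mark the two schemes disagree by 108 frames, which is
# the drift a DF/NDF mismatch introduces over an hour of video.
gap = ndf_frames(1, 0, 0, 0) - df_frames(1, 0, 0, 0)
```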
If the drift is more severe, it could be another time code issue (NTSC vs. PAL, Film 24.0fps vs. 23.976fps, etc.), or the captions/subtitles may have been timed incorrectly.
NTSC video supports CEA-608 closed captioning (sometimes referred to as line 21 captioning), with up to 4 channels of caption data (CC1-CC4), although typically only two are used at a time (CC1 & CC2, or CC1 & CC3). NTSC video can also have burned-in subtitles or open captions, but there is no way to turn those off once they've been burned into the video.
DTV requires both CEA-708 and CEA-608 closed captioning data, stored as metadata packets within the DTV stream, whether the video is standard definition or high definition. For typical standard definition sources ingested from SD tape, the station will generate the packets by up-converting from the line 21 data on the tape, in the video encoding stage just before DTV transmission.
CEA-608 supports only a single-size, monospaced font with characters appearing over an opaque black background. The captions can be displayed one block at a time (pop-on), one character at a time (paint-on), or one line at a time with previous lines rolling up as new lines are drawn in (roll-up). A caption block can have up to 4 lines and up to 32 characters per line, although for accessibility reasons, it is recommended not to exceed 2 lines and 26 characters per line. The captions can be placed at various points on the screen, but there are some restrictions on placement depending on the mode.
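These limits are easy to check mechanically before encoding. A minimal Python sketch (hypothetical helper; it defaults to the hard CEA-608 limits, and the stricter accessibility guideline can be passed in instead):

```python
def check_caption_block(lines, max_lines=4, max_cols=32):
    """Return True if a caption block fits within the given
    CEA-608 limits (default: hard limits of 4 lines x 32 columns)."""
    if len(lines) > max_lines:
        return False
    return all(len(line) <= max_cols for line in lines)

check_caption_block(["Hello.", "How are you?"])            # within hard limits
check_caption_block(["x" * 27], max_lines=2, max_cols=26)  # fails the guideline
```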
Characters can be drawn in a few different colors, although typically only white is used for the actual caption text. Other fonts, styles or sizes cannot be specified. The characters can be italicized or flash on/off.
Since the text and formatting codes are decoded into an image by the TV, the TV has some control over the appearance of the captions. Some newer TVs allow the user to override the default settings and select a different font, size, color, etc., but this depends on the TV and cannot be controlled by the captioner.
CEA-608 supports languages that use the Roman alphabet, plus certain accented characters. This allows it to cover English, Spanish, French, Portuguese, German, and Dutch. It also supports punctuation and some special symbols such as the music note and the copyright symbol. CEA-608 does not support Unicode characters, so it cannot display characters in Chinese, Japanese, etc.
Most analog (e.g. VHS, BetaSP) and full raster digital (e.g. DigiBeta, IMX, D-1) formats store the CEA-608 caption data in line 21, which is a line of video in the VBI (Vertical Blanking Interval). Line 21 is outside of the viewable area of the video image, but still behaves like part of the image. The caption data is represented as white dashes which blink on and off, similar to Morse code. These dashes convey the 1s and 0s which make up the encoded CEA-608 caption data. This data is very sensitive to dropped frames and other kinds of video distortion, which may cause the data to be unintelligible to the decoder.
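Concretely, each transmitted byte is a 7-bit character value plus an odd-parity high bit; a decoder rejects bytes whose parity is wrong, which is part of what makes the data so sensitive to distortion. A small Python sketch of that parity calculation (illustrative function name):

```python
def with_odd_parity(ch):
    """Return the CEA-608 byte for a character: 7 data bits plus a
    high parity bit chosen so the total number of 1 bits is odd."""
    b = ord(ch) & 0x7F
    ones = bin(b).count("1")
    return b | 0x80 if ones % 2 == 0 else b

# "H" (0x48) has two 1 bits, so the parity bit is set: 0xC8
```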
Some non-full raster digital formats such as DV (miniDV, DVCAM, DVCPRO) and DVD do not store the VBI data, so they do not have a line 21. Instead, they store the CEA-608 captions separately as metadata packets in the digital data stream. Many (but not all) devices will convert these packets into a line 21 when playing the data back through the analog outputs, and vice versa when recording.
Analog broadcasts were transmitted with line 21 as part of the video signal. However, as of June 2009, virtually all broadcasts in North America have switched over to digital (DTV).
For DTV broadcasts, the caption data is stored in metadata packets as part of the video stream, which is itself contained in the MPEG-2 transport stream. These captions must be decoded by the device that receives the DTV transmission, because once the image has been decoded, the metadata packets do not travel along with it. This will usually be the TV itself when using an antenna connection, or the cable/satellite/converter set-top box (STB) which is then connected to the TV by a baseband connection like HDMI or component video. Some STBs will re-generate the line 21 closed captions for standard definition analog connections such as composite or RF, so that the TV's decoder may be used instead of the STB's decoder. This is only possible for standard definition video.
The line 21 which represents the closed captioning data can be added to any full raster standard definition video. Traditionally this was done by a hardware encoder. Now, it can also be inserted directly into a 720x486 video file using closed captioning software, or by taking a "black movie" (a video file containing only line 21 data) generated by a closed captioning service and superimposing it over an existing video. A black movie needs to be generated with specific Row and Column settings depending on the NLE system that it will be used on. Different NLE systems and hardware cards map the line 21 of VBI to a slightly different location within the 720x486 video file, so the Row and Column must be set to match. If the data is in the wrong Row or Column, it might not correctly get mapped to line 21. You can determine the correct Row and Column settings to use by using CPC's calibration movie: http://www.cpcweb.com/blackmovie
In many cases, the captions will not be preserved. This is because DVD and web formats store the captions differently than the line 21 used for standard definition. However, the line 21 captions can easily be extracted and converted into the proper format using closed captioning software.
It depends on what software or hardware is used to perform the conversion. In general, captions will not be preserved unless the system was specifically designed with the capability to preserve the closed captions. Some examples of hardware that were designed to do this are the Matrox MXO2, and the AJA FS-1. Another option is to use software to extract the captions from the original source and convert and re-encode them into the destination format.
The data for a closed caption is encoded in the video signal a few seconds before the caption actually appears. If you make a cut, the caption data could be interrupted for 3 seconds or more, both before and after the cut. This can cause the first caption after each edit point to fail to appear. If making a cut in the captioned dialog cannot be avoided, you should use closed captioning software to extract the original captions and generate a new seamless caption track.
As long as the line 21 is left untouched and the video is kept in its original format, the caption data will basically stay intact and in sync with the video it is attached to. However, any changes to the video which also affect the line 21 (e.g. color correction or effects applied to the whole frame) can destroy the line 21 data. The caption data can also be lost if the video is exported or transcoded into a different format, unless special precautions are taken to preserve the caption data.
CC2 and CC4 share bandwidth with CC1 and CC3 respectively, and CC1 and CC3 receive priority when there is not enough bandwidth to encode both caption streams. As a result, the captions in CC2 or CC4 may be delayed until a point at which there is sufficient bandwidth remaining to encode them.