SMPTE Newswatch

Hot Button Discussion

Implementing Assistive Technologies
By Michael Goldman  

Since SMPTE Newswatch last examined closed captioning and other accessibility technologies a couple of years ago, little has changed in the regulatory requirements placed on broadcasters to widen access to modern communication technologies. Indeed, the only major recent FCC action on accessibility expanded the rules governing how critical emergency information reaches consumers with visual impairments, by making that information accessible on their personal “second screen” devices. However, since the Twenty-First Century Communications and Video Accessibility Act of 2010 was passed, the media industry has steadfastly sought ways to make captioning, video description, and other enhancements more consistently available with its content across all platforms. In fact, most of the activity in this space right now appears to center on how to implement the FCC’s requirements most efficiently across an industry that “broadcasts” content just about everywhere, to everyone, using both traditional and non-traditional delivery methods and viewing systems.
 
As discussed previously in Newswatch, the traditional television broadcast industry has remained stable and efficient in providing closed captions by adhering to the established captioning standards: CEA-608 and its FCC-mandated digital television descendant, CEA-708. In practice, television broadcasters continue to author captions in the CEA-608 format and transcode them into the 708 format as the final step in the broadcast chain. They work this way because the caption authoring industry has never “natively” adopted 708 as a wholesale replacement; most archival content, as well as hardware and software infrastructure, remains based on 608.
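To make the carriage relationship a little more concrete, here is a minimal Python sketch of one small piece of that chain, under stated assumptions. Digital television caption streams can carry CEA-608 byte pairs alongside native 708 service data inside cc_data() triplets, where the first byte holds five marker bits, a cc_valid flag, and a two-bit cc_type (0 and 1 for 608 fields 1 and 2). Each 608 data byte carries seven data bits plus an odd-parity bit. The function names below are hypothetical; the sketch handles only plain text characters (no 608 control codes, and it ignores the few code points where the 608 character set departs from ASCII), and it does not perform the full 608-to-708 transcoding into DTVCC service blocks described above.

```python
def with_odd_parity(value: int) -> int:
    """Set bit 7 so the byte has an odd number of 1 bits, as CEA-608 requires."""
    if not 0 <= value < 0x80:
        raise ValueError("CEA-608 data bytes carry only 7 data bits")
    ones = bin(value).count("1")
    return value | 0x80 if ones % 2 == 0 else value


def cc_data_triplet(b1: int, b2: int, cc_type: int = 0, valid: bool = True) -> bytes:
    """Pack one 608 byte pair into a 3-byte cc_data() entry.

    First byte: marker bits 11111, then cc_valid (1 bit), then cc_type (2 bits).
    cc_type 0 = 608 field 1, 1 = 608 field 2 (types 2 and 3 carry DTVCC packet
    data, which this sketch does not produce).
    """
    if cc_type not in (0, 1):
        raise ValueError("this sketch only wraps 608 field 1/2 data")
    header = 0xF8 | (int(valid) << 2) | cc_type
    return bytes([header, with_odd_parity(b1), with_odd_parity(b2)])


def wrap_608_text(text: str) -> list[bytes]:
    """Split ASCII caption text into 608 byte pairs wrapped as cc_data triplets."""
    codes = [ord(ch) for ch in text]
    if len(codes) % 2:
        codes.append(0x00)  # null fill; becomes 0x80 once odd parity is applied
    return [cc_data_triplet(codes[i], codes[i + 1]) for i in range(0, len(codes), 2)]


# Usage: "HI!" needs two byte pairs; the second is padded with a null byte.
for triplet in wrap_608_text("HI!"):
    print(triplet.hex())  # prints fcc849, then fca180
```

The header value 0xFC that appears here (valid, field 1) is the one commonly seen when inspecting broadcast caption streams, which is a handy sanity check when reading such data by hand.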

It is, however, “an interesting question” how the industry’s ongoing ultra-high-definition (UHD) transition, with its changes in how broadcast television pictures are created, transmitted, processed, and viewed, along with the integration of broadband delivery, could affect captions for broadcast content, says Michael Dolan, founder of the Television Broadcast Technology Consulting Group, chair of the ATSC Technology and Standards Group 1, chair of SMPTE Working Group 24-TB, and a SMPTE Fellow. Dolan adds that this evolution to UHD and broadband delivery also provides an opportunity to introduce new caption technology along the way.
 
“Caption systems today already support at least eight colors—some of them more—and there does not seem to be any requirement from the authoring community for a broader set of colors than what is available today, unlike video, where you are trying to provide very smooth transitions between shades of all the different colors, and a wider color gamut and higher bit depth make a remarkable difference to the viewing experience,” Dolan explains. “When it comes to captions, I’m not aware of a requirement where you would need or want to make two subtle shades of red, for instance. That simply wouldn’t serve the purpose of helping the hard-of-hearing person discriminate text for different speakers or sound effects. However, it would complicate the decoder mixing to have two color models in play, so as you move to higher dynamic range, wider color gamut in video, ultimately the captions have to be easily composited into the video plane. And that process can get a little more complicated when you are working with one color model for the video and another for the text. So one would expect enhancements to caption technology to facilitate this [in the future], even if more colors are not needed.”
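Dolan’s point about mixing color models can be illustrated with a short sketch. Before a caption pixel can be blended into a wide color gamut video plane, its color has to be expressed in the video’s color space. The Python below assumes, hypothetically, that caption colors are defined as linear-light BT.709 RGB and that the video is linear-light BT.2020 RGB; it uses the BT.709-to-BT.2020 primaries matrix published in ITU-R BT.2087 and a standard alpha “over” blend. A real HDR decoder would also have to handle transfer functions such as PQ or HLG and Y′CbCr encoding, which this sketch leaves out. The palette and function names are illustrative only.

```python
# BT.709 -> BT.2020 primaries conversion for linear-light RGB (ITU-R BT.2087).
M_709_TO_2020 = (
    (0.6274, 0.3293, 0.0433),
    (0.0691, 0.9195, 0.0114),
    (0.0164, 0.0880, 0.8956),
)

Rgb = tuple[float, float, float]

# A hypothetical 8-color caption palette, assumed to be linear-light BT.709 RGB.
CAPTION_PALETTE_709: dict[str, Rgb] = {
    "white": (1.0, 1.0, 1.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "cyan": (0.0, 1.0, 1.0),
    "red": (1.0, 0.0, 0.0),
    "yellow": (1.0, 1.0, 0.0),
    "magenta": (1.0, 0.0, 1.0),
    "black": (0.0, 0.0, 0.0),
}


def bt709_to_bt2020(rgb: Rgb) -> Rgb:
    """Re-express a linear BT.709 color in BT.2020 primaries (3x3 matrix multiply)."""
    r, g, b = (row[0] * rgb[0] + row[1] * rgb[1] + row[2] * rgb[2] for row in M_709_TO_2020)
    return (r, g, b)


def composite_over(fg: Rgb, alpha: float, bg: Rgb) -> Rgb:
    """Blend a foreground pixel over a background pixel; both must share a color space."""
    r, g, b = (alpha * f + (1.0 - alpha) * k for f, k in zip(fg, bg))
    return (r, g, b)


def caption_over_bt2020_video(color: str, alpha: float, video_px: Rgb) -> Rgb:
    """Convert the caption color first, then blend it into the BT.2020 video pixel."""
    return composite_over(bt709_to_bt2020(CAPTION_PALETTE_709[color]), alpha, video_px)


# An opaque red caption lands at roughly (0.6274, 0.0691, 0.0164) in BT.2020 terms.
print(caption_over_bt2020_video("red", 1.0, (0.2, 0.2, 0.2)))
```

Skipping the conversion step would treat caption red as the BT.2020 red primary, a noticeably more saturated color than the caption author intended, which is the kind of two-color-model complication Dolan describes.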

Meanwhile, in the increasingly busy commercial streaming space, the industry has been turning to the SMPTE Timed Text (SMPTE-TT) format for broadband distribution of captions. Since the FCC formally declared SMPTE-TT a so-called “safe harbor,” meaning that commercial broadcasters who use it are considered compliant with the law now and for the foreseeable future, the industry “has really taken that to heart, but they have had to examine on a technical level what that means exactly,” Dolan explains.
 
By that, Dolan means that once the FCC declared SMPTE-TT the way to go, the industry had to coalesce around a common profile of SMPTE-TT as the standard choice for captioning commercial streaming video. This is an important step because, until recently, captioning on the Web existed in a hodgepodge of formats and systems. Getting both commercial and Web content to converge on a common profile remains a work in progress, Dolan suggests.
 
“Some time ago, the UltraViolet industry forum created a profile of SMPTE Timed Text, because it is a rather large set of technologies, not all of which are needed to do a good job on captions and movie subtitles specifically,” he says. “That profile did a good job for captions, and it formed the basis of a new initiative by the W3C [World Wide Web Consortium] with the profile known as IMSC1 [Internet Media Subtitles and Captions 1.0]. That is now close to publication, and more and more folks are looking at adopting it as the profile for the safe harbor version of SMPTE Timed Text. Right now, there are reference implementations underway.

“There are a number of commercial media delivery silos on the Internet that are using some profile of [SMPTE Timed Text] already, but most of them do not disclose what they are doing exactly, so it is a little difficult to talk about who is adopting it and who isn’t, other than to say that many programmers who deliver content to tablets and other ‘second-screen’ devices are using a version of it when they deliver their content.” 
 
However, Dolan quickly adds that the sheer volume of programmers and content, the rapidly evolving nature of the Internet, and the effort it takes to roll out any new technology or standard, even under the best of circumstances, mean it will take a long time for broadcasters to coalesce around a common caption profile such as IMSC1. For one thing, some software developers and Web browser companies have gravitated toward another option: WebVTT (Web Video Text Tracks). That format uses a simpler, largely plain-text syntax derived from the SubRip (SRT) subtitle format, and it has become popular for captioning some types of Web-based videos.
 
And for another thing, according to Dolan, major commercial content streaming services like Netflix, Amazon Prime, and others were well into development of their own proprietary processes before the industry got around to pushing toward standardizing commercial media delivery on the Web.
 
“They are still converting not only video and audio, but also captions to whatever they have already designed for their silos, and much of that pre-dates a lot of the work over the last few years with respect to captions, certainly,” he says. “Some of them are moving in the direction [of SMPTE Timed Text] and some aren’t—it’s really on a case-by-case basis.
 
“So a lot of progress has been made. But has everyone converted to a single format or fully deployed IMSC1? No. But there has been a lot of work put forward and a lot of activities are going on that are starting to adopt IMSC1, both in standards bodies and in commercial silos. It’s a process, but we are not even close to a common format, that’s for sure.”
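WebVTT’s lineage from SRT, mentioned above, shows up clearly at the file level. The minimal Python sketch below, with a hypothetical function name, converts a basic SRT file into WebVTT by handling only the two core differences: a WebVTT file starts with a “WEBVTT” signature line, and its timestamps use a period rather than a comma before the milliseconds. SRT’s numeric cue counters can stay, since WebVTT accepts them as optional cue identifiers. Styling tags, positioning, and other SRT variations are not handled.

```python
import re

# SRT timestamps look like 00:00:01,000; WebVTT writes the same time as 00:00:01.000.
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert basic SubRip (SRT) cue text to WebVTT."""
    lines = srt_text.replace("\r\n", "\n").strip().split("\n")
    converted = [
        _SRT_TIMESTAMP.sub(r"\1.\2", line) if "-->" in line else line
        for line in lines
    ]
    # The WEBVTT signature line is followed by a blank line before the first cue.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


srt = """1
00:00:01,000 --> 00:00:03,500
Hello.

2
00:00:04,000 --> 00:00:05,250
[door slams]
"""
print(srt_to_webvtt(srt))
```

Only timing lines (those containing “-->”) are rewritten, so caption text that happens to contain a comma-separated number is left alone.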

Broadcast, of course, is not the only content delivery area where assistive technology is required, nor are captions the only area of interesting developments in this category. In digital cinema, for instance, captioning is a relatively settled topic. Closed-caption standards for DCI distributions are now built around an Ethernet-based synchronization protocol, an associated resource presentation list, and a content essence format that lets content creators distribute DCI versions of their movies with up to six languages of interoperable closed captions. The industry also has a standardized protocol for how digital cinema servers talk to captioning devices, along with well-established standards for descriptive audio carried in DCI packages. Further, as Dolan points out, the Interoperable Mastering Format (IMF) has “already embraced IMSC1,” so new studio movies will typically be mastered in a form optimized for streaming platforms going forward.
 
At the same time, manufacturers have been making interesting strides in making such assistive technologies practical in cinemas, offering a variety of solutions in recent years for both captions and descriptive audio (a separate audio track that describes or narrates what is happening in the picture to assist visually impaired viewers). Companies like Dolby, Sony, and USL, among others, offer a range of technologies that deliver closed captions to individual moviegoers on small personal devices, or deliver audio through small wireless RF receivers attached to standard headphones worn by moviegoers with hearing or visual impairments.

And for home viewers, “the methods of carrying descriptive audio have been mature for some time,” says Sripal Mehta, principal architect, broadcast, for Dolby Laboratories and co-designer, along with Harold Hallikainen, of the digital cinema closed caption communication protocol standard described above. “In some cases, a separate audio program with descriptive video mixed in is sent as an alternate sound program to the main audio program. The issue with this is that, in many cases, the main program audio is stereo or 5.1, while the descriptive video track may only be mono or stereo. Another method is to send a separate descriptive video track, which would be mixed, at playback time, with the main [audio]. The benefit of this approach is that the visually impaired viewer gets the full surround experience, as opposed to a compromised stereo or mono experience. The Dolby encoding/decoding system takes care of ‘ducking,’ or reducing the volume of the main audio track when the descriptive video track dialogue is presented.”
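The playback-time approach Mehta describes, in which a separate description track is mixed with the main audio and the main audio is ducked while description dialogue plays, can be sketched in a few lines of Python. This is a hypothetical illustration, not Dolby’s implementation: it detects description activity from the signal level of fixed-size blocks, whereas a production system may drive ducking from authored metadata, and it switches gain abruptly where a real mixer would ramp it smoothly to avoid clicks. The block size, threshold, and duck gain are illustrative values, and the function names are invented for this example.

```python
import math


def block_rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def duck_and_mix(
    main: list[list[float]],
    description: list[float],
    desc_channels: list[int],
    block: int = 480,         # 10 ms at 48 kHz (illustrative)
    threshold: float = 0.01,  # block RMS above this counts as "description is speaking"
    duck_gain: float = 0.3,   # roughly -10 dB applied to the main program while ducked
) -> list[list[float]]:
    """Mix a mono description track into selected channels of a multichannel program,
    reducing ("ducking") every main channel while the description is active."""
    n = len(description)
    if any(len(ch) != n for ch in main):
        raise ValueError("main channels and description must be the same length")
    out = [list(ch) for ch in main]  # copy so the caller's program is left untouched
    for start in range(0, n, block):
        end = min(start + block, n)
        gain = duck_gain if block_rms(description[start:end]) > threshold else 1.0
        for ch_index, ch in enumerate(out):
            add_desc = ch_index in desc_channels
            for i in range(start, end):
                ch[i] = ch[i] * gain + (description[i] if add_desc else 0.0)
    return out
```

Because the surround channels are ducked but otherwise kept intact, a listener on a 5.1 system could, for example, hear the description routed into the center channel while still getting the full surround mix, which is the benefit Mehta points to over a mono or stereo pre-mix.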

Mehta says descriptive audio has become “a standard part of [Dolby’s] offerings, and is being adopted by our consumer electronics partners, as well as broadcasters,” and he suggests the trend is spreading across the industry. Nor is that the only evolution in assistive technology for the broadcast world. Another paradigm shift, he adds, is the move of descriptive audio tracks into element-based, or object-based, audio delivery.
 
“With object-based audio, music and effects, dialogue, and descriptive video are sent as separate elements, and are mixed together at playback time,” Mehta says. “This method delivers a premium experience to each listener of every need, provides the ability to adjust dialogue level for increased intelligibility, and reduces the overall bit rate for different experiences.”
 
And related to the notion of “increased intelligibility” is the growing push toward what Mehta calls “dialogue enhancement,” another application to assist hearing-impaired consumers.
 
“That’s the ability to pick out dialogue from the ambience of the content,” he says. “Next generation audio codecs, including Dolby AC-4, support dialogue enhancement, which involves advanced signal processing to improve the audibility and intelligibility of dialogue for both pre-mixed stereo and 5.1 audio programs, as well as object-based audio. Dialogue enhancement is a valuable feature for those who are hard-of-hearing.”
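Mehta’s description of object-based delivery, in which music and effects, dialogue, and description arrive as separate elements and are mixed at playback time, also explains how a listener can raise the dialogue level. The Python sketch below is a deliberately simplified, hypothetical renderer: it mixes mono elements into a single output, applies a listener-chosen dialogue offset in decibels, and includes the description element only when requested. Real next-generation audio systems such as AC-4 carry far richer metadata, and the signal-processing dialogue enhancement Mehta mentions for pre-mixed stereo and 5.1 programs is a different technique that this sketch does not attempt. The element labels and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AudioElement:
    kind: str             # hypothetical labels: "music_effects", "dialogue", "description"
    samples: list[float]  # mono, for simplicity


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def render_mix(
    elements: list[AudioElement],
    dialogue_offset_db: float = 0.0,
    include_description: bool = False,
) -> list[float]:
    """Sum the selected elements at playback time with per-element gains."""
    if not elements:
        return []
    length = len(elements[0].samples)
    if any(len(e.samples) != length for e in elements):
        raise ValueError("all elements must have the same length")
    mix = [0.0] * length
    for element in elements:
        if element.kind == "description" and not include_description:
            continue  # delivered to every receiver, but mixed in only on request
        gain = db_to_gain(dialogue_offset_db) if element.kind == "dialogue" else 1.0
        for i, s in enumerate(element.samples):
            mix[i] += gain * s
    return mix


program = [
    AudioElement("music_effects", [0.1] * 4),
    AudioElement("dialogue", [0.2] * 4),
    AudioElement("description", [0.05] * 4),
]
standard = render_mix(program)                                  # about 0.3 per sample
clearer_dialogue = render_mix(program, dialogue_offset_db=6.0)  # dialogue roughly doubled
with_description = render_mix(program, include_description=True)
```

One delivered program serves all three listeners, which is the bit-rate saving Mehta alludes to: the alternatives are rendered in the receiver rather than transmitted as separate complete mixes.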

News Briefs

ITU OKs Immersive Audio Standard
As reported recently by TV Technology, the ITU has announced approval of Recommendation ITU-R BS.2088-0, an open audio file format standard designed to make immersive broadcast sound experiences feasible in combination with ultra-high-definition TV (UHDTV) pictures. The recommendation is based on the existing Resource Interchange File Format (RIFF) and WAVE audio formats, and it codifies a way for single files to carry entire audio programs, plus metadata, for all combinations of channel-based, object-based, and scene-based audio available for those programs. When implemented for users who have the right technology in their homes, the idea is to permit them “to adjust the level of immersive audio” on UHD programs, according to the article.
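Because BS.2088 builds on RIFF/WAVE, the basic file structure is easy to inspect: after a 12-byte header (“RIFF”, a little-endian size, and “WAVE”), the file is a sequence of chunks, each with a four-character ID, a 32-bit little-endian length, and a payload padded to an even number of bytes. The Python sketch below lists those chunks. It is an illustration under stated assumptions: it handles only a plain RIFF header, so the 64-bit size extensions BS.2088 defines for very large files are not modeled, but metadata chunks the recommendation adds (such as “chna” and “axml”) would simply appear as additional entries. The helper names and the tiny test file are hypothetical.

```python
import struct


def list_riff_chunks(data: bytes) -> list[tuple[str, int]]:
    """Return (chunk ID, payload size) for each top-level chunk in a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a plain RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the 32-bit overall size, and "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("latin-1")
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        chunks.append((chunk_id, size))
        pos += 8 + size + (size & 1)  # payloads are padded to an even length
    return chunks


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one RIFF chunk, adding the pad byte when the payload length is odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# A tiny in-memory file: a PCM format chunk, two silent 16-bit samples, and a
# stand-in metadata chunk whose placeholder content is not real ADM XML.
fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)
body = (
    b"WAVE"
    + make_chunk(b"fmt ", fmt)
    + make_chunk(b"data", b"\x00" * 4)
    + make_chunk(b"axml", b"<meta/>")
)
wav = b"RIFF" + struct.pack("<I", len(body)) + body
print(list_riff_chunks(wav))  # [('fmt ', 16), ('data', 4), ('axml', 7)]
```

Because unknown chunks are simply skipped by size, older WAVE-aware tools can step over added metadata, which is one practical reason for building the new format on RIFF.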