Newswatch e-newsletter

Current Issue - June 2015

Hot Button Discussion

Pushing Cinema Sound Systems Into the Future
By Michael Goldman  

The “thickest” part of the work currently being conducted by the SMPTE Technology Committee on Cinema Sound, TC-25CSS, is the ongoing initiative on the interoperability of immersive sound systems for cinemas, according to Brian Vessa, chairman of the technology committee and also executive director of digital audio mastering at Sony Pictures Entertainment. That work is a reaction to an audio technology revolution from a small group of companies that eventually caused the industry to conclude, as Newswatch noted last year, “that a single specification for the packaging, distribution, and theatrical playback of D-Cinema-based audio tracks that pushes past what was initially described in the original Digital Cinema Initiative specification” was of crucial importance.
At the same time, Vessa adds, that is not the only crucial work being conducted by the technology committee regarding cinema sound. He points out that a longstanding industry problem, one that festered long before anyone thought of immersive audio, is “the inconsistency between theaters, even if you are not adding the immersive sound component, just for a regular 5.1 or 7.1 theater. You find there is a big difference between what you create on the mixing stage and deliver to different cinemas, and what you hear in those cinemas, which is all over the map. So we have also been working on that fundamental issue.”

These parallel efforts combined, Vessa suggests, essentially involve the pursuit of “an end-to-end process for an entire system—an effort at maybe a higher level than anything SMPTE has ever attempted to do. After all, we are trying to put together all the pieces that are necessary to take [cinema audio] from where it has been in traditional cinema for years, and get it to the point where we can deal with it in standardized fashion with an entirely new audio idea [with immersive audio].”

That notion of working through “an entire system” is why Vessa suggests it is important to understand what the committee has been doing to improve the consistency and quality of cinema audio generally before examining the major strides in pushing toward an immersive audio standard. In that regard, the first big steps have been to pursue recommended practices for measurement and calibration of B-chain sound systems using modern standards and measurement technology, and the creation of a standard pink noise test signal.
“Keep in mind, the last time we created cinema standards dealing with cinema calibrations was in the 1970s,” he says. “That’s when we created SMPTE ST 202, which describes the calibration of electro-acoustic frequency response, and SMPTE RP 200, which describes sound pressure levels and related [issues]. There have been updates, but the basis dates back 40 years. Thus, movie theaters have essentially been working on a calibration frequency response curve that was established 40 years ago. Therefore, we thought it was time to revisit this. And the plan for doing that could not involve just handing off new standards to the theater owners, since thousands of theaters are calibrated according to the old standards. 

“The first thing we did was devise a plan to study it, and we published a report analyzing theaters and dubbing stages [find that report and the committee’s study group report on immersive audio systems, along with other recent SMPTE reports here]. Then, we realized there was no standard for the test signal that the cinema industry uses, referred to as digital pink noise. In the old days of film, people used to use test films, which were created by laboratories and sent out to calibrate all theaters—everybody had the same test material to work with. But since the advent of digital exhibition, it has not gone that way—no one ever came up with a standard for digital pink noise. So we worked to standardize that, and we are just about done and expect to publish that standard by the fall.

“The next step involved creating a modern calibration procedure, a recommended practice for how to use the current standards—ST 202 and RP 200—trying to delineate exactly how everyone should go about calibrating against those standards with modern equipment, so everyone does it the same way in order to create consistency. We are targeting publishing that recommended practice by the end of the year.”
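
For readers unfamiliar with the term, pink noise is a random signal whose power falls by roughly 3 dB per octave, so each octave carries about the same energy. The short Python sketch below only illustrates that idea, using the widely known Voss-McCartney approximation. It is not the SMPTE test signal Vessa describes, and the function name `pink_noise` and its parameters are assumptions made up for this example.

```python
import random


def pink_noise(num_samples, num_rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney method.

    Illustrative only: this is NOT the standardized SMPTE digital pink
    noise signal. The name and parameters are hypothetical.
    """
    rng = random.Random(seed)  # seeded so every run yields the same signal
    rows = [rng.uniform(-1.0, 1.0) for _ in range(num_rows)]
    running_sum = sum(rows)
    samples = []
    for n in range(1, num_samples + 1):
        # The count of trailing zero bits in n picks the row to refresh:
        # row 0 changes every 2nd sample, row 1 every 4th, and so on.
        row = (n & -n).bit_length() - 1
        if row < num_rows:
            running_sum -= rows[row]
            rows[row] = rng.uniform(-1.0, 1.0)
            running_sum += rows[row]
        # Add one fresh white-noise value per sample for the top octave.
        white = rng.uniform(-1.0, 1.0)
        # Scale so the output stays within roughly -1.0 to 1.0.
        samples.append((running_sum + white) / (num_rows + 1))
    return samples


signal = pink_noise(48000)  # one second at an assumed 48 kHz sample rate
```

Because the generator is seeded, two systems running the same code produce the same sequence, which is the general property that makes a shared test signal useful for comparing rooms.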
Vessa emphasizes that once a recommended practice for calibration procedures and a specification for a standardized pink noise signal have been completed, it will then be time for the committee “to take things to the next level,” meaning it will start work on system performance specifications for what the B-chain part of a theatrical system really needs to deliver to the auditorium.
“Right now, there are no standards in place regarding the performance of cinema systems—nothing that says what they are supposed to do,” Vessa says. “You can’t really start pushing the envelope with what we expect cinema sound to be if we can’t come up with some idea of how cinema sound systems are supposed to perform. We intend to embark on this, probably in the fall. To do that, we have to determine what acoustics in a theater are supposed to be, and what the entire B-chain system needs to do to meet the current expectations of modern movie soundtracks. Once we figure that out and establish a baseline, we think we could push the calibrated response of the systems up to the next notch. You can’t do that today, because many systems today do not even meet the current [older] specifications. Interestingly, some B-chain systems are actually capable of well exceeding those specifications, but are calibrated below what they can actually achieve in order to conform to the current standards.
“But by examining all the elements within the B-chain, having uniform calibration methods, and so on, we can do that. And people forget that one of the most important aspects of a sound system in a theater is the screen, which is essentially a filter in front of the speakers from a sound perspective. So the screen, the routers, the speakers, the equalizers, amplifiers, processors—everything in the theater after the cinema projector or server sends audio out to be played back in the room—that is the B-chain. We have to include all those [components] in this conversation in order to determine system performance specifications. So far, none of this has ever been written down, so that is our next challenge.”

Meanwhile, concurrent with these developments, the committee has made strides with immersive sound specifically. Vessa explains that when Dolby, with its ATMOS system; Barco, with its AURO system; and more recently DTS, with its DTS:X system, entered the fray, they brought entirely proprietary soup-to-nuts methodologies for playing out immersive sound in modern digital cinemas. This has created great excitement in the industry, allowing new creative possibilities, and theater owners have been able to create premium value presentations. But the result has also been a total absence of interoperability, he says. Auditoriums must commit to one format or another, and this, in turn, raises costs and limits the widespread adoption of immersive sound across the board. Therefore, the industry coalesced around the notion of developing an object-based immersive audio standard, built, in large part, around proposals by the companies that have developed the currently available systems.
“We’ve been tasked with a big job,” Vessa says. “For traditional digital cinema [audio], you take a DCP, put it in a server, decode it, and it plays out to speakers, which was a fairly straightforward process that allowed them to get going with digital cinema when it first started. The industry is still largely using what was called the Interop-DCP [legacy standard] package, rather than the SMPTE DCP package, because it is very straightforward, with few complications—from an audio perspective, it’s simple. That is what the industry has used since about 2006, even though we have been busy writing lots of SMPTE standards for picture, audio, and many other things.  Regardless of whether it is an Interop or SMPTE DCP that is delivered, it has been channel-based audio, and that is what all digital cinemas understand.
“Now, immersive audio comes along, and several companies did it with object-based systems, which we don’t have any standardized way to deliver [within the D-Cinema environment]. For example, Dolby created an entirely proprietary process. With no standards to rely on, they were extremely specific in what they invented, and it works very well, to their credit—but only within the proprietary ecosystem they created. In the current SMPTE work, we want to provide interoperability, but in doing so, people think it is just about the digital audio file that is delivered in the DCP. We have come to realize that there has to be a whole standardization process for how audio is packaged, how it gets put into a DCP, how it gets ingested into a server, how it gets played out of a server, how it goes into the renderer, how synchronization is achieved, and so on.”
Thus, Vessa states, the committee has to develop a standards process not only for file formats, but also for what he calls “the entire mechanism—both halves of the problem.”

The “mechanism” is part of the larger infrastructure of immersive sound systems, and the first big step in finding a way to insert new pieces of an immersive system’s infrastructure into existing digital cinema architecture was the creation of what is called an “aux (for auxiliary) data track.” Vessa calls this “an additional track that you can have in a DCP that can [potentially] accommodate any type of data, frame-wrapped and synchronizable using the D-Cinema composition playlist (CPL). And, with it, is an identifier, which is registered, and thus can be identified by the decoder and played.”
“We had to decide how to get that from the server out to the renderer, which is the device that takes the immersive audio and determines how to play it out to the speakers based on instructions contained in the immersive audio metadata,” he continues. “We needed a protocol for that and for how to synchronize it all together. There were quite a few issues in determining the infrastructure, but now, a lot of that is essentially done: the updated D-Cinema infrastructure is nearly standardized.”

Additionally, Vessa points out that, in 2014, Digital Cinema Initiatives (DCI) addressed the issue of how to securely play object-based audio out of servers and through digital projectors in auditoriums in a way that mimics the current DCI Digital Cinema System Specification (DCSS) security architecture. The DCSS has always been based on a single Image Media Block (IMB) in which all audio, captions, subtitles, and other data synchronized to picture are processed and played out of a single server. Vessa says the DCI’s alternative architecture, called Multiple Media Block (MMB), allows, as the name suggests, for multiple Outboard Media Blocks (OMBs) to be used, with “the first-use case for that OMB being a media block dedicated to object-based audio. DCI has created a secure architecture to play object-based audio [within the DCI infrastructure] whereby the object-based audio is ingested directly into the OMB, which then receives a Key Delivery Message (KDM) decryption key directly and securely from the Screen Management System (SMS) to decrypt the contents and perform forensic watermarking in the OMB, just like the DCI requires. The OMB is synchronized with the IMB via a server-generated sync signal, which is what SMPTE has been standardizing. Before, this workflow wasn’t possible, which is one reason why [manufacturers] had to do everything in a proprietary way.”

With these kinds of developments in recent months, “the building blocks are in place” to permit the creation of a standardized file format for the playback of object-based audio that should eventually lead to the Holy Grail of interoperability, regardless of what system a theater happens to install, Vessa says.
“That’s the next part—how to boil all the channels and objects down into some kind of standard delivery file that everyone can read,” Vessa says. “We want to publish a standard about what that file should be so that every system can read it. That is in process, we have been at it for many months, and it will take a while.”
That is partly because all this is complicated, technically, even under the best circumstances, and partly because there are competing file formats and processes, most notably Dolby’s ATMOS and DTS’ MDA, and the industry has chosen to proceed slowly and deliberately as these options are debated and resolved. Further, Vessa suggests, even with basic infrastructure questions now resolved, many questions remain, such as how best to offer the industry a set of expectations for the renderers that each system requires to interpret and address all instructions found in metadata coming down the bit stream from the theater’s server.

“Since the same mix is expected to be rendered into speaker systems and rooms that might be very different, it is important to set reasonable expectations for what that means technically and sonically,” Vessa explains. “As noted, even 5.1 and 7.1 mixes do not translate perfectly from room to room, and those are one-channel to one-loudspeaker [or loudspeaker array] concepts. Rendering a single immersive audio mix convincingly into multiple, and potentially substantially different, playback systems is a bit more complicated. So we really need to manage expectations up front for both the filmmakers and the equipment manufacturers.
“For the renderer, right now, the suggestion is to write a recommended practice about how a renderer should basically behave—what it should do when presented with particular instructions,” he says. “We don’t exactly want to standardize the ‘how’ because that is each manufacturer’s secret sauce, but we do want to say that if we tell the renderer to do something specific, our reasonable expectation is that it will do something like ‘this.’ ”
As Vessa noted at the outset, all these matters make immersive audio and standards a “thick” topic that will take quite some time to settle, and even longer to fully roll out to the industry in a comprehensive way. But, at the end of the day, he is confident it will happen and that the business of digital cinema will improve as a result, for the simple reason that it will eventually make things easier for all concerned parties.
“It will make sense for the studios, because they will only have to create and deliver one [package] instead of making [multiple] distributions,” he says. “That makes studio distribution people happy, because right now, DCP inventories are a tremendous headache for studios. Their inventories have gone up tenfold with all the different available formats [on the picture side], and now, they have to throw immersive audio in, as well? It also helps the post-production process, because you only have to create one mix. It helps theater owners, because regardless of the system they purchase, at least they can play all the content being delivered. It helps manufacturers, because they will now have standards for designing, which will allow them to push the envelope with new and cost-effective designs. It’s quite a bit like 3D image: It was slow to get off the ground until there was a standard for how to deliver it. Now, the studios deliver one 3D image file, and any theater with any 3D projection system can play it. This has allowed that part of the industry to grow by leaps and bounds, and has fostered innovation and good competition, with a number of high-quality 3D systems being introduced into the market. Similarly, the idea is that the immersive audio standards coming in will be the grease to get the whole process off the ground in a big way.”

News Briefs

More NHK 8K on the Way
In early June, Japanese public broadcaster NHK, in partnership with Fox Sports and international soccer governing body FIFA, broadcast live Ultra HD/8K Women’s World Cup soccer matches from Canada to the Zanuck Theater on the 20th Century Fox lot in Los Angeles, as well as other sites in New York and Japan. The point of the demonstrations was to illustrate NHK’s ability to broadcast its Super Hi-Vision format (technically a format that combines 8K imagery at 60 fps with a 22.2 surround sound system) internationally, in advance of the network’s stated goal of broadcasting the entire 2020 Tokyo Olympics that way as a launching point for 8K live-event broadcasting in Japan. In combination with those demos, NHK announced that it also plans to experiment by recording several New York Yankees baseball games this fall, as well as the 2016 Super Bowl, according to a recent Hollywood Reporter article. The article states that NHK will be recording those events in Super Hi-Vision, not broadcasting them live, in addition to doing some limited live satellite broadcasting of selected events from the Rio Olympics next summer, as part of an ongoing strategy to research, develop, and improve the format in time for its big 2020 debut. NHK also plans to film concerts and make documentaries in this format, and to broadcast the 2018 Men’s World Cup in Super Hi-Vision, as well. However, the current FIFA bribery scandal may impact whether that event takes place. Currently, for such events, NHK is utilizing the Ikegami 8K UHD field production camera system, which debuted at NAB earlier this year.

Legendary Cameras
Despite the constant flow of news about the latest digital cameras and sensors and how they are changing the world of cinema, some people are taking time to remember, preserve, protect, and pay homage to the legendary cameras of yesteryear, as writer David Heuring periodically points out in his Parallax View column on the American Society of Cinematographers website. For example, Heuring recently wrote an article describing insight he received from cinematographer Roy Wagner, ASC, after Wagner spent time authenticating two original VistaVision cameras that were up for auction. As Heuring points out, VistaVision was an 8-perf, horizontal, 35mm format developed in the 1950s that was originally used in movie production and then, for a time, became a visual-effects workhorse. Wagner actually examined the historically significant VistaVision camera No. 1 and, Heuring reports, found it to be intact. This is quite unusual, considering that many believed most of the 30 cameras that were manufactured were disassembled so their parts could be used in other equipment. Heuring also posted an article in 2013 about the quest of legendary visual effects supervisor Richard Edlund, ASC, to finish restoring a three-strip Technicolor camera (No. D-18, to be exact). Edlund was responsible for robbing the camera of its movement some 40 years ago while working on the original Star Wars, but kept the body and spent four decades hunting for parts to restore it to its original condition. In April of this year, he finished the job and now owns the only original three-strip Technicolor camera in the world in operating condition, so Heuring interviewed him about that journey.

Virtual Reality for Consumers
Illustrating just how serious an evolving entertainment business virtual reality has become, the long-awaited and highly anticipated consumer version of the Oculus Rift VR headset was finally unveiled in early June at a press conference. There, the company also announced a partnership with Microsoft to make the wireless Xbox One controller and adapter compatible with the Oculus Rift headset, meaning users will be able to stream their Xbox games to the headset. Additional game developers for the platform were announced, and the company stated it will be investing $10 million to speed up independent game development for the technology. A slew of reviews of the headset and related controls immediately followed, including one from the respected journal MIT Technology Review, and Forbes offered up some analysis of the development. Taken together, the analysis so far seems to suggest that, at a minimum, Oculus Rift and the competitors that are sure to follow illustrate that, as Forbes suggests, “heavy-hitting gaming action” in streamlined, less bulky, user-friendly VR-style environments is on the way, and therefore, so is the potential for some big business for major technology companies in the entertainment space.