Immersive Audio Expands Its Reach

June - Hot Button Discussion


by Michael Goldman

The entertainment industry’s ongoing drive to create deeper, richer audio experiences in movie theaters through immersive systems and techniques has, like everything else, taken a hit this year, as the COVID-19 pandemic has all but shut down the movie exhibition industry. The standstill came, unfortunately, just as the industry finally had a mature suite of immersive sound standards, SMPTE ST 2098, along with three robust object-based theatrical sound systems (Dolby ATMOS, DTS:X, and Barco’s AURO) that theater owners were relying on and consumers were getting used to.

However, many industry professionals argue that the real key to making immersive audio truly ubiquitous lies outside cinemas anyway. The idea, they say, is to make immersive audio more than a moviegoing novelty. For them, the true goal is the industry’s work to develop immersive experiences for the home and even for personal devices.

On the consumer side, of course, all sorts of products already try to bring richer audio into homes, such as soundbars, which use larger or longer speaker enclosures to spread audio more widely through a room and emulate a surround-style experience, or home theaters designed with more, bigger, and better speakers. These approaches are nothing new. What is advancing significantly, however, is the practice of recording, mastering, and distributing content with true immersive qualities that listeners can detect and appreciate in a wide range of environments and on a wide range of devices.

The industry is pursuing such goals, quite simply, because “immersive sound offers a lot of possibilities to tell a better story,” suggests Iris Wu, founder of Ambidio, a sound company that offers the industry a software-based solution for creating a sense of immersive depth that people can perceive through any device with stereo speakers. Wu says her company pursued this initiative because, even under the best of conditions, most consumers will not hear a studio’s highest-end immersive mix in a cinema, simply because “most films will have no more than a few months in a theater, and after that, it will be home entertainment anyway.”

“People want to experience the full story, the very best image quality and the best audio quality, and they can [often] get that inside a cinema,” she says. “But if you think about the lifespan of a title, 90 percent of the time it will likely be watched outside of a theater. If you agree with the idea that sound is 50 percent of the story, because sound helps on the emotional level to get inside a story, then a lot of information is being lost [on down-mixes for non-immersive platforms outside the cinema]. That means you are only experiencing part of the sonic portion of the story if you are watching on a smaller device with just two speakers; it simply defaults to stereo. I think there must be a way we can bring that part of the story back.”

Ambidio’s approach is built around a psychoacoustic model of sound perception, with a core algorithm that works inside sound editing tools such as Pro Tools. The software builds “an immersive sound field through stereo speakers by inserting common cues into audio files that the brain uses to identify and perceive where different sounds within the audio stream would come from,” Wu explains. “The idea is to treat the brain as a decoder, so that you can process the sound as you would in everyday life.”
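Ambidio’s algorithm is proprietary and is not described in the article. As a rough illustration of the kind of localization cues Wu refers to, the sketch below places a mono sound to one side using two well-known cues: an interaural time difference (ITD), estimated with Woodworth’s spherical-head formula, and a crude interaural level difference (ILD). The head radius, the hypothetical 6 dB maximum ILD, and the function names are assumptions for illustration only. It is a headphone-style sketch and ignores the crosstalk between left and right speakers that any stereo-speaker approach has to handle.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumption)


def woodworth_itd(azimuth_rad: float) -> float:
    """Interaural time difference in seconds (Woodworth spherical-head model).

    Intended for azimuths from 0 (straight ahead) to pi/2 (fully to one side).
    """
    theta = abs(azimuth_rad)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def place_mono_source(mono, azimuth_rad, sample_rate=48_000, max_ild_db=6.0):
    """Return (left, right) sample lists with crude ITD and ILD cues applied.

    Positive azimuth puts the source on the right, so the left ear is the far
    ear: it hears the sound later and quieter. max_ild_db is a hypothetical
    tuning value, not a measured head response.
    """
    delay = round(woodworth_itd(azimuth_rad) * sample_rate)
    far_gain = 10 ** (-max_ild_db * abs(math.sin(azimuth_rad)) / 20)
    near = list(mono) + [0.0] * delay                 # direct, pad to equal length
    far = [0.0] * delay + [s * far_gain for s in mono]  # delayed and attenuated
    return (far, near) if azimuth_rad > 0 else (near, far)
```

At 48 kHz, a source fully to the right (pi/2 radians) reaches the left ear about 0.66 ms, or 31 samples, later and 6 dB quieter, cues the brain reads as “to the right.”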

Of course, that’s just one of the new approaches dotting the landscape. All manner of new software tools and plugins are now flooding the market to help sound designers add immersive elements or qualities to content. Wu points, for example, to Sound Particles, a 3D audio application for building virtual sound environments.

“Tools like that are cool, because they let you automatically generate thousands of ambient sounds, where you can define the space you want and even animate sound particles,” Wu says. “If you want to create an airport scene, for example, you used to have to import lots of different sounds and then put them together, and it took a lot of time. Now, you can import just a little bit of material and then define how large a sound space you want, what EQ, compression, and other qualities you want for the space, and generate it much quicker. That’s a great tool for designers creating things like explosions, ambiance, walla, and other sounds.”
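Sound Particles’ internals are not described here. The sketch below only illustrates the general idea Wu describes: scatter many copies of a few source recordings through a user-defined space, then derive a simple gain and direction for each one relative to a listener at the center. The `Particle` class, the function names, the clip file names, and the inverse-distance gain law are all hypothetical choices for illustration, not the product’s API.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    x: float   # metres, positive to the listener's right
    y: float   # metres, positive in front of the listener
    z: float   # metres, positive above the listener's ears
    clip: str  # which source recording this particle plays (hypothetical)


def scatter_particles(count, width, depth, height, clips, seed=None):
    """Place `count` particles uniformly in a box centred on the listener."""
    rng = random.Random(seed)
    return [
        Particle(
            x=rng.uniform(-width / 2, width / 2),
            y=rng.uniform(-depth / 2, depth / 2),
            z=rng.uniform(-height / 2, height / 2),
            clip=rng.choice(clips),
        )
        for _ in range(count)
    ]


def distance_gain(p, ref_distance=1.0):
    """Inverse-distance (1/r) gain, clamped to 1.0 inside ref_distance."""
    r = math.sqrt(p.x ** 2 + p.y ** 2 + p.z ** 2)
    return 1.0 if r <= ref_distance else ref_distance / r


def azimuth_deg(p):
    """Horizontal angle in degrees: 0 = straight ahead, positive = to the right."""
    return math.degrees(math.atan2(p.x, p.y))


# A hypothetical "airport" crowd: a few clips spread through a 60 x 40 x 10 m hall.
crowd = scatter_particles(2000, width=60, depth=40, height=10,
                          clips=["walla_a.wav", "walla_b.wav", "footsteps.wav"],
                          seed=1)
for p in crowd[:3]:
    print(p.clip, round(azimuth_deg(p)), round(distance_gain(p), 3))
```

A fixed seed makes a generated scene reproducible, so a designer could regenerate the same layout after changing only the clips.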

Beyond these new software tools, methods for building immersive tracks range from binaural audio, which aims to create a 3D experience by modeling how each ear detects and processes sound on its side of the head, to transfer-function techniques that let engineers monitor and adjust sound signals while a system is in use, and, more recently, to AI-based spatial audio algorithms and various virtual-reality-based techniques for mixing and mastering audio in new ways. The VR approaches rely on tools such as goggles, headsets, and various kinds of boxes that let artists plug in headsets and track head movement, so that sound turns and pans as the listener’s head moves.
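The head-tracking idea reduces to a small piece of geometry: to keep a sound anchored in the virtual room, the renderer subtracts the listener’s head rotation from the source’s direction before panning it. The sketch below shows only that step, for horizontal (yaw) rotation; the function name and the degree-based convention are assumptions, and a real renderer would also handle pitch and roll and pass the result to a binaural or speaker panner.

```python
def rendered_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Direction of a room-fixed source relative to where the listener faces.

    Angles are in degrees: 0 = straight ahead, positive = to the right.
    The result is wrapped into the range [-180, 180).
    """
    relative = source_azimuth_deg - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0


# A sound placed 30 degrees to the right stays put in the room as the head turns:
# facing forward it is at 30, turning toward it centres it, turning away moves it right.
for yaw in (0, 30, -60):
    print(yaw, rendered_azimuth(30, yaw))
```

Wrapping the angle keeps the panner’s input in one continuous range, so a source does not jump when the listener turns past directly behind.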

“The industry now has VR-based tools to help with creativity—a pair of goggles where you can literally point to where you want the sound to be [during mixing],” Wu says. “I know people are experimenting in different ways with mixing immersive sound to make the whole thing more intuitive for mixers. In other interactive setups, maybe you don’t wear goggles but you have a special laser pointer and use it to point at speakers and describe where the sound should come from.”

Wu points out that such exciting developments illustrate that “the technology is there” in terms of allowing the industry to “create cool immersive stuff, high-quality stuff.” However, there remains the question of “putting all the puzzle pieces together” in terms of how best to record elements, package and manage metadata, compress and decompress signals, and allow consumers to hear immersive material to its maximum effect.

Key to all of that is how studio music content, for example, is recorded, scored, mixed, edited, and mastered. Erin Michael Rettig, supervising stage engineer at Fox’s Newman Scoring Stage in Los Angeles, works at that end of the industry. He suggests that, so far, for high-end facilities like the Newman Stage, the arrival of immersive soundtracks has required subtle improvements and new ways of thinking more than wholesale technological change.

“Obviously, our stage is where all the microphones and tie-lines originate, and for us, microphone technology is a fairly significant part of the business and a challenge for us to keep up with,” Rettig says. “And then, the first thing microphones plug into are mic pre-amps, so we have a fairly large collection of channels of mic pres. Keeping those functioning and updated and repaired is a challenge just because of the sheer number of analog channels we have to maintain. And then, we have the console. In our case, we have one of the larger analog consoles that still exist on the West Coast: an AMS-Neve 88RS recording console. We also fairly recently upgraded our control room with a [Meyer Sound Bluehorn monitoring system]. On the recording side, we tend to operate at very high sample rates, with some clients asking us to record as high as 96 or 192 kHz. So stable high sample rates are something we have to support and be able to record. Then, there are things like clock stability, our interfaces, and recording data storage that we have to worry about as important components.

“So by maintaining all of those things, we already have the flexibility to generate multi-channel stems in our environment. For a scoring stage like ours specifically, we are mostly responsible for recording the tracks, so changes in the business are such that we now have to record more channels. Scoring mixers will put up more microphones in new positions than they would have in the past, so those tracks integrate correctly when heard in an overhead array or inside an object channel. Generally, this means more microphones, additional channels, more tracks, and additional complexity in terms of monitoring. And, after the fact, if you are doing a true music pre-dub going toward a final mix, then you need a system that has the ability to put those objects into overhead speakers. These are all things important for us and our clients to be able to understand and hear the entire audio spectrum, tell exactly what sound is happening, and make sure it all translates accurately and easily.”
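The storage and throughput pressure Rettig describes follows from simple arithmetic: uncompressed PCM audio needs channels × sample rate × bit depth bits per second. The sketch below works that out; the 24-bit depth and the track counts are hypothetical examples, not figures from the Newman Stage.

```python
def pcm_bytes_per_second(channels: int, sample_rate_hz: int, bit_depth: int) -> int:
    """Raw data rate of uncompressed linear PCM, ignoring file headers."""
    return channels * sample_rate_hz * bit_depth // 8


def gigabytes_per_hour(channels: int, sample_rate_hz: int, bit_depth: int) -> float:
    """Storage for one hour of recording, in decimal gigabytes (10**9 bytes)."""
    return pcm_bytes_per_second(channels, sample_rate_hz, bit_depth) * 3600 / 1e9


# Hypothetical session sizes: doubling the track count or the sample rate
# doubles the data the interfaces and storage must sustain.
for tracks in (64, 128):
    for rate in (48_000, 96_000, 192_000):
        print(f"{tracks} tracks @ {rate // 1000} kHz/24-bit: "
              f"{pcm_bytes_per_second(tracks, rate, 24) / 1e6:.1f} MB/s, "
              f"{gigabytes_per_hour(tracks, rate, 24):.1f} GB/hour")
```

For example, 128 tracks at 192 kHz and 24 bits come to about 73.7 MB/s, or roughly 265 GB per hour of recording, before any backup copies.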

However, Rettig adds, as immersive audio matures into a commonplace feature of soundtracks, some facilities will likely need to rethink how their rooms and workflows are configured.

“Our control room right now is configured such that the mix position is optimized for 7.1 monitoring,” he explains. “While our control room is appropriate for tracking and mixing in 7.1, if clients require mixing for Dolby ATMOS or other immersive formats, we have mix rooms in the building next door on the Fox lot, which would be [best suited] for handling that task.”

Rettig also cautions that the immersive audio push may eventually run into its own limits: while it will become increasingly practical to create content with more and more immersive audio channels, the gains in spatial fidelity and imaging may not keep pace with the added effort.

“I suppose you would eventually run into the law of diminishing returns,” he relates. “How much additional effort do you put into it compared to how much benefit you will get back for that effort? In that sense, I don’t think it’s limitless. There will come a point where, even if we add 100 more microphones, no one will be able to tell the difference—not to mention we probably won’t have anywhere to put them on an analog console.”

He adds that, for the time being at least, downmixing may also run into diminishing returns, because more and more immersive tracks are routinely mixed down to satisfy other distribution formats.

“If you start with an immersive ATMOS master and then mix it down to 7.1 and then 5.1, and then stereo, or even a VR or AR track—as you collapse all those channels and start summing channels into smaller numbers of channels, then obviously that will change the way that it sounds,” he says. “That’s because the summing and coherence of the tracks affects the final product.”
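The summing effect Rettig describes shows up even in a basic 5.1-to-stereo fold-down. The sketch below uses the commonly cited ITU-R BS.775 downmix coefficients (center and surrounds at -3 dB, LFE omitted) and then compares summing two identical (coherent) signals with summing two unrelated ones: coherent signals add in amplitude, about +6 dB, while unrelated signals add in power, about +3 dB, which is one reason a downmix can shift the balance of a mix. The function names are illustrative; an object-based master would typically be rendered to channels by its format’s own renderer, and this sketch covers only the channel-to-channel step.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)  # about 0.7071


def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs):
    """Fold 5.1 sample lists down to Lo/Ro; the LFE channel is discarded."""
    lo = [l + MINUS_3_DB * (c + ls) for l, c, ls in zip(L, C, Ls)]
    ro = [r + MINUS_3_DB * (c + rs) for r, c, rs in zip(R, C, Rs)]
    return lo, ro


def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))


def summing_gain_db(a, b):
    """Level of a + b relative to a alone, in dB."""
    summed = [x + y for x, y in zip(a, b)]
    return 20 * math.log10(rms(summed) / rms(a))


N = 480  # one 10 ms block at 48 kHz
tone = [math.sin(2 * math.pi * 10 * i / N) for i in range(N)]
other = [math.sin(2 * math.pi * 13 * i / N) for i in range(N)]  # unrelated tone

print(round(summing_gain_db(tone, tone), 2))   # coherent: about +6.02 dB
print(round(summing_gain_db(tone, other), 2))  # uncorrelated: about +3.01 dB
```

So when a center-channel dialogue line and a nearly identical element in the front channels fold into the same stereo speaker, they can build up more than the mixer heard in the immersive room.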

All of which brings the conversation back to the efforts of Wu and many like-minded people across the industry to emulate pristine immersive experiences for listeners on a wide range of devices and platforms. As a result, questions abound about not only the best way to create immersive audio tracks but also how best to use them.

Wu expects this reality to continue for some time, because despite the quick maturation of technology and techniques for this sort of work, “we are still very much in an experimental stage.”

“That’s why it’s more important to train people on how to use the technology creatively than how well they understand how to use particular software,” she suggests. “It’s sort of like when stereo first came along in the music industry. Suddenly, we got the Beach Boys and the Doors doing hard panning and pushing all creative boundaries with audio. Then, after that, the industry merged into a phase where it was understood that it could work with a common understanding of what a balanced stereo image is, and that’s when stereo became more of a household standard. A similar thing happened when surround sound came along. Some really crazy things happened, some of which distracted people. And then people figured out how to use it creatively in a mainstream way to support the story. That’s the phase immersive sound is in at the moment—an exciting momentum change.”

News Briefs

FCC Committee Statement on Racism

The FCC’s Advisory Committee on Diversity and Digital Empowerment recently issued a formal statement acknowledging that systemic racism exists in the communications, media, and technology industries. The statement, issued amid the racial tensions that have recently gripped the nation, said the committee would work toward developing, recommending, and implementing policies and procedures aimed at diversifying the industry’s workforce, providing more ownership opportunities for minority businesspeople throughout the industry, and accelerating the availability of universal broadband in rural and minority neighborhoods, among other goals.

White Paper Outlines Industry Restart Protocols

A recent Pro Video Coalition report outlines key points made in a new, 22-page white paper released by the Alliance of Motion Picture and Television Producers, meant to serve as what the article calls “an evolving guideline for the industry to get back to work.” The white paper was produced by a special Industry-Wide Labor-Management Safety Committee Task Force at the behest of the Alliance, in partnership with various industry labor unions. It offers guidelines based on input from health experts, the U.S. Centers for Disease Control and Prevention (CDC), and the Occupational Safety and Health Administration (OSHA), as well as from industry experts who understand the specific working conditions of movie and TV production. Among other things, the white paper discusses improved hygiene on sets, how bathrooms should be cleaned, and new processes such as staggered call times and an end to buffet-style craft services.