Switching from video to audio, Sunil Bharitkar from HP led a panel on the challenges and future direction of audio as part of entertainment. Underlying the changes is a massive upgrade in the delivery technologies for audio, from VHS to DVD and now potentially to very-high-bandwidth internet connections. The devices used to play the content have also improved in general, from mono to stereo to multi-channel. Mobile devices were a step backwards in many cases, but even those now have access to fairly high-quality audio reproduction. VR represents the next frontier, as spatial audio is a key component of realistic immersive experiences.

Robert Fisher, Warner Brothers, traced the history of high-quality sound production from studio to cinema, and explained how the studio has been extending its production pipeline to accommodate streaming and mobile devices. He stressed that WB wants to ensure that the tradition of excellent sound capture and production is maintained going forward, and called on the industry to help make that happen. As part of this effort, WB has been working with Dolby and others to develop the right tools for creating a high-quality remix for binaural headphones. Fisher reported that the early results of the new binaural remixing technology have been very impressive.

Tim Carroll, Dolby Labs, said there has been great progress in codec quality over the last year, but end-to-end system reproduction remains a problem because distribution pipelines and players vary so widely. Just because something sounds good in multi-channel theater audio doesn't automatically mean that it is the right mix for a pair of mobile headphones. This tied in well with Fisher's call for binaural-specific remixes of content.

Deep Sen, Qualcomm, took the audience through how scene-based audio works, and why it is particularly important for VR. In particular, VR requires both a true 3D sound field and the ability to rotate that sound field in response to head movements. Object-based capture is very difficult in these environments, as there can be many sound sources, and they can also be moving. Scene-based audio is a little like light field capture for video: it includes sound pressure measurements based on space and time. A specialized array of microphones that captures sound from every direction is used, and the result is mixed with any desired object-based audio sources. Qualcomm helps support the production of scene-based audio by providing plugins for Pro Tools. The output can be rendered in MPEG-H format for use with VR. To support motion and head tracking, the final result also has to be dynamically rotated in real time, with less than 50 ms latency. That is in addition to the initial rendering through a standard or personalized HRTF (Head-Related Transfer Function).
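
To make the head-tracked rotation step more concrete, here is a minimal sketch in Python. It is not Qualcomm's implementation or an MPEG-H renderer. It assumes the scene is stored as first-order Ambisonics (four channels W, X, Y, Z, with X pointing forward, Y to the left, and Z up), which is one common way to represent scene-based audio. The function names `encode_first_order` and `rotate_for_head_yaw` are hypothetical, and only rotation about the vertical axis (yaw) is handled.

```python
import math


def encode_first_order(sample, azimuth_rad, elevation_rad=0.0):
    """Encode a mono sample arriving from a direction into (W, X, Y, Z).

    Assumed convention: azimuth is measured counterclockwise from straight
    ahead (positive = to the listener's left); W carries the unscaled sample.
    """
    cos_el = math.cos(elevation_rad)
    w = sample                                   # omnidirectional component
    x = sample * math.cos(azimuth_rad) * cos_el  # front/back component
    y = sample * math.sin(azimuth_rad) * cos_el  # left/right component
    z = sample * math.sin(elevation_rad)         # up/down component
    return (w, x, y, z)


def rotate_for_head_yaw(frame, head_yaw_rad):
    """Counter-rotate one (W, X, Y, Z) frame for a head turn of head_yaw_rad.

    A positive yaw means the listener turned to the left, so every source
    must appear to move to the right by the same angle. W and Z do not
    change under a rotation about the vertical axis.
    """
    w, x, y, z = frame
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return (w, x_rot, y_rot, z)


# A source straight ahead; the listener then turns 90 degrees to the left.
frame = encode_first_order(1.0, azimuth_rad=0.0)
rotated = rotate_for_head_yaw(frame, math.pi / 2)
# The source should now sit directly to the listener's right (negative Y).
print(rotated)
```

In a real player this rotation would run on every block of samples as new head-tracker readings arrive, before the HRTF rendering stage. At an assumed 48 kHz sample rate, the 50 ms budget mentioned above corresponds to 2,400 samples, and it has to cover sensor readout, rotation, and rendering together.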

Phil Hilmes, Amazon Lab126, addressed the progress and challenges of playing content in a consumer environment. Those challenges include synchronization across multiple devices and rooms, for both audio and video, as well as with sensor and camera data. While there are vendor-specific solutions like Sonos's sound synchronization, Hilmes called for an industry standards effort so that a variety of devices from different vendors can all be synchronized to suit the user's needs. Sensor input can also be used to help optimize and personalize the audio experience, especially if it can be standardized and shared.

The panel discussed how arrays of simple devices like the Echo, which could be calibrated, synced, and used to render a sound field, might be an alternative to more traditional fixed multi-channel speaker arrays, which are often hard to set up correctly. Using multiple sound bars in combination was also discussed as an option.