Current Issue - October 2018


Hot Button Discussion

Tailoring AI to Make and Manage Media
By Michael Goldman  

Recent computer science advancements are starting to allow the practical use of so-called Artificial Intelligence (AI) computing systems to improve workflows and methodologies in many industries. Perhaps no industry has greater potential to take advantage of what AI offers than the media business. However, it’s so early in this paradigm shift that some experts emphasize we don’t yet have full agreement on how best to define AI, let alone how best to use it when it comes to creating, managing, distributing, promoting, monetizing, and archiving content.
 
The generic view of AI, of course, revolves around the notion of utilizing computer systems to automate tasks or perform them more efficiently than would be possible relying exclusively on humans to make all decisions. But Ben Davenport, portfolio manager for Arvato Systems, a global IT provider and supplier of media management and broadcast management solutions, points out that how we define the term “artificial intelligence” for media applications directly impacts what media companies can reasonably expect to get out of AI systems.

“Are we ultimately talking about smart computing that uses algorithms to sort through big data? Or are we talking about processes based on a certain number of input parameters to make creative decisions?” he asks. “I think, arguably, in its most pure form, artificial intelligence is the latter, but we frequently use the term to refer to the former—processes such as speech-to-text, facial recognition, object recognition, and audio recognition. Secondarily, we talk about using that same kind of intelligence to observe patterns in data and create links between certain kinds of data. An algorithm can examine a large quantity of data on a particular topic, and start to understand the context. It’s quite interesting, but still early in terms of how we will use it.”
 
Of course, there are discussions across the industry about whether a liberal use of automation via smart computing solutions will potentially degrade human creativity in making and viewing content. Davenport suggests we live in a world where there is now “such an explosion of video content” on an almost limitless number of platforms that, to a degree, the increased utilization of AI tools as they become available will be unavoidable.

“We have to acknowledge this explosion of video,” Davenport says. “We talk about it a lot, but the real-world impact is that we need to generate more video than we have time for, constantly. That means we have to automate any task that can be automated, while still maintaining some control. There is a Japanese term I love, which originates in the Toyota automotive production system, called ‘Jidoka.’ That translates as ‘automation with a human touch.’ I think that term applies to AI in our industry. AI is going to change or have an impact on every aspect of our business moving forward, but it will focus on tasks that were not financially feasible when humans were doing them.”

Which media applications will benefit most from AI isn’t hard to figure out, Davenport suggests: potentially, all of them.

“It covers the breadth of all the workflows, certainly in the ‘storytelling’ aspects from generating programs and news,” he explains. “But on the flip side, part of our portfolio, for instance, deals with promo scheduling and advertising placement. By gathering data from set-top boxes and other viewer statistics, we can run algorithms to optimize placement of promos to get maximum benefit as far as viewers are concerned. Or equally, when you place an advertisement, to ensure that it is placed in the right spot to get the best value for the advertiser. With humans, such activities take weeks to plan in a schedule. AI can do that same work within hours.”
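As a concrete illustration of the scheduling side of that work, the snippet below is a minimal, hypothetical Python sketch that assigns promos to commercial breaks based on estimated audience data; the slot names, audience figures, and simple greedy strategy are all illustrative assumptions, and a production planner would weigh far more constraints (frequency caps, targeting rules, contractual obligations, and so on).

    # Hypothetical sketch: greedily assign each promo to the remaining
    # commercial break with the largest estimated target audience.
    estimated_audience = {           # break slot -> estimated viewers (illustrative numbers)
        "mon_1930": 420_000,
        "mon_2100": 610_000,
        "tue_1930": 380_000,
        "tue_2100": 550_000,
    }
    promos = ["drama_s2_launch", "news_special", "sports_highlights"]

    assignments = {}
    free_slots = dict(estimated_audience)
    for promo in promos:
        best_slot = max(free_slots, key=free_slots.get)  # largest remaining audience
        assignments[promo] = best_slot
        del free_slots[best_slot]                        # each break is used once

    print(assignments)
    # {'drama_s2_launch': 'mon_2100', 'news_special': 'tue_2100',
    #  'sports_highlights': 'mon_1930'}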
 
Davenport emphasizes that there is not currently an abundance of real-world examples of how AI can impact the media industry, because much of the technology is embryonic or still in the proof-of-concept stage. But as a general proposition, he points to media coverage of the Royal Wedding in Great Britain this past May, where “machines were performing real-time analysis of video to recognize guests at the ceremony.” This analysis identified guests and provided the data instantly to producers and presenters during live coverage; it was also available as searchable tags for those reporting on or creating highlight packages after the event.

On the other hand, he continues, “If you took that same machine, programmed the same way, and put it at the Country Music Awards or some other event, it wouldn’t necessarily recognize any faces, because there is little overlap in personalities between those two events. This is where constant human control comes in. As with humans, machines require continual training.”
 
And that, Davenport suggests, is one of the most exciting areas of the AI paradigm today—machine learning.

He points to an article published in the German magazine FKT this past summer, penned by his colleague, Yvonne Thomas, Product Manager at Arvato Systems, which explains the various “learning styles,” as Davenport calls them, being employed today “to continuously train machines.”
 
The article explains that there are different categories of machine learning—Supervised Learning, Unsupervised Learning, and Reinforcement Learning. As Thomas states in the article, Supervised Learning happens when “the system uses its ability to recognize characteristics and thus classify data,” typically building models using what she calls “example data.”

Unsupervised Learning, by contrast, happens when the system discovers previously unknown relationships between pieces of data, finds recurring patterns, and creates a structure for the data, known as “clusters.” This structure constantly changes with the addition of new data. Reinforcement Learning, on the other hand, is “a highly complex process by which Artificial Intelligence performs defined actions in a particular environment as soon as an exactly defined state occurs,” Thomas states in the article. Typically, the environment will “react” with either a positive or a negative evaluation, and the computer then remembers which action was correct and which incorrect in similar situations.
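To make those three categories concrete, the snippet below is a minimal Python sketch, not drawn from Thomas’s article, that mimics each learning style on toy “media tagging” data; the feature names and the use of the scikit-learn library are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Supervised learning: "example data" with known labels. Features here are
    # imagined (speech ratio, music ratio) per clip; labels are editorial genres.
    X_labeled = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
    y_labeled = ["news", "news", "music", "music"]
    classifier = LogisticRegression().fit(X_labeled, y_labeled)
    print(classifier.predict([[0.85, 0.15]]))    # -> ['news']

    # Unsupervised learning: no labels; the algorithm finds recurring patterns
    # and groups similar clips into clusters, a structure that shifts as data arrives.
    X_unlabeled = np.array([[0.9, 0.1], [0.85, 0.2], [0.15, 0.9], [0.1, 0.8]])
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X_unlabeled)
    print(clusters)                              # e.g. [0 0 1 1]

    # Reinforcement learning, in one-line flavor: the environment returns a
    # positive or negative evaluation (a reward) and the system nudges its
    # estimate of an action's value toward it, remembering what worked.
    q_value, reward, learning_rate = 0.0, 1.0, 0.1
    q_value += learning_rate * (reward - q_value)
    print(q_value)                               # 0.1

In the first case the labels come from people, in the second the structure emerges from the data itself, and in the third the system learns from feedback, which is the distinction Thomas draws.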

These processes, and other machine learning techniques, are becoming increasingly sophisticated, but at the same time, this reality requires a cultural shift in how media professionals view the results, Davenport suggests. By this, he means, as noted earlier, that human interaction will remain paramount—“today’s subject matter expert will [in the AI realm] remain a subject matter expert, but one who makes sure the machines can become subject matter experts themselves,” he explains. At the same time, the humans involved will have to accept the notion that results will never be 100 percent accurate.
 
“What was always true in the media world is that everything in broadcasting has to be 100 percent accurate,” he explains. “There has always been very little tolerance for error. For instance, a basic video signal has always differed from a typical data signal in that data could cope with loss, whereas a video signal was designed to make sure that loss didn’t happen in the first place.”

In the IT world, by contrast, 90 percent or even 80 percent accuracy can be acceptable, depending on the thresholds that are set, according to Davenport. Even the most accurate, best-trained speech-to-text algorithms, for example, are only roughly 87 percent accurate, he adds. That, he says, represents a significant shift for broadcasters to absorb. “I think, over time, that is something we have to accept in our industry—a redefining of the accuracy level for our data.”
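To put a figure like 87 percent in context, the sketch below shows one common way a speech-to-text result can be scored: the word error rate, that is, the word-level edits needed to turn the automatic transcript into a human-made reference, divided by the length of the reference. The transcripts here are invented purely for illustration.

    # Illustrative word error rate (WER) calculation for an automatic transcript.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Standard dynamic-programming edit distance, counted over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    reference  = "the weather in los angeles is sunny today"   # human transcript
    hypothesis = "the weather in las angeles is sunny"         # machine transcript
    wer = word_error_rate(reference, hypothesis)
    print(f"WER: {wer:.0%}, rough accuracy: {1 - wer:.0%}")    # WER: 25%, rough accuracy: 75%

An accuracy of roughly 87 percent corresponds, loosely, to about one word in eight needing human correction, which is why machine-generated transcripts are treated as a starting point rather than a finished product.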

Davenport adds that accurate and efficient metadata management is central to achieving such acceptance, provided it is clear to users which parts of the content metadata were generated by humans and which by machines, with some indication of the overall level of accuracy.
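What that kind of transparency could look like in a metadata store is sketched below; the record layout is a hypothetical assumption rather than Arvato’s actual schema, but it carries the two things Davenport is calling for: the provenance of each tag (human or machine) and a confidence value that users and interfaces can act on.

    from dataclasses import dataclass

    @dataclass
    class MetadataTag:
        label: str         # a recognized face, object, or spoken phrase
        source: str        # "human" or "machine"
        confidence: float  # 1.0 for human entries, the model's score for machine ones
        timecode: str      # where in the asset the tag applies

    tags = [
        MetadataTag("Prince Harry", "machine", 0.92, "00:01:15:10"),
        MetadataTag("wedding procession", "human", 1.0, "00:01:00:00"),
    ]

    # Users or interfaces can filter by provenance or by a confidence threshold,
    # for example to queue lower-confidence machine tags for human review.
    reviewable = [t for t in tags if t.source == "machine" and t.confidence < 0.95]
    print([t.label for t in reviewable])   # ['Prince Harry']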
 
“As we build AI into media workflows, it will be important to understand that this is a change and that any AI system, no matter how sophisticated, will not be 100 percent accurate,” he emphasizes.
 
Davenport also points out that much of the AI paradigm will seem pedestrian to casual observers because many essential components of AI systems will be tools developed by the industry for broader use. In particular, he points to metadata management systems and hardware interfaces as examples.

“Where and how AI plugs into your [pipeline] will depend on the work you are doing and the kind of AI,” Davenport relates. “For example, with automatic promo generation—that only works if you somehow plug into a production asset management system that does a first-cut edit of the promo, and then sends it to be finessed; or at least generates a project someone can correct. For processes like speech-to-text or object or subject matter recognition, you have to have a metadata management system to sort through the data and make it useful. That is the key. We have seen tremendous advancements in the last couple years concerning the accuracy of data, the speed at which it can work, and our practical ability to implement it, but none of that matters if the data is not usable. That’s why we are now seeing more usable interfaces where we can send a video and get back a whole bunch of data that we can then catalog and make searchable.

“I think the area of metadata exchange will be the big area in this space that will require some examination for a certain level of standardization,” he adds. “One of the technologies we are working on at the moment would normalize data coming from several different places—a kind of creative metadata set from several different AI applications. There will have to be some interaction as far as how we carry some of the data. I think SMPTE standards have largely paved the way for this in being quite flexible as far as how auxiliary data is carried alongside video and audio. But I still think the metadata exchange element will be the most important area for standards development in the AI world. Whether that is a de facto standard that gets turned into a specification or something that gets considered from the ground up, I’m not sure.”
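A sense of what such a normalization layer might do is sketched below; the provider names, field names, and common schema are all hypothetical assumptions, but the idea is the one Davenport describes: take differently shaped tags from several AI services, map them into one consistent metadata set, and make the result searchable.

    # Hypothetical sketch: normalize tags from two imagined AI providers into
    # a common schema, then run a simple search over the combined catalog.
    raw_provider_a = [   # imagined face-recognition service output
        {"name": "Prince Harry", "score": 0.92, "start": "00:01:15:10"},
    ]
    raw_provider_b = [   # imagined speech-to-text service output
        {"text": "wedding procession", "confidence": 87, "timecode": "00:01:00:00"},
    ]

    def normalize(record, provider):
        """Map a provider-specific record onto one common tag schema."""
        if provider == "a":
            return {"label": record["name"], "confidence": record["score"],
                    "timecode": record["start"], "source": "provider_a"}
        if provider == "b":
            return {"label": record["text"], "confidence": record["confidence"] / 100.0,
                    "timecode": record["timecode"], "source": "provider_b"}
        raise ValueError(f"unknown provider: {provider}")

    catalog = ([normalize(r, "a") for r in raw_provider_a]
               + [normalize(r, "b") for r in raw_provider_b])

    def search(term):
        """Return every normalized tag whose label contains the search term."""
        return [t for t in catalog if term.lower() in t["label"].lower()]

    print(search("harry"))   # the normalized Prince Harry tag, with its timecode and source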

All of this, of course, brings us back to the earlier question: if AI is or soon will be essential on the media landscape, how will human creativity remain central to the equation? How will jobs be impacted? Davenport argues the answer lies in “changing skill sets,” and suggests such transitions are nothing new for the media industry.
 
“I think there will be some significant changes in roles in the industry as a result of AI, but we have seen that kind of shift before,” he says. “If you recall, we started to computerize broadcast systems in the 80s and 90s and started bringing automation into play, with tape robots and all the rest. Every one of those steps, over the last 100 years we have been making television, has changed the way that we work. So this may be a big shift, but I don’t think it is necessarily bigger than other things we have done before. For creatives, this will be more about learning new ways to be creative.”

An analogy Davenport uses to illustrate this point references the electronic music revolution of the 1960s, ’70s, and ’80s, a change that relied on machines but would have been pointless without human creativity at its foundation.
 
“When electronic instruments started to replace acoustic ones in popular music—particularly when synthesizers started coming in, and the way musicians could then program synthesizers—that was a major change,” he says. “When MIDI [the Musical Instrument Digital Interface protocol] came along in the mid-’80s, and you suddenly had computers running musical instruments—did that take the creativity out of the music? Or did it force us to be creative, or enable us to be creative in different ways? So, in the same way, does AI or automation take away the input of our creativity, or does it just allow us to be creative in different ways?”

 

News Briefs
New Immersive Standards

In late September, SMPTE announced the publication of new ST 2098 standards for immersive audio. The new standards are 2098-1 on immersive audio metadata; 2098-2, an immersive audio bitstream specification; and 2098-5, regarding new D-Cinema immersive audio channels and soundfield groups. These are significant developments, according to Brian Vessa, founding chair of SMPTE’s Technology Committee on Cinema Sound Systems (TC-25CSS). “By supporting delivery of a standardized immersive audio bitstream within a single interoperable digital cinema package, the new SMPTE immersive audio standards simplify distribution while ensuring that cinemas can confidently play out immersive audio on their choice of compliant immersive sound systems,” Vessa stated in SMPTE’s announcement about the new standards.

Expanding Annual SMPTE Technical Conference & Exhibition
TV Technology recently published a preview of the upcoming SMPTE 2018 Annual Technical Conference & Exhibition, pointing out that the conference has grown enough to move to a larger venue at the Westin Bonaventure Hotel & Suites in downtown Los Angeles. Part of the reason for the growth of the 2018 event, to be held October 22-25, is the addition of a new, full conference track, meaning there will now be three conference tracks over three days, with 78 technical papers presented. Among the pressing industry issues that will be covered during the event are AI, IP, blockchain technology, HDR, immersive technologies, and much more.

Quantum Computing Initiative
A recent report in the MIT Technology Review says the United States government has taken a big step toward enabling what the article calls “a viable quantum computing industry” in the US. The article explains that Congress recently passed a piece of legislation called the National Quantum Initiative Act, a bill designed to establish a federal program for accelerating research and training in the field of quantum computing, with an initial release of $1.275 billion in funding to establish several training centers. The goal of the initiative, according to the article, is to eventually create “a new generation of engineers schooled in the quirks of quantum physics, as well as the principles of computer engineering.”