As discussed in Newswatch in 2019, a key reason for the rising promise of machine learning, or artificial intelligence (AI), in many major industries, including entertainment and multimedia applications, is the rapid, ongoing evolution of artificial neural networks. Such networks are software systems that learn a mathematical model from training data, iteratively adjusting internal parameters until the model maps inputs to the desired outputs efficiently and accurately.
One of the traditional challenges in utilizing neural networks, however, is that trained models are massively data heavy, which makes them difficult to move from one location to another over a network. Thus, according to Jean-Louis Diascorn, senior product manager at Harmonic Inc., a major video delivery solutions company, the industry has started working toward a standard for compressing neural network data structures into bitstreams that can be transported efficiently for multimedia applications.
“Let’s say you have a machine-learning algorithm based on a neural network, and you train it—that is a very complex piece of software,” Diascorn relates. “You train it and you find your model, and then, you will want to apply the model to your live application. That might mean you have to transfer the model to remote locations so that it can be useful.
“For many neural networks, that is a huge amount of information. So there is now a group within MPEG [Moving Picture Experts Group] that is trying to find a way to compress that data so that you can [more easily] move the neural network model to another location by transferring the neural network information at a lower bit rate.”
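The MPEG effort described above defines its own coding tools, which the article does not detail. As a rough illustration of the underlying idea only, the sketch below shrinks a set of float32 model weights by uniform 8-bit quantization followed by general-purpose entropy coding; all the numbers and the quantization scheme are invented for this example and are not taken from the standard.

```python
import random
import zlib

# Illustrative only: quantize float32 weights to 8 bits, then
# entropy-code the result to cut the transfer size. The receiver
# reconstructs approximate weights that are close enough for inference.

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(100_000)]

w_min, w_max = min(weights), max(weights)
scale = (w_max - w_min) / 255.0

# Uniform 8-bit quantization: each weight becomes one byte
quantized = bytes(round((w - w_min) / scale) for w in weights)
compressed = zlib.compress(quantized, level=9)

raw_bytes = len(weights) * 4  # float32 storage on the wire
print(f"raw:        {raw_bytes} bytes")
print(f"compressed: {len(compressed)} bytes")

# Decode side: reconstruct approximate weights from the byte stream
restored = [q * scale + w_min for q in quantized]
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

The tradeoff is exactly the one Diascorn describes: a much lower bit rate for moving the model, in exchange for a small, bounded loss of weight precision.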
Diascorn fully expects this work will eventually lead to a new compression standard, though he can’t say how soon that might happen. In the meantime, however, he points out that AI methods are already starting to influence the video compression universe, particularly because video compression is an application whose value depends on the very things machine learning is designed to improve: speed, cost, functionality, and performance.
Specifically, he points to three categories where machine learning can be helpful with compression tasks. The first involves the ongoing challenge of providing bitrate savings without significant loss in video quality (VQ). Diascorn suggests that a well-developed AI codec could run in real time to process live video, offering better VQ while also providing significant bitrate savings. This would likely require a full neural-network implementation, which would not be easy in the sense that it demands more CPU power, but he suggests the advantage is in the offing nonetheless.
“Bitrate savings versus VQ is a typical compression improvement we look for,” he says. “Generally, machine learning can improve the algorithms to become a faster solution than having engineers going through content and spending a lot of time fine-tuning. First, we train the machine—a learning step that produces a model. And then, when we are happy with the model, it gets downloaded to a video compression encoder as a software upgrade, and then it can do a special function for us. Here, basically, machines are trained to modify certain parameters in the encoding core. Depending on the source, it will modify the encoding core settings or configuration to do better encoding. In other words, with the machine learning approach, you just feed a lot of content and the machine will find the best configuration possible.”
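The workflow Diascorn describes, a model trained offline that ships to the encoder and then maps measured source characteristics to encoder settings, might be sketched as follows. The feature names, thresholds, and configuration fields here are entirely hypothetical stand-ins, since the article names no specific parameters.

```python
from dataclasses import dataclass


@dataclass
class SourceFeatures:
    motion: float   # 0..1, estimated motion activity (hypothetical metric)
    detail: float   # 0..1, spatial detail / texture energy (hypothetical)


@dataclass
class EncoderConfig:
    preset: str         # invented setting names for illustration
    aq_strength: float  # adaptive-quantization strength
    bframes: int


def model_predict(f: SourceFeatures) -> EncoderConfig:
    """Stand-in for a trained model's decision function: in practice the
    mapping would be learned from large amounts of content, not hand-coded."""
    if f.motion > 0.7:            # fast, sports-like content
        return EncoderConfig("fast", 0.6, 2)
    if f.detail > 0.7:            # static but texture-rich content
        return EncoderConfig("slow", 1.2, 6)
    return EncoderConfig("medium", 0.8, 4)


# The encoder queries the model per source and applies the configuration
cfg = model_predict(SourceFeatures(motion=0.9, detail=0.4))
print(cfg)
```

The point of the design is that the learned mapping replaces the manual fine-tuning the quote mentions: once a better model is trained, updating the encoder is just a software upgrade.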
The second category revolves around the notion of varying frame rates during the compression process, according to Diascorn. He says AI approaches can make possible dynamic frame-rate encoding (DFE) advancements.
“In this application, instead of encoding always at 60 fps, depending on the source content, maybe you will encode at a lower frame rate, like 15 fps or 30 fps,” he says. “Here again, it will be the machine that will decide which frame rate is best, based on the learning you gave it about the potential visibility of the frame rate. With fast-moving sports, if you lower frame rates, it can be visible. But if you lower them from fixed sequences, frequently no one will see it. So the machine will be trained to know that and act accordingly.
“And if you do that, the main savings is on processing power—the CPU. When you encode at a lower frame rate, you use less CPU power. So the idea is to reduce the power and improve the density. With just one rack of equipment, you could do more than you could before. Plus, you get a slight improvement, saving on bit rates, since you are encoding a little bit less.”
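The dynamic frame-rate encoding (DFE) decision described in the two quotes above could be sketched like this: a (hypothetical) learned visibility model scores how noticeable a frame-rate drop would be for the current content, and the encoder picks the lowest candidate rate whose predicted visibility stays under a threshold. The model, threshold, and savings figures are invented for illustration.

```python
CANDIDATE_FPS = [15, 30, 60]


def drop_visibility(motion: float, fps: int) -> float:
    """Stand-in for a learned visibility model: the more motion in the
    source, the more visible a reduced frame rate becomes."""
    return motion * (60 - fps) / 60.0


def choose_fps(motion: float, threshold: float = 0.3) -> int:
    """Pick the lowest frame rate whose predicted visibility is acceptable."""
    for fps in CANDIDATE_FPS:  # try the cheapest rate first
        if drop_visibility(motion, fps) <= threshold:
            return fps
    return CANDIDATE_FPS[-1]


for motion in (0.1, 0.5, 0.9):
    fps = choose_fps(motion)
    cpu_saving = 1.0 - fps / 60.0  # fewer frames encoded ~ less CPU
    print(f"motion={motion:.1f} -> {fps} fps "
          f"(~{cpu_saving:.0%} fewer frames to encode)")
```

As in the quote, a near-static scene drops to 15 fps and saves most of the frame-encoding work, while fast sports content stays at 60 fps because the drop would be visible.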
The third application involves dynamic resolution encoding (DRE), Diascorn says.
“This is where the machine will select the best resolution, depending on the source and some visibility,” he explains. “Again using the example of a sports application, which has lots of details and fast movements—if you compress at a low bitrate, it is likely those details, in terms of fast movement, will create macroblock effects or pixelization within the image. But, in fact, details moving that fast won’t be seen by the human eye, so it might be better to encode those fast movements at a lower resolution. And on the other hand, if you have slow movement, it is better to encode it at higher resolution to get a better-looking picture. So we need to train the machine with lots of data, lots of content to produce a model that will make it decide what resolution to encode at. But if you do it right, you can gain in terms of the quality of the experience, with improved fast movement, fewer artifacts, and improved slower movement with higher resolution. And you can also have a gain in CPU power, since you are not always encoding the entire video at the highest resolution—so you win back some CPU there.”
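The dynamic resolution encoding (DRE) tradeoff in the quote above can be mirrored in a toy decision rule: fast motion masks fine detail, so it can be encoded at lower resolution, while slow scenes keep full resolution. The resolution ladder and the thresholds here are invented; in the scheme Diascorn describes the decision would come from a model trained on large amounts of content.

```python
# Hypothetical resolution ladder for illustration
LADDER = [(1920, 1080), (1280, 720), (960, 540)]


def choose_resolution(motion: float) -> tuple:
    """Stand-in for a trained model: more motion -> lower resolution,
    because the eye cannot resolve fine detail in fast movement."""
    if motion > 0.7:
        return LADDER[2]   # fast action: detail is masked anyway
    if motion > 0.3:
        return LADDER[1]
    return LADDER[0]       # slow scene: spend the pixels on detail


for motion in (0.2, 0.5, 0.8):
    w, h = choose_resolution(motion)
    pixel_share = (w * h) / (1920 * 1080)
    print(f"motion={motion:.1f} -> {w}x{h} "
          f"({pixel_share:.0%} of full-resolution pixels)")
```

The CPU gain he mentions falls out directly: a 960x540 frame carries a quarter of the pixels of a 1920x1080 frame, so the encoder has far less data to process on fast-moving segments.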
Diascorn offered two industry presentations in 2019 built around a similar thesis, providing further details and context for how AI technology is “aggressively,” in his words, impacting video compression applications: one for NAB 2019 and one for the SMPTE 2019 Technical Conference. However, he is quick to point out that compression is but one promising application for machine learning within the media world, as the concept pushes past early media uses such as speech-to-text for closed captioning, language translation, metadata organization, and key frame detection.
“On the display side, for example, machine learning can help to do super resolution,” he says. “When you get a video or picture at a certain resolution and want that resolution to increase, machine learning has the potential to help you do up-sampling.
“In production, there are now techniques that use machine learning for things like chroma-keying. They can use machine learning without worrying about a color background—you could use any background, and the algorithm will find the objects you want to maintain and replace the rest as you want. I saw a nice demonstration of that application at the NAB 2019 show.
“I’ve also seen demonstrations of automated cameras taught to track specific characters in scenes automatically, and there are many other possibilities. Some of these things are still a bit futuristic, but we are already seeing examples on the production side, so I think a lot of exciting things are coming.”
From a content development, creation, and distribution perspective, however, such potential also leads to another question given that media is inherently a creative profession. That question is, how far can or should content creators go down the rabbit hole in terms of essentially training machines to make creative decisions automatically? After all, the concept of making certain processes happen faster or more efficiently is highly appealing to creatives, but not if it means reducing the human element by somehow going too far.
Diascorn says it might be theoretically possible, for instance, to someday train an editing system to essentially emulate one of history’s great editors in terms of creative choices. However, as discussed in the January 2019 issue of Newswatch, early attempts at training machines to write creative material, such as screenplays, have a long way to go, to say the least, when one considers that being creative involves a lot more than simply following a pattern and religiously using particular techniques. He doubts the paradigm will shift in that direction dramatically anytime soon.
However, he points to recent visual effects breakthroughs that were aided and abetted by artificial intelligence tools and methods. These include the groundbreaking “de-aging” work done on Martin Scorsese’s 2019 film The Irishman, and the controversial announcement not long ago that deceased legendary actor James Dean would be “resurrected” as a virtual character in an upcoming movie thanks, once again, to an assist from AI tools. Diascorn’s point is that the industry is certainly going to push limits to see where and how machine learning can help creatives do things more efficiently in some cases, and break down barriers in other cases. In the end, he expects it will all be “something of a tradeoff.”
“Even getting back to compression, for instance—you have to be aware of the situation, where it is needed and where it isn’t,” he says. “Right now, it can be costly to deliver full resolution content to all users, so you learn to compress more or less depending on where it is going. You aren’t going to see all the nice details and great picture that a content creator intended if you want to watch a movie on your smartphone, as compared to an IMAX screen. In both those cases, there would be compression, but on the smartphone it might be down to just a few hundred kilobits per second, and in the cinema, very likely several tens of megabits per second, depending on various factors.
“So either way, you are going to remove some data, for sure. The traditional work of video compression is to remove what is redundant but, nowadays, also to remove things the human eye can’t see. The AI needs to be as transparent as possible—sometimes, you won’t even need it. And when you talk about the creative aspect—sometimes, there may be artifacts at the source that are intended by the filmmaker. So you want the encoder to reproduce those artifacts. Sometimes, a compressed picture with a great encoding solution will look worse than a result produced with a bad encoding solution, because the original artifacts remained as the [filmmaker wanted]. That was his creative intention. When you talk about machine learning, the important thing is the learning—and that will always require human participation.”
As the COVID-19 emergency took hold of the country in recent weeks, the importance of the broadcasting industry as a tool for information and education rose to the forefront, as a couple of recent articles in TV Technology make clear. The first article discussed the key role local TV stations are currently playing in remote-learning partnerships with state and local education authorities across the country. As of press time, stations in 25 states were offering remote-learning programming in concert with their state’s educational agenda. America’s Public Television Stations (APTS) offers a rundown of which stations are participating and what kind of remote learning is available, according to the article, with some stations committing entire daytime schedules to age-appropriate educational programming. The second article summarized a recent study from TVB, a non-profit local broadcast television industry trade association, which examined changes in TV viewing habits since the national crisis began. Among other things, that study indicates that young people, particularly Millennials (ages 18-34), have radically increased their viewership of local television news in recent weeks, a departure from the larger general trend of Millennials being the group most likely to abandon traditional broadcasting for streaming alternatives.
In light of the unavoidable cancellation of this year’s NAB show in Las Vegas due to COVID-19 concerns, the National Association of Broadcasters announced recently that in addition to planning for a revitalized NAB show in 2021, the organization is now launching a digital “experience” called NAB Show Express designed to “provide a conduit for our exhibitors to share product information, announcements and demos, as well as deliver educational content …” A ProVideo Coalition article summarizes the implications of the announcement, and suggests NAB Show Express could become a template for new methods of allowing the broadcast community to “interact virtually” going forward, in addition to potential live events. NAB Show Express, at press time, was tentatively slated to launch in April 2020.