How Do We Perceive Moving Images?
Hot Button Discussion
by Michael Goldman
As previously discussed in Newswatch, major engineering breakthroughs in recent years have enabled manufacturers of image capture and display devices to produce stunning imagery that incorporates better color depth, deeper dynamic range, faster frame rates, and other enhancements that the human visual system can process and comprehend. These developments sometimes get conflated with the quest to bring greater realism to moving images, when often they are actually used to enhance a creative distortion of reality. David Long, an associate professor at the Rochester Institute of Technology (RIT), director of RIT’s MAGIC Center, a media research facility, SMPTE Fellow, and contributor to the SMPTE Educational Advisory Board, suggests the difference is subtle but crucial. Such imaging technology breakthroughs, he says, can be aimed either at responding to or at perturbing human expectations for viewing imagery under particular conditions, rather than always pushing the bounds of human consumption of media by somehow bringing imagery “closer” to what we can see and experience in the real world.
Long emphasizes that the production and consumption of moving images is seldom about replicating reality exactly—an inherently difficult goal, in his view. Rather, creating and displaying moving images have always been artistic endeavors aimed at impacting our perceptions of reality.
“Improved imaging system fidelity ultimately creates a larger palette for the artist to play in,” he relates. “But all creative decisions should be made through a careful understanding of what the human visual system can fundamentally perceive.”
“From the perspective of creative content production, perception is critical, because it is about the collection of sciences that address the human being as a sensory tool and as a cognitive interpreter. Therefore, visual scientists are interested in the functionality of both, and their intersection. At their core, human sensory systems are all about bringing in data. Over the years, engineers have managed to make reasonable facsimiles of human sensory systems—our ears have been copied by microphones, for example, and cameras with improved dynamic range and color reproduction better align with our visual capabilities. But with [media], we need to be able to exhibit and distribute imagery [for viewing in an environment different from how we consume scenes in the real world]. So [the creation of media content] is about creating a reproduction of something that is real unto itself for human beings to consume separately.”
Therefore, Long elaborates, “human perception” is actually about two things—detection and then a process of translating what is detected so that the human brain can process it. It’s here that the industry’s ongoing quest for dynamic range, color gamut, frame rate flexibility, and so much more comes into play—all of cinema and television viewing throughout history has been built on what Long describes as “a stroboscopic illusion.”
“[Moving images] are a successive series of still images presented to the human observer so that their brain can interpret fluid motion out of them, but in reality, there is nothing fluid in the motion in terms of physics that is being presented to them,” he says. “We don’t see the world in still frames—there is no frame rate in human vision. Further, the full color volume observers interpret in image displays is constructed by careful presentation of just three primary colors, and not the spectral richness of real-world light sources and scene objects, a concept known as metamerism.
“Yet, if you look across the last decade or so of image science research for media, SMPTE conferences and the like, you see emerging trends that are absolutely associated with an underlying respect for, and an attempt to represent perception perfectly. So things like higher dynamic range, where we try to get our imaging systems to faithfully capture and then reproduce the total luminance range that the human visual system is capable of detecting; wide color gamut, where we try to both capture and display imagery in a more faithful and full natural color volume; and the granddaddy of them all—spatial resolution, the thing that we all associate with the evolution of photorealistic quality—it’s all still a temporal metameric illusion. The very fact that our entire technology foundation for moving images is based on stroboscopic illusion, the presentation of successive still frames with only three primary colors, is already a clear distortion of reality. So researchers are attempting to understand the differences between the way the human sensory system works when interacting with reality and with these engineered systems. But then, of course, we also have filmmakers like [director] Ang Lee and others who are deliberately trying to manipulate this new fidelity to get something else creatively.”
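The metameric illusion Long describes can be sketched numerically: two physically different light spectra can produce identical responses in a three-channel sensor, which is exactly why three display primaries suffice to evoke a full color experience. The sketch below uses a toy three-channel sensor with made-up sensitivity numbers (not real cone fundamentals) and constructs a metameric pair by adding a null-space vector that the sensor cannot detect.

```python
import numpy as np

# Toy "sensor" with three broadband channels sampled at 5 wavelength bands,
# standing in for long/medium/short-wavelength cone responses.
# These sensitivity numbers are illustrative, not measured cone fundamentals.
S = np.array([
    [0.1, 0.3, 0.8, 0.9, 0.4],   # long-wavelength channel
    [0.2, 0.7, 0.9, 0.4, 0.1],   # medium-wavelength channel
    [0.9, 0.6, 0.2, 0.1, 0.0],   # short-wavelength channel
])

# One spectrum the sensor might observe.
spectrum_a = np.array([0.5, 0.4, 0.6, 0.3, 0.7])

# Any vector in the null space of S is invisible to this sensor, so adding
# a bit of it yields a physically different spectrum with identical responses.
_, _, vt = np.linalg.svd(S)
null_vec = vt[-1]                       # S @ null_vec is (numerically) zero
spectrum_b = spectrum_a + 0.2 * null_vec

print(np.allclose(S @ spectrum_a, S @ spectrum_b))   # identical responses
print(np.allclose(spectrum_a, spectrum_b))           # but different light
```

The two spectra are metamers for this sensor: the light itself differs, yet the three channel responses match, which is the principle that lets three-primary displays stand in for the spectral richness of real scenes.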
Long elaborates that, at the end of the day, it’s all about interpretation—both scientific and artistic—in terms of what a human being can see and process.
“Vision science is about understanding the characterizable physical limitations of the sensory mechanisms of a human being—what are we capable of detecting and when do engineers need to create facsimiles or distortions of a signal,” he explains. “Those are tough topics unto themselves, but it really gets convoluted when you elevate into interpretation, and interpretation needs to be thought of in two different ways. First, there is the literal neurological and cognitive interpretation of a detected signal by a human being; and second, there is the artistic intent and perturbation. Those can be two totally different things. But [for creative professionals], I am a firm believer that if you are really good at understanding the first, then you are more useful to the industry, which is intrigued when you intentionally manipulate the second.”
He adds that vision and color scientists continue to work to understand all the reasons why people interpret a scene viewed in the real world differently from “a literal, perfect facsimile” of it viewed on a screen. One particularly important factor in this perception, he says, is the difference in the viewer’s environment.
“A perfect reproduction in color, in terms of physical light level on a screen in a dark movie theater, would look different from the scene it was created from,” he explains. “The contrast renders lower and the image appears lighter. The tone scale, the shadow detail, they render differently, simply because the visual system is being subjected to the physics of a different viewing environment. In a theater, the periphery of your field of view is dark, compared to seeing the scene in the real world, where you have the ability to look around in a full normally lit volume. The very difference between those two environments causes a difference in interpretation of contrast and tonality in the human visual system.”
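The dark-surround effect Long describes is one reason theatrical mastering conventionally applies a higher end-to-end gamma than the scene itself: a dark surround lowers apparent contrast, so contrast is boosted to compensate. A minimal sketch of that idea, using an illustrative exponent of 1.2 (a commonly cited ballpark for dark-surround compensation, not a figure from the article):

```python
# Illustrative surround compensation: in a dark theater the visual system
# perceives less contrast, so mastering raises normalized scene luminance
# to an exponent greater than 1 to restore apparent contrast.
# The 1.2 exponent is an assumed ballpark value for this sketch.

def surround_compensate(relative_luminance, boost=1.2):
    """Apply a >1 power to normalized luminance (0..1) to deepen shadows
    and midtones, increasing displayed contrast for a dark surround."""
    return relative_luminance ** boost

for L in (0.05, 0.18, 0.50, 1.00):
    print(f"scene luminance {L:.2f} -> screen luminance {surround_compensate(L):.3f}")
```

Note how an 18 percent midtone drops to roughly 0.13 while white stays at 1.0: the reproduction is deliberately not a physical copy of the scene, precisely because the viewing environment differs.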
All of which brings the conversation back to his earlier point about “manipulating the distortion” for creative reasons. Specifically, Long talks about how filmmakers who understand vision science can then take more effective creative license to perturb reality for artistic purposes. He points to ongoing industry conversations about so-called “hyper-reality” and the work of a handful of filmmakers such as the aforementioned Ang Lee and the perturbations he cooked up for his 2016 film Billy Lynn’s Long Halftime Walk and his 2019 film, Gemini Man, both shot in stereoscopic 3D at 120 fps and 4K resolution, with Gemini Man adding a 100 percent digital human (a younger version of actor Will Smith) for good measure.
“I don’t like the term ‘hyper-reality,’ because the prefix ‘hyper’ literally means ‘greater than’ or ‘more than,’ ” Long says. “That’s inaccurate in a lot of imaging attributes—they are not ‘greater’ than reality, per se. Also, creative and intentional aesthetic distortion of reality for intentional purpose is a tool that filmmakers have used forever. It’s just that we are now able to push the boundaries of our camera systems and display systems so they can provide a perturbation unlike any we’ve seen before. That’s what might be more accurately considered the ‘hyper’ part. We can put on screen a wider range of colors, more than any previous film or electronic display has been able to in the past. We can flash more frames per second. We can get somewhat closer to the reality of the luminance range—the blackest blacks to the brightest whites.
“So what happens is, now that we can do that, audiences are experiencing filmmakers like Ang Lee operating in a much larger palette or volume of perturbation. But that is different than saying you can show things on a screen that human beings in the real world aren’t visually privy to. It’s impossible to create colors people have never seen before. Even the big advance with Rec. 2020 [BT.2020] as a standard for display color volume, though far in excess of what Rec. 709 delivered, is far less than what the human visual system is capable of interpreting. We are moving in the right direction in terms of [offering filmmakers greater creative options], but we shouldn’t use the prefix ‘hyper’ as in ‘more than real.’ It’s simply a greater gamut of potential perturbation than we could offer previously.”
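Long's point about Rec. 2020 versus Rec. 709 can be checked with the published chromaticity coordinates of each standard's primaries. The sketch below compares the two gamut triangles by area in the CIE 1931 xy plane; note this is a crude proxy, since xy space is perceptually non-uniform, but it conveys the scale of the expansion he describes.

```python
# CIE 1931 xy chromaticity coordinates of the R, G, B primaries,
# as specified in ITU-R BT.709 and ITU-R BT.2020.
REC709  = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

def triangle_area(primaries):
    """Shoelace formula for the area of a gamut triangle in the xy plane."""
    (x1, y1), (x2, y2), (x3, y3) = primaries
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

a709, a2020 = triangle_area(REC709), triangle_area(REC2020)
print(f"Rec. 709 gamut area in xy:  {a709:.4f}")
print(f"Rec. 2020 gamut area in xy: {a2020:.4f}")
print(f"ratio: {a2020 / a709:.2f}x")
```

The ratio comes out near 1.9x, a substantially larger palette, yet the Rec. 2020 triangle still sits well inside the spectral locus of human vision, which is Long's point: larger than before, but not "more than real."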
These facts raise the question of how hard, exactly, equipment manufacturers should push in pursuit of meaningful image-quality leaps, and which improvements they should prioritize, since some will be more detectable by viewers than others. Long suggests manufacturers “need to respect detectability limits in terms of what the human visual system is capable of.”
To illustrate this concept, he points to a 2015 SMPTE paper penned by scientist Sean T. McCarthy, Ph.D., then of the ARRIS Group and now director of video strategy and standards for Dolby Laboratories. Long says that McCarthy’s paper addressed the issue of detectability limits by examining many key imaging attributes in the creative process—particularly spatial resolution, high dynamic range (HDR), wider color gamut (WCG), and higher frame rates (HFR), both individually and collectively.
McCarthy’s paper attempted to quantify the inter-dependence of these various visual attributes in order to offer insight into how and when they should be addressed as a package or as individual characteristics in the creative process.
“If you read Sean’s paper carefully, he is mainly talking about a traditional or normal viewing paradigm,” Long says. “He suggests that if you and I are standing around and taking in a scene, and we want a photographic representation of it, there is a logical limit to the number of pixels you need to represent that scene because there is a limit to the resolution you can see as a human being in a static environment. So there is a limit to the number of pixels the manufacturer needs to put into the camera, and same thing with displays. So manufacturers have to be cognizant of that. Is it worth their investment to try and continue to elevate the spec sheet? Will that effort and expense translate to something on screen that is an actual improvement or which even noticeably changes the experience in any way?”
An obvious example, Long continues, is the notion of viewing content in authentic 8K resolution.
“For an 8K display to offer a resolution that the human visual system can significantly appreciate, the screen has to occupy an immense fraction of your field of view,” he says. “Basically, you would want to have a wall-size TV in your living room to fully gain an advantage from 8K [in a home viewing situation]. And yet, we are talking about going even larger, so that’s a debate to be had. Do we need to put our investment into that versus other things that can improve the experience?
“I think this is precisely why we have seen HDR have such a good run in terms of manufacturer focus in recent years. They were leery of only chasing the pixel count. Why go to a higher resolution than a human being can appreciate, when instead, you could [focus resources] to improve the quality of the experience of your imaging system in another attribute like dynamic range?”
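Long's 8K argument follows from simple viewing geometry. A viewer with 20/20 acuity resolves detail down to roughly one arcminute, about 60 pixels per degree, so every pixel of a 7680-pixel-wide image is only resolvable if the screen fills an enormous field of view. The sketch below works that out, using an assumed 65-inch 16:9 panel as the example (the screen size is not a figure from the article):

```python
import math

# Rough acuity model: ~1 arcminute per resolvable detail, i.e. about
# 60 pixels per degree for a 20/20 observer.
PIXELS_PER_DEGREE = 60
H_PIXELS_8K = 7680

# Field of view the screen must fill for every 8K pixel to matter:
fov_deg = H_PIXELS_8K / PIXELS_PER_DEGREE
print(f"required horizontal field of view: {fov_deg:.0f} degrees")

# Equivalently: how close must you sit to an assumed 65-inch 16:9 8K panel
# before one pixel subtends a full arcminute?
diag_m = 65 * 0.0254
width_m = diag_m * 16 / math.hypot(16, 9)
pixel_pitch_m = width_m / H_PIXELS_8K
one_arcmin_rad = math.radians(1 / 60)
distance_m = pixel_pitch_m / math.tan(one_arcmin_rad)
print(f"viewing distance to fully resolve 8K: ~{distance_m:.2f} m")
```

Under these assumptions you would need to sit roughly two-thirds of a meter from a 65-inch screen, or install something approaching the wall-size display Long mentions, before the full 8K pixel count becomes visually meaningful.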
Long concedes that visual science researchers have been “trying to understand the human aggregate interpretation of image quality” for generations, and that “it remains a complicated problem.” However, ongoing work to build what he calls “metrics of association, where you try to scale people’s appreciation of accurate color or their appreciation of frame rate in terms of a common comparator, a common language,” has led to certain general conclusions, such as the notion that quantifiably higher dynamic range is detectable and, in many cases, particularly impactful compared with some other imaging characteristics in the long run.
“The SMPTE Motion Imaging Journal has been a great source for recent published research on resolution quality,” he says. “For example, is it all just the number of pixels, or could it also be frames per second? The argument in the research was that the more frames per second, the less motion blur there is. And so, if you can reduce motion blur, it should be as impactful as, if not more impactful than, simply using more pixels to capture a blurry frame. I mean, if you go 4K, 8K—how many K’s are you going to go to? If we can do more frames per second, on the other hand, we’re going to get objectively better spatial resolution simply because we’re not allowing the camera shutter to blur for as long. And that should be considered in aggregate with the number of pixels. You tend to see a lot of researchers doing one versus the other. But then, when you try to do it multi-mode, and try to get every attribute of quality represented in your model, it gets really hard to predict exactly what observers will appreciate the most.”
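The frame-rate-versus-blur trade-off Long cites is easy to quantify: an object smears across (speed times exposure) pixels, and with the common 180-degree shutter the exposure is half the frame interval, so doubling the frame rate halves the blur. A minimal sketch, with an assumed object speed chosen for illustration:

```python
# Motion blur from the camera shutter: an object moving across the frame
# smears over (speed x exposure time) pixels. With a 180-degree shutter,
# exposure is half the frame interval. The object speed below is assumed.

def blur_pixels(speed_px_per_s, fps, shutter_angle=180):
    """Blur extent in pixels for a given frame rate and shutter angle."""
    exposure_s = (shutter_angle / 360) / fps
    return speed_px_per_s * exposure_s

SPEED = 960  # object crossing the frame at 960 pixels per second (assumed)
for fps in (24, 60, 120):
    print(f"{fps:>3} fps -> {blur_pixels(SPEED, fps):.1f} px of blur")
```

At 24 fps the example object blurs across 20 pixels; at 120 fps only 4, which is the sense in which more frames per second can buy effective spatial resolution without adding a single pixel to the sensor.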
Regardless of specific metrics or scientific research on such matters, however, content creation is obviously, by its nature, a creative process. As such, it logically benefits the most from flexibility, Long emphasizes. In that sense, content creators will sometimes want to evoke the notion of reality and, at other times, will want to push the envelope in a far different direction. Therefore, Long suggests, manufacturers might find the pursuit of “adaptive technology” most worthwhile, rather than focusing on any one specific enhancement.
“Take a Seattle cityscape on a foggy day—that’s not a high dynamic range scene,” Long says. “The actual reality of such a scene would be very low dynamic range, and 80-year-old cinema technology can do a perfectly good job of creating a realistic reproduction of that scene’s dynamic range. Not every artist will need high dynamic range for every scene. So to put a moniker on their film that says ‘available with enhanced dynamic range’ might actually do them a bit of a disservice because it might give the audience an expectation that the art has to conform to the technology when, in fact, the art didn’t need that particular technology.
“So that is a dichotomy. A lot of other pieces will go back-and-forth—pushing boundaries in some scenes and then going the other way in others. A great example of that might be frame rate. Does a two-hour movie have to be shot and exhibited at a single frame rate? Maybe we should consider that some scenes would benefit aesthetically with one frame rate and others with another. Maybe we need an adaptive frame-rate technology from manufacturers to allow artists to go and push on this [flexibility] a little bit more in their storytelling. After all, is that really all that different from creative scene-to-scene color correction?”
Ten-Year Tech Lesson
An end-of-the-year TV Technology column lists some crucial technology lessons that have directly impacted broadcasters during the course of the last 10 years—the so-called 2010s. The column, by Tom Butts, suggests that the biggest single change over the decade was the arrival of streaming and the corresponding OTT revolution that came with it. Other trends, by contrast, like 3D for broadcast, “came and went in a flash,” according to Butts. Another innovation that failed to live up to its promise was so-called mobile DTV, or the ATSC 2.0 initiative. Butts calls that “an important lesson for the industry” as it now transitions into the next-generation broadcasting standard ATSC 3.0, which he calls “the best hope of relevance” for the broadcast industry “in an IP-based world.”
Videogame Preservation Efforts
A recent Hollywood Reporter article examined efforts to find, document, and preserve digital assets related to videogames now that the videogame industry is over 50 years old. The article suggests that major studios that produce videogames have been inconsistent, at best, at preserving videogame data, and so, outside institutions such as the Smithsonian American Art Museum (SAAM), the National Videogame Museum of Texas, the International Center for the History of Electronic Games (ICHEG), and others are trying to take up the cause. One of the problems with this initiative, however, is the fact that so-called “raw” materials at the foundation of such games—original source code, for example—are hard to acquire and preserve because videogame producers are fearful of sharing such proprietary data due to concerns over piracy and other issues. Additionally, many creators of early game content have left the industry and either taken their source code with them or, in some cases, neglected to preserve it altogether. Thus, one of the big challenges for ramping up the videogame preservation effort lies in simply convincing game companies to participate to begin with, the article suggests. By way of example, the article points out the fact that the industry’s official trade association—the Entertainment Software Association (ESA)—didn’t even want to comment about preservation-related issues for the article.
Netflix Pushing HDR Hard
A recent American Cinematographer article detailed the considerable impact that Netflix is having on the broadcast industry’s transition to higher dynamic range (HDR) original content. The article points out that as more consumers purchase smart TVs and subscribe to Netflix and other streaming services, the industry giant has made what it calls “a concerted effort” to produce HDR content, estimating it currently has over 1,000 hours of original HDR programming available. The article explains that Netflix now gives its filmmakers a minimum requirement for the camera technology used to make original programming for the service, but stays away from dictating how filmmakers must implement HDR creatively beyond mandating an HDR finish for the increasing number of viewers watching on HDR monitors. By way of example, the article takes a detailed look inside the incorporation of HDR into the second season of the original Netflix series Mindhunter, and how HDR was used creatively by cinematographer Erik Messerschmidt.