Hot Button Discussion
Human Image Perception
By Michael Goldman
As engineers push ongoing image display technology enhancements, and content creators experiment with how they can utilize those enhancements to tell stories, a few questions naturally come up. How do either engineers or content creators know what average consumers are capable of seeing when they watch images in state-of-the-art display situations anyway? And what are the goals of enhancing the viewing experience? Are they merely to evolve, change, improve, or differentiate that experience from what it was previously? Or are the goals about trying to bring it closer to what human beings experience when they view the real world?
Jenny Read, Professor in Vision Science at Newcastle University’s Institute of Neuroscience in England, emphasizes that she is not a content creator or a hardware engineer. She’s a vision scientist, meaning she studies visual perception in humans and other species. She suggests that how people perceive imagery has always been “an active process.”
“If you were sitting around a campfire in the Neolithic Period listening to somebody in your tribe tell stories, you were relying on someone’s brain and visual cortex to create their imagination of that scene,” she says. “And then, eventually, it got more sophisticated, and we got paintings and photography and so on. But, still, in the end, we are relying on people’s brains to fill in the gaps in terms of what we see and can understand [about an image]. With that said, until now, we have always been able to discriminate between reality and whatever reproduction of reality we were seeing. On one level, we can tell the difference, and so, the brain is not fooled. But on another level, the brain fills in the gaps, and so, we allow ourselves to have a rich and enjoyable experience [when viewing content that we know is only a reproduction of reality].”
The modern landscape now provides a rich smorgasbord of HD, UHD, 2K, 4K, 3D, HDR, HFR, high color gamut imagery, and much more, in various combinations depending on the hardware and viewing environments at play. Read suggests that, at the end of the day, there is no such thing as replicating reality on a viewing display or even on a cinema screen in terms of truly convincing the human vision system. However, she adds that this is, in a sense, irrelevant because modern display technologies are well on their way toward achieving a far more important goal. That goal involves helping content creators make sure that what the viewer is seeing makes perfect sense to them visually and intellectually as the images play out.
For example, Read notes that human beings can, in fact, perceive colors that fall outside the color gamut of modern displays. That raises the question of how much effort and resources manufacturers should put into rectifying that when they are already making spectacular breakthroughs on a directly related issue that, from a perception point of view, is more important anyway: dynamic range.
“You might need more than three primary [colors] to actually cover the space [between what the human eye can perceive and what a particular display device is capable of showing],” she says. “But it might be diminishing returns. Whereas, dynamic range is important because if you know how to make things extremely bright, that helps you dial up all the primaries anyway. If each of our primaries has much bigger dynamic range, then that helps your display show a richer range of colors effectively.”
However, she adds, “these kinds of issues relate to the activities of the [color-sensitive] cones in your eyes,” making them merely the “starting point” for our vision systems. “We don’t perceive cone activations, but rather, [we perceive] our brain’s estimate of the material properties of objects in the scene,” she elaborates. “This is why a banana appears the same shade of yellow in the supermarket under fluorescent illumination, at home under a halogen bulb, or outside in reddish evening light. What you perceive is not so much about light coming off particular pixels, but rather, how it all makes sense in the entire scene you are looking at. That is the biggest part of perception.”
As noted last month in Newswatch, due to the industry’s ongoing dynamic range revolution, new broadcast displays are already approaching the limits of what the human visual system can appreciate regarding dynamic range. For the most part, Read suggests, while human vision functions over “an incredible range” of light levels, this is possible only with adaptation, which, she says, “occurs over a time-scale of minutes.” Therefore, modern technology has broken through to a luminance range where darker blacks from the DCI’s 12-bit color scheme for cinema applications, and the new ITU dynamic range specification (ITU BT.2100) for broadcast applications, are already sufficient to cover the dynamic range of the human eye in any given adaptation state.
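To put the idea of a display's dynamic range in more concrete terms, the following is a minimal Python sketch, not anything taken from Read or from the DCI or ITU documents. It expresses a luminance ratio in stops (factors of two) and counts the code values available at a given bit depth. The display figures are hypothetical values chosen only for illustration, and the function names are our own.

```python
import math


def dynamic_range_stops(peak_nits: float, black_nits: float) -> float:
    """Express a display's luminance ratio in stops (each stop is a factor of two)."""
    if peak_nits <= 0 or black_nits <= 0:
        raise ValueError("luminance values must be positive")
    return math.log2(peak_nits / black_nits)


def code_values(bit_depth: int) -> int:
    """Number of distinct code values available per channel at a given bit depth."""
    return 2 ** bit_depth


if __name__ == "__main__":
    # Hypothetical peak and black levels in cd/m^2, chosen only for illustration.
    examples = {
        "hypothetical SDR display": (100.0, 0.1),
        "hypothetical HDR display": (1000.0, 0.005),
    }
    for name, (peak, black) in examples.items():
        print(f"{name}: {dynamic_range_stops(peak, black):.1f} stops")
    for bits in (8, 10, 12):
        print(f"{bits}-bit signal: {code_values(bits)} code values per channel")
```

A deeper black level or a brighter peak both widen the ratio, while a higher bit depth provides more steps with which to divide that range without visible banding.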
As she explained during a presentation on the human visual system at the NAB 2014 Technology Summit, Read says the same is true in certain other categories, such as spatial resolution. There, she says, modern display technology’s achievement of 4K resolution permits showing off the kind of contrast that can allow the human eye to typically perceive the finest detail possible in any flexible display environment.
“I have talked about how 4K is definitely enough,” she says. “There is no real point in going to 8K unless you are expecting the viewer to sit very close to the screen because our eyes just can’t resolve more pixels than that. Those are the sorts of specific questions we can answer concerning human perception.
“But for the most part, it is not that simple, because viewing imagery on a screen is simply not like the real world. Of course, there are more aesthetic questions about what a film ‘should’ look like, what the content creator’s ‘intent’ was, and so on. A lot of it is based on what you grew up with, what you are used to culturally. In other words, although we don’t exactly know the extent, we do know that human beings learn to see pictures—pictures that are quite unrealistic regarding what things look like in the real world. The images they create in the cinema are nothing like the real world.”
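Read's point about 4K and 8K can be checked with some simple geometry. The sketch below is a rough illustration rather than her method: it assumes a viewer resolves about 60 pixels per degree of visual angle (roughly one arcminute per pixel, a common rule of thumb for 20/20 vision), and the screen width and viewing distances are hypothetical.

```python
import math

# Assumption: roughly one arcminute per pixel, a common rule of thumb for 20/20 acuity.
ASSUMED_ACUITY_PPD = 60.0


def pixels_per_degree(h_pixels: int, screen_width_m: float, distance_m: float) -> float:
    """Pixels per degree of visual angle near the screen center."""
    pixel_pitch_m = screen_width_m / h_pixels
    # Angle subtended by a single pixel as seen from the viewing position.
    pixel_angle_deg = math.degrees(2 * math.atan(pixel_pitch_m / (2 * distance_m)))
    return 1.0 / pixel_angle_deg


def exceeds_assumed_acuity(h_pixels: int, screen_width_m: float, distance_m: float) -> bool:
    """True when the display packs in more pixels than the assumed acuity can resolve."""
    return pixels_per_degree(h_pixels, screen_width_m, distance_m) >= ASSUMED_ACUITY_PPD


if __name__ == "__main__":
    width_m = 1.2  # hypothetical screen width, roughly a 55-inch television
    for distance_m in (2.5, 1.0):  # hypothetical sofa distance vs. sitting very close
        for label, h_pixels in (("4K", 3840), ("8K", 7680)):
            ppd = pixels_per_degree(h_pixels, width_m, distance_m)
            verdict = "beyond" if ppd >= ASSUMED_ACUITY_PPD else "within"
            print(f"{label} at {distance_m} m: {ppd:.0f} px/deg ({verdict} assumed acuity)")
```

Under these assumptions, 4K already exceeds the acuity threshold at the longer distance, and only at the very close distance does 8K offer detail that the eye could still resolve.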
Read illustrates what she means by discussing the viewing of still photographs “from a geometrical point of view.”
“If you take a photo and view it, it is only ‘correct’ if you view it head-on from precisely the distance corresponding to where the camera was when it took the picture,” she explains. “But most of the time, when we look at a photograph, we are looking at it from the wrong viewpoint—a different angle. The image is distorted on our retina, in that sense, but we never notice that. That seems to be because our brain is somehow reconstructing what you would see if you were in the right position. The only time you would notice it in real life would be if you saw a photograph that, within it, depicts another picture on a wall or something, viewed from an oblique angle. Then, that image will look distorted. But if you were in the room with that picture, it would not look distorted, because your eye would correct for the oblique angle. That is an example of how the visual system adjusts for such things.”
Read says the brain’s ability to “correct” for us is central to our overall visual system. However, as new image capture and display technologies have improved, and filmmakers have started aggressively using them in the pursuit of what some people are calling “hyper-real” imagery, some viewers have found such imagery disturbing at some levels. This was noted in recent years in the mixed reactions among critics to the technical innovations in Peter Jackson’s The Hobbit, shot at 48 fps in search of a more realistic aesthetic. More recently, a similar approach was taken in Ang Lee’s Billy Lynn’s Long Halftime Walk, shot using Sony’s F65 camera system and projected in stereoscopic 3D at 120 fps and 4K resolution. Read postulates that perhaps the brain’s correction function, in some respects, can overload when imagery becomes overly hyper-real.
“Some of this relates to the idea of cue conflict,” she explains. “Cue conflict is when you have one type of sensory information telling [your visual system] one thing and another type telling it something else. In a normal cinema experience, if you have such cues telling your brain you are sitting in a chair looking at a flat cinema screen, and another set telling you that you are on a runaway train, your brain has to resolve that conflict. If it has trouble resolving the conflict, that can be an upsetting thing for the brain. That is when people can feel sick watching images.
“In the context of cinema, our brains have learned that this sort of cue conflict isn’t a problem, leaving us free to enjoy the story without worrying about why the different signals don’t match up. But if the imagery is too hyper-real, or even if it’s in stereoscopic 3D, it may be that our brains start to treat the visual input as if it is real, and that can mean that we can start to be bothered by the mismatch in the remaining cues.”
All of which raises the question of whether actual “realism” is, or should be, the way to use new image display technologies and techniques. Read compares this discomfort and the conundrum it creates to the “uncanny valley” syndrome in robotics, in which robots that are uncannily realistic from a physical point of view have been found to make people uncomfortable. This is opposed to the pleasant feelings researchers have documented when humans interact with simpler robots that only roughly resemble humans.
In particular, Read suggests higher frame rates can potentially unsettle viewers. “Given what we know, the limit of human vision is around 60Hz [60 fps] before we achieve flicker fusion [the ability to see flickering frames as appearing to hold steady without any flicker],” she says. “24Hz was a pretty low rate to update the image, leading to all sorts of artifacts [e.g. aliasing] we can see, so of course, [the industry] has been looking at higher frame rates. There is a latitude of scope to notice a difference in the temporal domain, and that is an area where displays have fallen short. But the problem is, to remove all artifacts from even slow-moving objects, you might need to go as high as 120Hz.
“That’s based on something called the Nyquist Limit, which says that the highest frequencies you can represent are half of what [a display’s sampling rate is] if you want to avoid artifacts. So if you want to get to 60Hz, you would need a screen that can do 120, but you would only need that for the most rapid motion, the highest speeds, and the highest spatial frequencies. For example, if the camera tracks an object moving real fast across a background, and both are in sharp focus, at low frame rate you will see motion artifacts, such as judder in the background. To avoid this, you can use a shallow depth of field to blur out the background, removing the high spatial frequencies, and thus making the artifacts less visible. At 48Hz, you could potentially keep faster movement in sharper focus without experiencing artifacts that would be visible at 24Hz. Some movies have been made at higher frame rates, and some critics have said [the costumes or sets looked too obviously fake] because things were too clear.”
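The Nyquist relationship Read describes, and the way speed and spatial detail combine to produce temporal frequency, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it treats each frame as an instantaneous sample, ignores shutter blur and eye tracking, and uses hypothetical speeds and spatial frequencies. The function names are ours.

```python
def nyquist_limit_hz(frame_rate_hz: float) -> float:
    """Highest temporal frequency representable without aliasing at a given frame rate."""
    return frame_rate_hz / 2.0


def temporal_frequency_hz(speed_deg_per_s: float, spatial_freq_cpd: float) -> float:
    """A pattern of f cycles/degree moving at v degrees/second changes v * f times per second."""
    return speed_deg_per_s * spatial_freq_cpd


def aliased_frequency_hz(signal_hz: float, frame_rate_hz: float) -> float:
    """Apparent frequency after sampling: the signal folds back into 0..frame_rate/2."""
    folded = signal_hz % frame_rate_hz
    return min(folded, frame_rate_hz - folded)


def max_alias_free_speed(frame_rate_hz: float, spatial_freq_cpd: float) -> float:
    """Fastest motion (degrees/second) that keeps a given level of detail under the limit."""
    return nyquist_limit_hz(frame_rate_hz) / spatial_freq_cpd


if __name__ == "__main__":
    detail_cpd = 4.0  # hypothetical background detail, in cycles per degree
    speed = 5.0       # hypothetical pan speed, in degrees per second
    signal = temporal_frequency_hz(speed, detail_cpd)
    for fps in (24.0, 48.0, 120.0):
        print(
            f"{fps:.0f} fps: limit {nyquist_limit_hz(fps):.0f} Hz, "
            f"a {signal:.0f} Hz signal appears as {aliased_frequency_hz(signal, fps):.0f} Hz, "
            f"max clean speed {max_alias_free_speed(fps, detail_cpd):.1f} deg/s"
        )
```

Blurring the background with a shallow depth of field corresponds to lowering the spatial frequency in this sketch, which raises the speed that can be shown cleanly at a given frame rate.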
Ironically, technology’s ability to solve one image quality challenge, namely providing viewers with a sharper, more realistic image, may to some extent have had the unintended consequence of making it harder for the human vision system to accept the pictures as realistic.
This point may also relate to why some people simply do not like watching even the highest quality 3D content. Indeed, despite ongoing improvements in stereoscopic cinematic capture tools and viewing presentations, and setting aside the fact that some people simply don’t like wearing 3D glasses and others have depth perception problems, Read says there will likely always be a portion of the movie-going population that will not enjoy 3D for perception reasons.
“One issue with [how some viewers perceive 3D] is the vergence-accommodation conflict,” Read explains. She refers to the phenomenon that some people might commonly call “bad 3D,” meaning “stereoscopic imagery that puts content too far off the screen plane, causing the viewer visual fatigue since they have to converge their eyes at varying distances while still keeping their focus on the screen plane. Basically, [content creators] try to keep the content very close to the screen plane to lessen this problem, but it still doesn’t fully solve it,” she adds.
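The size of the vergence-accommodation mismatch Read describes can be estimated from basic stereo geometry. The following Python sketch is an illustration only: it assumes a typical adult interpupillary distance of 63 mm, treats on-screen parallax as positive when a point appears behind the screen, and uses a hypothetical viewing distance. It reports the gap between where the eyes converge and where they focus, in diopters (1 / meters).

```python
ASSUMED_IPD_M = 0.063  # assumption: a typical adult interpupillary distance


def vergence_distance_m(viewing_distance_m: float, parallax_m: float,
                        ipd_m: float = ASSUMED_IPD_M) -> float:
    """Distance at which the eyes converge for a point with the given screen parallax.

    Parallax is the right-eye image position minus the left-eye image position on
    the screen: zero places the point on the screen plane, negative values bring
    it in front of the screen, and positive values push it behind.
    """
    if parallax_m >= ipd_m:
        raise ValueError("parallax at or beyond the eye separation has no convergence point")
    return viewing_distance_m * ipd_m / (ipd_m - parallax_m)


def conflict_diopters(viewing_distance_m: float, parallax_m: float,
                      ipd_m: float = ASSUMED_IPD_M) -> float:
    """Mismatch between focus distance (the screen) and vergence distance, in diopters."""
    converge_m = vergence_distance_m(viewing_distance_m, parallax_m, ipd_m)
    return abs(1.0 / viewing_distance_m - 1.0 / converge_m)


if __name__ == "__main__":
    screen_m = 2.0  # hypothetical viewing distance
    for parallax_mm in (0.0, -10.0, -30.0, 20.0):
        parallax = parallax_mm / 1000.0
        print(
            f"parallax {parallax_mm:+.0f} mm: converge at "
            f"{vergence_distance_m(screen_m, parallax):.2f} m, "
            f"conflict {conflict_diopters(screen_m, parallax):.3f} D"
        )
```

Content on the screen plane produces no conflict at all in this model, which is consistent with the practice Read mentions of keeping content close to the screen plane.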
“Another thing may be similar to the point I made about geometry being off in photographs. It may be that 3D somehow disrupts the correction process, so that when you view 3D from the wrong position, it is particularly disruptive to the brain, compared to when you view 2D content from the wrong position. We did an experiment on that, basically asking when a cube looked distorted, was it worse in 3D? And it was worse in 3D, but only a little bit, so we didn’t really come to a hard conclusion. The answer may be a mixture of all these things, plus cultural issues, I suppose. You would expect the comfort level to be different in people who grew up with 3D versus people who grew up with TV and cinema in 2D their whole life.”
All that said, Read thinks that stereoscopic imagery may well be on the verge of experiencing “a bit of a renaissance in the world of virtual reality,” where, in a sense, the medium might be a better fit regarding how viewers might more organically perceive it.
“One advantage of virtual reality for 3D is that you are already depicting views separately to the two eyes, and a lot of the content is already computer generated, so it is no big deal to make it 3D,” she suggests. “And you don’t have to wear anything extra that you weren’t already wearing in terms of the VR headset. And, most important, you are the only one in the audience, so the geometry can be set to be right for you specifically.”
Read says visual scientists are eagerly studying the perceptual impacts of virtual reality on users these days. One of the early conclusions she is leaning toward from a perception point of view is that an entirely immersive environment can compensate, as far as the visual system is concerned, for various other areas where image quality might be lacking.
“We haven’t done a lot of development research yet, but at the moment, it is obvious that, with virtual reality applications, the general quality of the images is somewhat low [because, Read says, the displays tend to be relatively low resolution and sensitive to the position of the optics, and because of the limitations of realtime rendering],” she says. “Most current VR displays don’t yet look nearly as good as a standard computer monitor. Even impressive systems will show some artifacts like color fringing or flicker.
“But on the other hand, there is something very immersive and believable about not being able to see anything else. It’s an extension of why cinema is more immersive than television. To a great extent, in a darkened theater, the imagery fills your visual environment, and everything else is black. So, your visual system better understands why you are there and what you are supposed to view. When you put on a VR headset, that sensation is amplified even more, plus you interact with that environment. In a traditional film situation, you rely on the director to show you where to look. In VR, the level of personal control plays a huge role in convincing you that you really are somewhere else. That is why VR can be so immersive even when the visual rendition of the images is pretty limited.”
Read suggests that these kinds of issues exemplify why display technology may never truly be able to accurately replicate reality as far as convincing the human visual system is concerned.
“How much of your visual field is filled by the viewing environment? And accommodation—that is another way in which current displays fail to replicate reality,” she says. “You always have to focus on the screen, whereas in the real world, you focus on near objects versus far objects. You can bring objects in and out of focus by where you choose to focus your eyes. In a film, the director makes that choice for you based on where they choose to focus the camera. Your brain has to make accommodations and respond to often competing cues, as a result.”
Read, however, suggests we never say never where the issue of accommodation is concerned. Through various ongoing projects, she says, industry experts are currently experimenting with next-generation light-field displays capable of incorporating accommodation cues, including head-mounted displays designed to address the unique requirements of the human eye in terms of depth of field, near-eye viewing, convergence conflict, and other issues.
SMPTE IP Standard for 2017
A recent TV Technology article offers an update on AIMS, the industry’s video-over-IP consortium, and its efforts to help enable the industry’s transition to IP video now that 2017 is under way. Among other things, the piece credits SMPTE 2022-6, the current standard that describes how to transport high bit-rate media signals over IP networks, for assisting the transition thus far. It also discusses the industry’s hopes for the next phase of SMPTE’s work in this area: SMPTE 2110, the next standard in this category, which has been in development for the past year or so. It is designed to specify a separate set of standards for IP media networking with separate essence flows and is expected to begin the rollout process later this year. The article suggests the industry’s transition to an IP infrastructure could well be swifter and more seamless than the SD to HD transition, but even so, industry sources quoted in the piece still expect it to take a decade to completely play out. Meanwhile, Phil Kurz of TV NewsCheck offered his update with more details on SMPTE 2110 in a recent blog post.
FCC Changes Coming
Consequences of the 2016 election promise to impact the broadcast and new media industries in 2017 and beyond. A recent article in Wired reports on President Donald Trump’s decision to name Ajit Pai, the FCC’s senior Republican member, as the new head of that regulatory agency, taking over for Democrat Tom Wheeler. In particular, Wired and other outlets are reporting on Pai’s strong opposition to the concept of net neutrality and the industry’s expectation that he will attempt to reverse the net neutrality regulations passed in 2015. He is also expected to try to change other rules revolving around broadband privacy protections, cable box reforms, and many other measures instituted or strengthened during the Obama Administration. In other areas, however, the article suggests that Pai’s agenda is less clear. While Trump himself spoke out against mega-mergers such as AT&T’s proposed acquisition of Time Warner, for instance, Pai has a record of being friendly to such deals, as long as few restrictions are imposed on the companies involved.
60 Revolutionary Years Since Sputnik
While we tend to think that modern technology proliferates at an unprecedentedly rapid rate, a recent article on the ProVideo Coalition site marks the 60th anniversary, in 2017, of the launch of the first artificial object into Earth orbit: the Soviet Union’s Sputnik 1 satellite. The article notes that it only took seven years from the time Sputnik 1 went into orbit in October of 1957 until the first geostationary satellite, America’s Syncom 3, went into orbit and was promptly utilized to help broadcast the 1964 Tokyo Olympics live around the globe. The sole purpose of Sputnik 1, the article reminds readers, was to emit a steady beep tone every three-tenths of a second, and less than a decade later, live satellite broadcast transmissions were under way, and global communications were forever changed. The article, penned by Richard Wirth of the USC School of Cinematic Arts, summarizes the early history of satellite technology, including the Telstar 1 breakthrough, and features some fascinating news clips and other vintage broadcast videos.