“Eye Appearance and the Uncanny Valley”
I have been fascinated with the concept of the uncanny valley since I first learned about it, and I knew I wanted to contribute to researching it further for my thesis. After reading up on the subject, I learned that most major theories describe the uncanny valley as a feeling of disgust toward human-like, but not-quite-human, figures. Many who are familiar with the uncanny valley suggest that this response arises because not-quite-human depictions evoke death or disease, which humans are naturally inclined to avoid in order to ensure their survival.
From there, I read current research on human perception of faces to try to identify a specific feature of artificially produced faces that was likely to bring about this unique reaction of disgust, and I tested my findings through experimentation.
The Uncanny Valley
The Uncanny Valley Theory states that as an artificial rendering of a human approaches true realism, it eventually reaches a point close to realism at which it is met with strongly negative emotional responses from onlookers, which many believe stem from an instinctual human desire for survival. The concept of the Uncanny Valley was originally proposed by roboticist Masahiro Mori in 1970, and the term “uncanny valley” arose from a translation by Jasia Reichardt in 1978. As digital depictions of humans became more prominent in popular media, the term came to be used for the same negative reaction brought about by digital imagery. The graph below describes the uncanny valley effect using common imagery.
Humans, and our primate cousins, tend to focus the majority of our visual attention on the faces of peers. It has been shown that humans fixate on the eye region of the face for the largest share of viewing time (40%) when presented with a forward-facing image of another person (Janik, Wellens, Goldberg, & Dell’Osso, 1978). An eye tracking study conducted on chimpanzees showed that visual attention is focused on the eyes, nose and mouth, with the majority of attention directed toward the eyes (Figure 4; Hirata, Fuwa, Sugama, Kusunoki, & Fujita, 2010). Even if the image was rotated or turned upside down, the majority of focus remained on the eyes. This suggests that chimpanzees gather the most information by looking familiar individuals in the eyes, with secondary attention to the nose and mouth, to determine emotion, expression, and intention.
If the majority of our focus and emotional connection is fed through the eyes of another being, then investigating the effects of the eyes alone as a contributing factor of the uncanny valley effect can lead us toward a future where the effect is not triggered by animations.
Digitally produced eyes alone can trigger the uncanny valley effect.
My investigations into the uncanny valley followed the scientific method.
Based on previously successful studies of the emotional response to near-human imagery, I chose to have people rate images of varying levels of realism on a subjective rating scale. I morphed two sources, one real video of a human and one unrealistic 3D model of a human, to varying degrees and showed the resulting images to participants in random order. They were asked to rate each image on a 5-point scale from “unrealistic” to “human.”
Eleven images were created for each testing session. These images were made by morphing stock footage of a real person with synced animation of an uncanny digital human figure.
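As a rough illustration of how eleven morph levels could be produced, the sketch below assumes the levels were evenly spaced from 0% to 100% digital opacity and that the morph behaved like a simple linear cross-dissolve. Neither assumption is confirmed by the project files, and the function names are hypothetical.

```python
def opacity_levels(n_images: int = 11) -> list[float]:
    """Evenly spaced digital opacities from 0.0 (all real) to 1.0 (all digital).

    Assumption: the eleven images used equal 10% steps.
    """
    return [i / (n_images - 1) for i in range(n_images)]


def blend_pixel(real: int, digital: int, opacity: float) -> int:
    """Linear cross-dissolve of a single 0-255 channel value."""
    return round((1.0 - opacity) * real + opacity * digital)


def blend_frame(real: list[int], digital: list[int], opacity: float) -> list[int]:
    """Blend two equally sized frames, given as flat lists of channel values."""
    return [blend_pixel(r, d, opacity) for r, d in zip(real, digital)]


if __name__ == "__main__":
    # Toy two-value "frames" standing in for real footage and the 3D model.
    real_frame = [0, 255]
    digital_frame = [255, 0]
    for level in opacity_levels():
        print(f"{level:.0%}", blend_frame(real_frame, digital_frame, level))
```

At 0% opacity the output is the untouched footage, and at 100% it is the digital model alone; the nine levels in between are the candidates for falling into the valley.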
A user-friendly, browser-based application that ran locally, created for this project by Alexandros Lotsos using p5.js, was used to display these eleven images to volunteers in a random order and in a way that allowed the images to be rated easily. The application showed users the looped 8-second animated video with 5 buttons directly below it: --, -, o, +, and ++. This was displayed fullscreen on a laptop with nothing else visible on the screen.
Volunteers were asked to use the laptop’s trackpad and cursor to click one of the 5 buttons on screen to rate their reaction to the image displayed. The volunteers were given a piece of paper outlining the scale that the buttons represented, which was laid over the laptop keyboard to remove potential distraction and to ensure that only the trackpad was used for selecting buttons. The five buttons corresponded to the following descriptions:
- --: “I HATE this; This is not human.”
- -: “I do not like this.”
- o: “Neutral.”
- +: “This is fine.”
- ++: “This is a human.”
To improve the rating application, gather more information about responses, and account for gender biases toward the images themselves, a second round of testing was conducted. This time, the stock footage was that of a male figure looking forward toward the camera, emotionless, and moving his eyes naturally. As in the first round, the same digital model’s eye animation was synced with that of the stock footage.
The application was updated in two ways. First, a response time for each image (the time between the image being loaded and the button being clicked) was added to the logged responses. Second, a “start screen” was added before the first of the eleven images so that volunteers would not be exposed to the first image during the instructions.
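The response-time measurement described above amounts to recording a timestamp when an image loads and subtracting it from the timestamp of the click. The sketch below is an assumption about how that could look, not the p5.js application's code; the `TrialTimer` class and its record fields are hypothetical.

```python
import time


class TrialTimer:
    """Logs one record per rated image, including the response time."""

    def __init__(self, clock=time.monotonic):
        # A monotonic clock is not affected by system clock adjustments.
        self._clock = clock
        self._shown_at = None
        self._image_id = None
        self.log = []

    def image_loaded(self, image_id):
        """Call when an image appears; the start screen never calls this."""
        self._image_id = image_id
        self._shown_at = self._clock()

    def button_clicked(self, label):
        """Call when a rating button is clicked; returns the logged record."""
        if self._shown_at is None:
            raise RuntimeError("no image is currently displayed")
        elapsed = self._clock() - self._shown_at
        record = {"image": self._image_id, "rating": label, "seconds": elapsed}
        self.log.append(record)
        self._shown_at = None  # wait for the next image before timing again
        return record
```

Because timing starts only when `image_loaded` is called, time spent on the start screen reading instructions does not count toward the first image's response time.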
All ratings were converted to numeric values in order to compute an average response for each image. The average rating of each image was then plotted on a graph to visualize the responses, so that they could be compared with Mori’s original attempt to graph the uncanny valley.
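The conversion and averaging step might look like the following sketch. The exact numeric values assigned to each button are not stated in this write-up, so the symmetric -2 to +2 mapping is an assumption, as are the names `SCORE` and `average_ratings`.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping: the original numeric values are not given in the text.
SCORE = {"--": -2, "-": -1, "o": 0, "+": 1, "++": 2}


def average_ratings(responses):
    """Average the numeric rating for each image.

    `responses` is an iterable of (image_id, button_label) pairs pooled
    across all volunteers. Returns {image_id: mean score}, sorted by id.
    """
    by_image = defaultdict(list)
    for image_id, label in responses:
        by_image[image_id].append(SCORE[label])
    return {image_id: mean(scores) for image_id, scores in sorted(by_image.items())}


if __name__ == "__main__":
    # Made-up responses for illustration only, not data from the study.
    sample = [(0, "++"), (0, "+"), (10, "--"), (10, "--")]
    print(average_ratings(sample))
```

Plotting these per-image means against digital opacity gives a curve that can be compared by eye with the shape of Mori's graph.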
It is important to note that the response to each image appeared to be influenced by the previous image: if a person was shown a highly uncanny image followed by a less uncanny image, they were more likely to rate the second image favorably than if it had followed an image of similar digital opacity, and vice versa. The first few images presented were, on average, rated more neutrally (-, o, or + as opposed to the extremes of -- or ++). Since all the data were averaged in the end for graphing, this should have helped to reduce the comparison bias within individual tests.
Data analysis methods for experiment 2 closely matched those for experiment 1. However, a separate graph of response times was also produced for experiment 2, since those data were now available thanks to the updated data collection methods.
It should be noted that within individual data sheets, response times appeared to be higher and ratings appeared to be closer to neutral for the first few images presented to each responder. Because the images were displayed in random order, this effect should have averaged out across the multiple tests.
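One way to check this warm-up observation would be to compare response times for the first few images in each session against the rest. The sketch below is only an illustration: the cutoff of three images and the function name `early_vs_late` are assumptions, and the numbers in the example are invented.

```python
from statistics import mean


def early_vs_late(sessions, n_early=3):
    """Compare mean response time for early and late trials.

    `sessions` is a list of sessions; each session is a list of response
    times in seconds, in the order the images were presented.
    Returns (mean of the first n_early trials, mean of the remaining trials).
    Raises statistics.StatisticsError if either group is empty.
    """
    early, late = [], []
    for trials in sessions:
        for position, seconds in enumerate(trials):
            (early if position < n_early else late).append(seconds)
    return mean(early), mean(late)


if __name__ == "__main__":
    # Invented timings for two sessions, not data from the study.
    demo = [[4.0, 3.0, 2.0, 1.0], [6.0, 5.0, 4.0, 3.0]]
    print(early_vs_late(demo, n_early=2))
```

A noticeably larger first value would be consistent with volunteers taking longer while they get used to the task.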
Introducing digital eyes onto a fully realistic image triggered the uncanny valley effect in both rounds of testing. The graphed ratings showed an obvious turning point of perceived uncanniness, with the drop beginning at around 20% digital opacity in each set of images. Once the images reached 70% opacity, responses to both sets were overwhelmingly negative. The data collected in this experiment were enough to conclude that unnatural eyes alone, morphed into an image of a real person, were enough to trigger a negative emotional response from onlookers.
The quick responses people had to the images indicated that the first place someone looked in the image (statistically, the eyes) was enough to determine their overall response to that image without their having to analyze the face completely.