In the April issue of the SMPTE Motion Imaging Journal, Dominic Rüfenacht and Appu Shanji of Berlin, Germany's Mobius Labs present their work on "Customized Facial Expression Analysis in Video" using AI and machine learning. While existing computer vision-based emotion recognition systems are trained to classify face images into a very limited number of emotions, theirs takes a different approach, training a convolutional neural network (CNN) to distinguish subtle differences in facial expressions. Key to their work is adapting the model to work reliably on video content by filtering out faces unsuited for expression analysis using a novel automatic face quality assessment.
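The article does not give implementation details for the face quality assessment, but the filtering idea can be illustrated with a minimal sketch. Here, `quality_fn` is a hypothetical stand-in for the authors' learned quality model, and the score names are assumptions for illustration only:

```python
def filter_faces(faces, quality_fn, threshold=0.5):
    """Keep only face crops whose quality score clears a threshold.

    faces      -- list of face crops (any representation)
    quality_fn -- hypothetical callable returning a quality score in [0, 1];
                  a stand-in for the paper's learned quality assessment
    threshold  -- minimum score for a face to be used in expression analysis
    """
    return [face for face in faces if quality_fn(face) >= threshold]


# Toy example: faces represented as dicts with a precomputed quality score.
detected = [
    {"id": "frame_001", "quality": 0.9},  # sharp, frontal face
    {"id": "frame_002", "quality": 0.2},  # blurred or occluded face
    {"id": "frame_003", "quality": 0.7},
]
usable = filter_faces(detected, quality_fn=lambda f: f["quality"])
```

In a video pipeline, a gate like this discards blurred, occluded, or extreme-pose faces before they reach the expression model, which is what makes per-frame analysis reliable.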
Facial expressions are one of the most important forms of non-verbal human expression and a key driver of social interaction. But as the authors note, while an emotion is an internal feeling, a facial expression is external and objective, referring to the positions and motions of the muscles beneath the skin of the face. Expressions can indicate our emotions but do not always reflect them, as when we smile while feeling sad. For this reason, Rüfenacht and Shanji set out to train a system to distinguish facial expressions rather than emotions.
Their aim was to objectively distinguish expressions at a very fine granularity, training a system to model the 32 facial muscle actions called Action Units (AUs) defined by the Facial Action Coding System and then combine them into facial expressions. Since it takes a trained human expert one hour to score one minute of video for 32 AUs, the most popular automatic approaches still resort to classifying faces into only six or seven basic expressions. The authors address this complexity by instead using a Facial Expression Comparison framework that trains the system on triplets of faces, each annotated with which face is most dissimilar to the other two. This approach provides an intrinsic way to distinguish different combinations of AUs without the need to explicitly label the faces.
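Triplet-based comparison training of this kind is commonly optimized with a hinge-style triplet loss that pulls the two similar faces together in embedding space and pushes the dissimilar one away. The article does not specify the loss used, so the following is a minimal sketch of the general technique, with the margin value chosen arbitrarily:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss over embedding vectors.

    anchor/positive -- embeddings of the two faces annotated as similar
    negative        -- embedding of the face annotated as most dissimilar
    margin          -- how much farther the negative must be (assumed value)

    The loss is zero once the negative is at least `margin` farther
    from the anchor (in squared Euclidean distance) than the positive.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)


# A satisfied triplet (negative already far away) incurs no loss;
# a violated one (positive farther than negative) is penalized.
a = np.zeros(3)
loss_ok = triplet_loss(a, positive=np.zeros(3), negative=np.ones(3))
loss_bad = triplet_loss(a, positive=np.ones(3), negative=np.zeros(3))
```

Training a CNN to minimize this loss over many annotated triplets yields an embedding in which distance reflects expression similarity, without any explicit AU labels, which matches the "intrinsic" comparison described above.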
To demonstrate its practical usage, the authors applied their approach to searching and tagging video content for specific expressions, using a single face as the query. They were also able to generate expression statistics and summaries for various individuals appearing in a video, comparing the appearance rates of specific facial expressions of the two candidates in the 2020 U.S. presidential debates. These facial expression features can be extended to other diverse applications by adding further classification, ranking, or clustering layers. They suggest applications such as advanced visual similarity search for video that matches and ranks actor expressions, and automatic highlighting of video segments by identifying expressions that particularly capture viewer interest.
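Querying a video with a single face, as described above, amounts to a nearest-neighbor search in the learned expression embedding space. The article does not describe the retrieval step, so this is a generic cosine-similarity sketch under that assumption:

```python
import numpy as np

def rank_by_expression(query_emb, gallery_embs):
    """Rank gallery faces by cosine similarity to a single query face.

    query_emb    -- 1-D expression embedding of the query face
    gallery_embs -- 2-D array, one embedding per detected face in the video

    Returns gallery indices ordered from most to least similar, which is
    enough to tag or retrieve video segments matching the query expression.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                  # cosine similarity per gallery face
    return np.argsort(-sims)      # best match first


# Toy 2-D embeddings: index 1 matches the query exactly.
query = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0],
                    [1.0, 0.0],
                    [1.0, 1.0]])
order = rank_by_expression(query, gallery)
```

The per-person statistics mentioned above (e.g. for the debate candidates) then follow by counting how often each identity's faces fall near a given expression in this space.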
Read the complete article in this month's SMPTE Motion Imaging Journal (https://ieeexplore.ieee.org/document/9749801) to learn more about Rüfenacht and Shanji's pioneering work.
#AI, #artificial intelligence, #machine learning, #convolutional network, #CNN, #facial expression analysis, #action unit, #AU, #triplet prediction