ExpLLM: Towards Chain of Thought for Facial Expression Recognition

School of Engineering, University of Chinese Academy of Sciences
Department of Computer Science and Technology, Tsinghua University
Pengcheng Laboratory
School of Computing, National University of Singapore

The overall structure of the proposed method. The upper part illustrates ExpLLM: given both an instruction and a facial image, it sequentially generates the expression CoT. The lower part shows the Exp-CoT Engine, which constructs the CoT from three perspectives (key observations, overall emotional interpretation, and conclusion), using the AU detection results to generate the expression CoT.

Abstract

Facial expression recognition (FER) is a critical task in multimedia with significant implications across various domains. Accurate recognition, however, requires analyzing the causes underlying facial expressions. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but lack insight into the interactions between individual AUs and the overall expression. In this paper, we propose ExpLLM, a novel method that leverages large language models to generate an accurate chain of thought (CoT) for facial expression recognition. Specifically, we design the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion. The key observations describe each AU's name, intensity, and associated emotions. The overall emotional interpretation provides an analysis based on multiple AUs and their interactions, identifying the dominant emotions and their relationships. Finally, the conclusion presents the final expression label derived from the preceding analysis. We also introduce the Exp-CoT Engine, designed to construct this expression CoT and generate instruction-description data for training ExpLLM. Extensive experiments on the RAF-DB and AffectNet datasets demonstrate that ExpLLM outperforms current state-of-the-art FER methods. ExpLLM also surpasses the latest GPT-4o in expression CoT generation, particularly in recognizing micro-expressions, where GPT-4o frequently fails.
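The three-part CoT described above can be pictured as a structured record. The sketch below is purely illustrative: the field names and example values are assumptions for exposition, not the paper's actual data schema.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three-part expression CoT: per-AU key
# observations, an overall emotional interpretation, and a conclusion.
# Names and values here are illustrative, not the paper's schema.

@dataclass
class KeyObservation:
    au_name: str                      # action unit, e.g. "AU12 (Lip Corner Puller)"
    intensity: str                    # e.g. "slight", "marked"
    associated_emotions: list[str]    # emotions this AU typically signals

@dataclass
class ExpressionCoT:
    key_observations: list[KeyObservation]  # one entry per detected AU
    overall_interpretation: str             # analysis of AU interactions
    conclusion: str                         # final expression label

# Example instance for a smiling face (values are illustrative).
cot = ExpressionCoT(
    key_observations=[
        KeyObservation("AU12 (Lip Corner Puller)", "marked", ["happiness"]),
        KeyObservation("AU6 (Cheek Raiser)", "slight", ["happiness"]),
    ],
    overall_interpretation=(
        "AU12 and AU6 co-occur, jointly indicating a genuine smile; "
        "happiness is the dominant emotion."
    ),
    conclusion="happiness",
)

print(cot.conclusion)  # → happiness
```

In this framing, the Exp-CoT Engine would populate the key observations from AU detection results, while the interpretation and conclusion summarize their joint evidence.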


Comparison with state-of-the-art methods.

ExpLLM vs. GPT-4o

Prediction Samples

BibTeX

@misc{lan2024expllmchainthoughtfacial,
      title={ExpLLM: Towards Chain of Thought for Facial Expression Recognition},
      author={Xing Lan and Jian Xue and Ji Qi and Dongmei Jiang and Ke Lu and Tat-Seng Chua},
      year={2024},
      eprint={2409.02828},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.02828},
}