Academic Journal of Computing & Information Science, 2026, 9(1); doi: 10.25236/AJCIS.2026.090101.

A Review of Deep Learning in Micro-expression Recognition Research

Author(s)

Mincong Zhou1, Wei Wang2, Yixing Li1

Corresponding Author

Wei Wang

Affiliation(s)

1School of Health Science and Engineering, University of Science and Technology, Shanghai, China

2PLA Naval Medical Center, Naval Medical University, Shanghai, China

Abstract

Micro-expressions (MEs) are involuntary, fleeting facial cues that reveal hidden emotions in high-stakes situations. They offer insight into an individual's true psychological state and have applications in psychology, law enforcement, and human-computer interaction. Traditional micro-expression recognition (MER) relies on hand-crafted features, but recent advances in deep learning have made end-to-end recognition possible and greatly accelerated research progress. This paper reviews deep learning-based MER, covering dataset construction, pre-processing, feature enhancement, and the evolution of network architectures. It also compares single-stream, multi-stream, and multimodal fusion models, summarizes loss function strategies, and discusses current challenges and future research trends. The review aims to provide a systematic perspective that promotes the development and practical reliability of MER.

Keywords

Micro-expression Recognition, Feature Detection, Feature Extraction, Micro-expression Dataset, Deep Learning

Cite This Paper

Mincong Zhou, Wei Wang, Yixing Li. A Review of Deep Learning in Micro-expression Recognition Research. Academic Journal of Computing & Information Science (2026), Vol. 9, Issue 1: 1-11. https://doi.org/10.25236/AJCIS.2026.090101.
