Academic Journal of Computing & Information Science, 2023, 6(6); doi: 10.25236/AJCIS.2023.060619.

Multimodal emotion recognition based on fusion with residual connection

Author(s)

Ziheng Gao1, Haoxia Guo2

Corresponding Author:
Ziheng Gao
Affiliation(s)

1Faculty of Science, Guilin University of Technology, Guilin, 541000, China

2Faculty of Mechatronic Engineering, Lanzhou University of Technology, Lanzhou, 730000, China

Abstract

To improve the accuracy of emotion recognition, a multimodal fusion emotion recognition method is proposed that combines a bidirectional long short-term memory network (Bi-LSTM), multi-head attention, and a residual connection. The method first models long-term temporal dependencies with the Bi-LSTM, then uses the attention mechanism to screen out the important information, and finally strengthens the transmission of information through the network via the residual connection. In dataset validation, the accuracy of emotion classification reaches 61.7%. The experimental results show that, compared with models such as CNN, CMN, and BC-LSTM, the proposed model effectively improves both accuracy and F1 score.
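The abstract names three components and their order: Bi-LSTM for temporal context, multi-head attention for selecting salient information, and a residual connection to ease information flow. The PyTorch sketch below only illustrates that pipeline; it is not the authors' released code, and the class name FusionEmotionNet together with all dimensions (100-dimensional fused features, hidden size 128, 4 attention heads, 6 emotion classes) are illustrative assumptions.

```python
# Minimal sketch of the fusion pipeline described in the abstract.
# All names and dimensions are assumptions, not the authors' settings.
import torch
import torch.nn as nn

class FusionEmotionNet(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=128, num_heads=4, num_classes=6):
        super().__init__()
        # Bi-LSTM captures long-range temporal context in both directions.
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Multi-head self-attention screens out the salient time steps.
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden_dim,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_dim) fused multimodal feature sequence.
        h, _ = self.bilstm(x)
        attn_out, _ = self.attn(h, h, h)
        # Residual connection: add the attention output back onto the
        # Bi-LSTM features so information can bypass the attention block.
        h = self.norm(h + attn_out)
        # Mean-pool over time, then classify the utterance's emotion.
        return self.classifier(h.mean(dim=1))

# Example: a batch of 8 utterances, 20 time steps, 100-dim fused features.
logits = FusionEmotionNet()(torch.randn(8, 20, 100))
print(logits.shape)  # torch.Size([8, 6])
```

The residual-plus-normalization step mirrors the common Transformer-style pattern; the paper itself may place the residual connection differently.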

Keywords

Multi-modal Emotion Recognition, LSTM, Attention Mechanism, Residual Connection

Cite This Paper

Ziheng Gao, Haoxia Guo. Multimodal emotion recognition based on fusion with residual connection. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 6: 121-126. https://doi.org/10.25236/AJCIS.2023.060619.
