
A Bi-modal Emotion Recognition Algorithm Fusing Visual and Auditory Information


Speech signals and facial expressions are the two main ways in which people express their emotions, and they are regarded as the two principal modalities of emotional expression: the auditory modality and the visual modality. Most current emotion recognition methods rely on single-modal information, but single-modal methods suffer from incomplete information and vulnerability to noise interference. To address these problems, this paper proposes a bi-modal emotion recognition method that combines auditory and visual information. First, a Convolutional Neural Network and a pre-trained facial expression model are used to extract acoustic features from the speech signal and visual features from the visual signal, respectively. The two types of extracted features are then fused and compressed, fully mining the correlated information between the modalities. Finally, a recurrent neural network performs emotion recognition on the fused auditory-visual bimodal features. The method effectively captures the intrinsic association between the auditory and visual modalities, thereby improving emotion recognition performance. The proposed bimodal method is validated on the RECOLA dataset, and the experimental results show that the bimodal model outperforms recognition models based on a single image or speech modality.
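The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the kind of pipeline it describes: a CNN branch for the speech signal, a stand-in for the pre-trained facial expression CNN, feature-level fusion and compression via a linear layer, and an LSTM producing per-step emotion outputs (e.g. arousal/valence, as in RECOLA). All layer sizes, module names, and the concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a CNN + feature-fusion + LSTM audiovisual emotion model.
# Dimensions, layers, and the fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn


class AudioCNN(nn.Module):
    """1-D CNN over a log-mel spectrogram segment -> fixed-size audio feature."""
    def __init__(self, n_mels: int = 40, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time within the segment
        )

    def forward(self, x):                      # x: (batch, n_mels, frames)
        return self.conv(x).squeeze(-1)        # (batch, feat_dim)


class VisualCNN(nn.Module):
    """Small stand-in for a pre-trained facial-expression CNN backbone."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):                      # x: (batch, 3, H, W) face crop
        return self.fc(self.conv(x).flatten(1))  # (batch, feat_dim)


class BimodalEmotionModel(nn.Module):
    """Fuse per-step audio/visual features, compress them, then run an LSTM."""
    def __init__(self, feat_dim: int = 128, fused_dim: int = 128,
                 hidden: int = 64, n_outputs: int = 2):
        super().__init__()
        self.audio_net = AudioCNN(feat_dim=feat_dim)
        self.visual_net = VisualCNN(feat_dim=feat_dim)
        # Fusion + compression of the concatenated bimodal feature vector.
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, fused_dim), nn.ReLU())
        self.lstm = nn.LSTM(fused_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)  # e.g. arousal and valence

    def forward(self, audio_seq, video_seq):
        # audio_seq: (batch, T, n_mels, frames); video_seq: (batch, T, 3, H, W)
        b, t = audio_seq.shape[:2]
        a = self.audio_net(audio_seq.flatten(0, 1)).view(b, t, -1)
        v = self.visual_net(video_seq.flatten(0, 1)).view(b, t, -1)
        fused = self.fuse(torch.cat([a, v], dim=-1))  # joint representation
        out, _ = self.lstm(fused)                     # temporal modeling
        return self.head(out)                         # (batch, T, n_outputs)


if __name__ == "__main__":
    model = BimodalEmotionModel()
    audio = torch.randn(2, 10, 40, 100)    # 2 clips, 10 steps, 40 mel bins x 100 frames
    video = torch.randn(2, 10, 3, 48, 48)  # 2 clips, 10 steps, 48x48 face crops
    print(model(audio, video).shape)       # torch.Size([2, 10, 2])
```

A sequence-level loss (e.g. concordance correlation or MSE against continuous annotations) would then be applied to the per-step outputs; that choice is likewise an assumption here.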

Keywords: affective recognition; feature fusion; Convolutional Neural Network; Long Short-Term Memory


