Multimodal Emotion Recognition for Empathic Virtual Agents in Mental Health Interventions
DOI:
https://doi.org/10.4114/intartif.vol29iss77pp28-39Keywords:
Emotion recognition in conversation, Text, Image, Deep learning, Multimodal classification, Cross-Modal fusionAbstract
Depression and anxiety disorders affect millions of individuals globally and are commonly addressed through psychological interventions. A growing technological approach to support such treatments involves the use of embodied conversational agents that employ motivational interviewing, a method that promotes behavioral change through empathic engagement. Despite its critical role in therapeutic efficacy, empathy remains a significant challenge for virtual agents to emulate. Emotion Recognition (ER) technologies offer a potential solution by enabling agents to perceive and respond appropriately to users' emotional states. Given the inherently multimodal nature of human emotion, unimodal ER approaches often fall short in accurately interpreting affective cues. In this work, we propose a multimodal emotion recognition model that integrates verbal and non-verbal signals (text and video) using a Cross-Modal Attention fusion strategy. Trained and evaluated on the IEMOCAP dataset, our approach leverages Ekman's taxonomy of basic emotions and demonstrates superior performance over unimodal baselines across key metrics such as accuracy and F1-score. By prioritizing text as the main modality and dynamically incorporating complementary visual cues, the model proves effective in complex emotion classification tasks. The proposed model is designed for integration into an existing conversational agent aimed at supporting individuals experiencing emotional and psychological distress. Future work will involve embedding the model in the conversational agent platform for emotionally distressed users, aiming to assess its real-world impact on engagement, user experience, and perceived empathy.
Downloads
Metrics
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2025 Iberamia & The Authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Open Access publishing.
Lic. under Creative Commons CC-BY-NC
Inteligencia Artificial (Ed. IBERAMIA)
ISSN: 1988-3064 (on line).
(C) IBERAMIA & The Authors

