Multimodal Emotion Recognition for Empathic Virtual Agents in Mental Health Interventions

Authors

  • Marcelo Alejandro Huerta-Espinoza, CICESE Unidad Académica Tepic, Mexico
  • Ansel Y. Rodríguez-González, CICESE Unidad Académica Tepic, Mexico
  • Juan Martinez-Miranda, CICESE Unidad Académica Tepic, Mexico

DOI:

https://doi.org/10.4114/intartif.vol29iss77pp28-39

Keywords:

Emotion recognition in conversation, Text, Image, Deep learning, Multimodal classification, Cross-Modal fusion

Abstract

Depression and anxiety disorders affect millions of individuals globally and are commonly addressed through psychological interventions. A growing technological approach to support such treatments involves the use of embodied conversational agents that employ motivational interviewing, a method that promotes behavioral change through empathic engagement. Despite its critical role in therapeutic efficacy, empathy remains a significant challenge for virtual agents to emulate. Emotion Recognition (ER) technologies offer a potential solution by enabling agents to perceive and respond appropriately to users' emotional states. Given the inherently multimodal nature of human emotion, unimodal ER approaches often fall short in accurately interpreting affective cues. In this work, we propose a multimodal emotion recognition model that integrates verbal and non-verbal signals (text and video) using a Cross-Modal Attention fusion strategy. Trained and evaluated on the IEMOCAP dataset, our approach leverages Ekman's taxonomy of basic emotions and demonstrates superior performance over unimodal baselines across key metrics such as accuracy and F1-score. By prioritizing text as the main modality and dynamically incorporating complementary visual cues, the model proves effective in complex emotion classification tasks. The proposed model is designed for integration into an existing conversational agent that supports individuals experiencing emotional and psychological distress. Future work will embed the model in that agent's platform to assess its real-world impact on engagement, user experience, and perceived empathy.
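
As a rough illustration of the fusion idea described in the abstract, the sketch below shows one common form of cross-modal attention in which text features act as queries and visual features act as keys and values. It is not the authors' implementation: the function name `cross_modal_attention`, the toy feature vectors, the absence of learned projection matrices, and the residual addition that keeps text as the primary signal are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(text_feats, visual_feats):
    """Fuse visual context into text features (hypothetical helper).

    text_feats:   list of d-dimensional vectors, used as queries.
    visual_feats: list of d-dimensional vectors, used as keys and values.
    Returns (fused, weights), where fused[i] is text_feats[i] plus an
    attention-weighted average of the visual vectors.
    """
    d = len(text_feats[0])
    scale = math.sqrt(d)  # scaled dot-product attention
    fused, all_weights = [], []
    for query in text_feats:
        # How relevant is each visual vector to this text vector?
        weights = softmax([dot(query, key) / scale for key in visual_feats])
        # Weighted average of the visual vectors (the "context").
        context = [
            sum(w * value[i] for w, value in zip(weights, visual_feats))
            for i in range(d)
        ]
        # Residual addition: text stays the main signal (assumption).
        fused.append([q + c for q, c in zip(query, context)])
        all_weights.append(weights)
    return fused, all_weights


# Toy inputs: two text token vectors and three visual frame vectors.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_modal_attention(text, visual)
```

In this toy setup each text vector attends most strongly to the visual vector pointing in the same direction, which is the sense in which visual cues are incorporated dynamically. A trained model would typically add learned query, key, and value projections and multiple attention heads.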

Author Biographies

Ansel Y. Rodríguez-González, CICESE Unidad Académica Tepic, Mexico

He holds a Bachelor’s degree in Computer Science (2004), a Master’s degree in Mathematical Sciences (2007) from the University of Havana, and a Ph.D. in Computational Sciences (2011) from INAOE. He was a professor at UCI and a researcher at CENATAV. Since May 2018, he has been a researcher at SECIHTI (IxM) in CICESE-UAT.

He is a Level 1 National Researcher, has participated in more than 18 theoretical and applied research projects, and has published over 70 articles. He has served as guest editor for five journals and chaired scientific events such as SIBIA (2020–2022) and MCPR 2023. His algorithms have won awards in international competitions, such as PowerTAC and the "Competition on Evolutionary Computation in the Energy Domain." In 2024, he received the National Award of the Cuban Academy of Sciences.

He is a founding member of the Master’s program offered at UAT and its graduate council. He has supervised undergraduate, master’s, and doctoral students, as well as postdoctoral researchers.

Juan Martinez-Miranda, CICESE Unidad Académica Tepic, Mexico

Juan Martínez-Miranda holds a degree in Computer Systems Engineering from the Instituto Tecnológico de San Luis Potosí, a master’s degree in Artificial Intelligence from the Polytechnic University of Catalonia, and a Ph.D. in Computer Engineering from the Complutense University of Madrid. He has conducted research at the Barcelona Science Park, the Austrian Research Institute for Artificial Intelligence, and the Polytechnic University of Valencia. He is currently a senior researcher and general coordinator of the Tepic Academic Unit at CICESE (CICESE-UAT). He has participated in and served as technical lead for research projects funded by various entities, including the European Commission and CONAHCYT. He is a Level I member of Mexico’s National System of Researchers. His research interests include human-computer interaction, affective computing, and their applications in the healthcare sector. He has authored over 80 publications in scientific journals and conferences.

Published

2025-12-08

How to Cite

Huerta-Espinoza, M. A., Rodriguez Gonzalez, A. Y., & Martinez Miranda, J. C. (2025). Multimodal Emotion Recognition for Empathic Virtual Agents in Mental Health Interventions. Inteligencia Artificial, 29(77), 28–39. https://doi.org/10.4114/intartif.vol29iss77pp28-39