Deep Learning Multimodal Sarcasm Detection in Social Media Comments: The Role of Memes and Emojis

Authors

  • Eka Dyar Wahyuni, Department of Information Systems, Universitas Pembangunan Nasional “Veteran” Jawa Timur, Surabaya, Indonesia. https://orcid.org/0000-0003-2541-1474
  • Tri Lathif Mardi Suryanto, Department of Information Systems, Universitas Pembangunan Nasional “Veteran” Jawa Timur, Surabaya, Indonesia. https://orcid.org/0000-0001-7532-2440
  • Heidy Arviani, Department of Communication Sciences, Universitas Pembangunan Nasional “Veteran” Jawa Timur, Surabaya, Indonesia. https://orcid.org/0000-0001-5908-8797

DOI:

https://doi.org/10.37965/jait.2025.0699

Keywords:

emoji, deep learning, meme, sarcasm detection

Abstract

Social media has become a crucial platform for interaction, information exchange, and market analysis. Businesses and researchers rely on it for sentiment and emotion analysis, yet sarcasm detection remains a major challenge because sarcasm can alter sentiment polarity. Text-only analysis struggles with sarcasm because written text lacks tone of voice and facial expressions. Moreover, crucial indicators of sarcasm (repeated emojis, punctuation, and characters) are often discarded during preprocessing. To address this issue, we proposed a multimodal deep-learning approach that integrated text, emojis, and images to improve sarcasm detection. Rather than removing repeated emojis, punctuation, and characters, this approach preserved them and transformed them into structured features. Images were processed with Optical Character Recognition (OCR) to extract their embedded text; non-textual visual elements were excluded to keep computation efficient. Word representations were then generated using Word2Vec embeddings and fed into LSTM, GRU, and BiLSTM models. The study highlighted the importance of scenario-specific preprocessing and feature selection in sarcasm detection. Among the 15 models tested, LSTM–composite demonstrated stable accuracy and strong generalization (76% accuracy, 73% precision, and 82% recall), but its high computational cost made it unsuitable for large-scale deployment. In contrast, Model 9 (i.e., BiLSTM–isRepeatedChar) balanced efficiency and predictive performance (76% accuracy, 74% precision, and 79% recall), which made it well suited to resource-limited environments.
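
The idea of keeping repetition cues as structured features, rather than stripping them during preprocessing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: only the feature name isRepeatedChar appears in the abstract, while isRepeatedPunct, isRepeatedEmoji, the function extract_flags, the repetition thresholds, and the emoji code-point ranges are assumptions made for illustration.

```python
import re

# Assumption: a letter repeated 3+ times in a row counts as elongation ("sooo").
REPEATED_CHAR = re.compile(r"([A-Za-z])\1{2,}")
# Assumption: the same punctuation mark repeated 2+ times ("!!", "??", "...").
REPEATED_PUNCT = re.compile(r"([!?.])\1+")
# Assumption: a rough emoji range covering common pictograph blocks only.
# Emojis followed by variation selectors or joined by ZWJ are not handled.
EMOJI_CLASS = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
REPEATED_EMOJI = re.compile("(" + EMOJI_CLASS + r")\1+")


def extract_flags(text):
    """Return binary repetition features for one comment.

    Only 'isRepeatedChar' is named in the abstract; the other two
    keys are hypothetical names chosen to match it.
    """
    return {
        "isRepeatedChar": int(bool(REPEATED_CHAR.search(text))),
        "isRepeatedPunct": int(bool(REPEATED_PUNCT.search(text))),
        "isRepeatedEmoji": int(bool(REPEATED_EMOJI.search(text))),
    }


if __name__ == "__main__":
    print(extract_flags("Great job!!! \U0001F602\U0001F602 sooo helpful"))
    print(extract_flags("Nice work."))
```

In a pipeline like the one described, flags of this kind would be supplied to the sequence model alongside the Word2Vec embeddings; how the features were actually encoded and combined is not specified in the abstract.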

Published

2025-04-24

How to Cite

Wahyuni, E. D., Suryanto, T. L. M., & Arviani, H. (2025). Deep Learning Multimodal Sarcasm Detection in Social Media Comments: The Role of Memes and Emojis. Journal of Artificial Intelligence and Technology. https://doi.org/10.37965/jait.2025.0699

Section

Research Articles