TY - GEN
T1 - RUHate-MM
T2 - 33rd ACM Web Conference, WWW 2024
AU - Thapa, Surendrabikram
AU - Jafri, Farhan Ahmad
AU - Rauniyar, Kritesh
AU - Nasim, Mehwish
AU - Naseem, Usman
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/5/13
Y1 - 2024/5/13
N2 - During the conflict between Ukraine and Russia, hate speech targeted toward specific groups was widespread on different social media platforms. With most social platforms allowing multimodal content, the use of multimodal content to express hate speech is widespread on the Internet. Although there has been considerable research in detecting hate speech within unimodal content, the investigation into multimodal content remains insufficient. The limited availability of annotated multimodal datasets further restricts our ability to explore new methods to interpret and identify hate speech and its targets. The availability of annotated datasets for hate speech detection during political events, such as invasions, is even more limited. To fill this gap, we introduce a comprehensive multimodal dataset consisting of 20,675 posts related to the Russia-Ukraine crisis, which were manually annotated as either ‘Hate Speech’ or ‘No Hate Speech’. Additionally, we categorize the hate speech data into three targets: ‘Individual’, ‘Organization’, and ‘Community’. Our benchmarked evaluations show that there is still room for improvement in accurately identifying hate speech and its targets. We hope that the availability of this dataset and the evaluations performed on it will encourage the development of new methods for identifying hate speech and its targets during political events like invasions and wars. The dataset and resources are made available at https://github.com/Farhan-jafri/Russia-Ukraine.
AB - During the conflict between Ukraine and Russia, hate speech targeted toward specific groups was widespread on different social media platforms. With most social platforms allowing multimodal content, the use of multimodal content to express hate speech is widespread on the Internet. Although there has been considerable research in detecting hate speech within unimodal content, the investigation into multimodal content remains insufficient. The limited availability of annotated multimodal datasets further restricts our ability to explore new methods to interpret and identify hate speech and its targets. The availability of annotated datasets for hate speech detection during political events, such as invasions, is even more limited. To fill this gap, we introduce a comprehensive multimodal dataset consisting of 20,675 posts related to the Russia-Ukraine crisis, which were manually annotated as either ‘Hate Speech’ or ‘No Hate Speech’. Additionally, we categorize the hate speech data into three targets: ‘Individual’, ‘Organization’, and ‘Community’. Our benchmarked evaluations show that there is still room for improvement in accurately identifying hate speech and its targets. We hope that the availability of this dataset and the evaluations performed on it will encourage the development of new methods for identifying hate speech and its targets during political events like invasions and wars. The dataset and resources are made available at https://github.com/Farhan-jafri/Russia-Ukraine.
KW - Content Moderation
KW - Hate Speech
KW - Multimodal Data
KW - Russia-Ukraine Crisis
UR - http://www.scopus.com/inward/record.url?scp=85194483871&partnerID=8YFLogxK
U2 - 10.1145/3589335.3651973
DO - 10.1145/3589335.3651973
M3 - Conference paper
AN - SCOPUS:85194483871
T3 - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
SP - 1854
EP - 1863
BT - WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
PB - Association for Computing Machinery (ACM)
Y2 - 13 May 2024 through 17 May 2024
ER -