I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts

Cerrejón Naranjo, Manuel; Guerrero García, Manuel; Mata Vázquez, Jacinto; Pachón Álvarez, Victoria

Files

DETESTS-Dis2024_paper1.pdf (392.76 KB)

Publication date

2024

Department

Tecnologías de la Información

URI

https://hdl.handle.net/10272/24526

Abstract

This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid-19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html). This paper presents the approaches developed for detecting and identifying racial stereotypes in Spanish texts using advanced Natural Language Processing (NLP) and Deep Learning techniques, incorporating Learning with Disagreement for enhanced robustness. The major contribution of this work is the demonstration of the effectiveness of transformer-based ensemble classifiers in recognizing both explicit and implicit stereotypes. By leveraging the strengths of multiple models, the proposed method achieves better performance than a single model alone. Additionally, the results highlight the importance of selecting appropriate hyperparameters during model training. Through rigorous experimentation and evaluation, optimal hyperparameter combinations were identified. In our experiments, we used a preprocessed and annotated corpus of Spanish texts and applied data augmentation techniques, such as back-translation, to balance the dataset. Furthermore, we incorporated the "Learning With Disagreement" (LeWiDi) approach, which uses the discrepancies between different models to improve the classification system. The results obtained demonstrate significant improvements in F1-Score, underscoring the potential application of these methods in moderating content on social media and other digital platforms. With this strategy, we achieved second place in Task 1 using a RoBERTa-based ensemble of three models, one for each annotator. In Task 2, we reached seventh position using the same approach.
The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva.
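The per-annotator ensemble mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three model outputs are combined, so the majority vote, the 0.5 threshold, and the function name `aggregate` below are assumptions made for illustration only, not the authors' method.

```python
def aggregate(annotator_probs, threshold=0.5):
    """Combine per-annotator model outputs for a single text.

    annotator_probs: one probability in [0, 1] per annotator model
    (hypothetical outputs; the paper's real models are RoBERTa-based).
    Returns (hard_label, soft_label), where soft_label is the fraction
    of models that vote "stereotype" and hard_label is the majority vote.
    """
    if not annotator_probs:
        raise ValueError("need at least one model output")
    # Each model casts one vote by thresholding its own probability.
    votes = [p >= threshold for p in annotator_probs]
    positive = sum(votes)
    # The soft label keeps the disagreement instead of discarding it.
    soft_label = positive / len(votes)
    # The hard label is positive only on a strict majority.
    hard_label = int(positive * 2 > len(votes))
    return hard_label, soft_label


if __name__ == "__main__":
    # Two of three hypothetical models vote "stereotype".
    print(aggregate([0.9, 0.6, 0.2]))
```

Keeping the soft label next to the hard one is one way to preserve annotator divergence in the output, which is the general idea behind Learning With Disagreement.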

Keywords

Learning With Disagreement; Natural Language Processing; Deep Learning; Hate Speech Detection; Data Augmentation; Transformers; Hate Speech

Unesco Subjects

33 Technological Sciences

Bibliographic citation

Cerrejón-Naranjo, M., Guerrero-García, M., Mata-Vázquez, J., & Pachón-Álvarez, V. (2024). I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), Valladolid, Spain, September 24, 2024. CEUR Workshop Proceedings 3756.

Collections

Ponencias, comunicaciones y pósteres

The license for this item is described as Attribution-NonCommercial-NoDerivs 3.0 Spain
