I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts

Cerrejón Naranjo, Manuel; Guerrero García, Manuel; Mata Vázquez, Jacinto; Pachón Álvarez, Victoria

dc.contributor.author	Cerrejón Naranjo, Manuel
dc.contributor.author	Guerrero García, Manuel
dc.contributor.author	Mata Vázquez, Jacinto
dc.contributor.author	Pachón Álvarez, Victoria
dc.date.accessioned	2024-11-27T08:43:29Z
dc.date.available	2024-11-27T08:43:29Z
dc.date.issued	2024
dc.description	This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid 19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html). This work presents the approaches developed to detect and identify racial stereotypes in Spanish texts using advanced Natural Language Processing (NLP) and Deep Learning techniques, incorporating Learning with Disagreement to improve robustness. The main contribution of this work is the demonstration of the effectiveness of transformer-based ensemble classifiers for recognizing both explicit and implicit stereotypes. By leveraging the strengths of several models, the proposed method achieves better results than a single model. The results also highlight the importance of selecting appropriate hyperparameters during model training. Through rigorous experimentation and evaluation, the optimal combination of hyperparameters was identified. In our experiments, we used a preprocessed and annotated corpus of Spanish texts and applied data augmentation techniques, such as back-translation, to balance the dataset. We also incorporated the "Learning With Disagreement" (LeWiDi) approach, which uses the discrepancies between different models to improve the classification system. The results show significant improvements in F1-Score, underscoring the potential application of these methods to content moderation on social media and other digital platforms. With this strategy, we achieved second place in Task 1 using an ensemble of 3 RoBERTa-based models, one for each annotator. 
In Task 2, we reached seventh position using the same approach. The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva.	es_ES
dc.description.abstract	This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid 19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html). This paper presents the approaches developed for detecting and identifying racial stereotypes in Spanish texts using advanced Natural Language Processing (NLP) and Deep Learning techniques, incorporating Learning with Disagreement for enhanced robustness. The major contribution of this work is the demonstration of the effectiveness of transformer-based ensemble classifiers in recognizing both explicit and implicit stereotypes. By leveraging the strengths of multiple models, the proposed method achieves better performance than a single model. The results also highlight the importance of selecting appropriate hyperparameters during model training. Through rigorous experimentation and evaluation, the optimal hyperparameter combination was identified. In our experiments, we used a preprocessed and annotated corpus of Spanish texts and applied data augmentation techniques, such as back-translation, to balance the dataset. Furthermore, we incorporated the "Learning With Disagreement" (LeWiDi) approach, which uses the discrepancies between different models to improve the classification system. The results demonstrate significant improvements in F1-Score, underscoring the potential application of these methods to content moderation on social media and other digital platforms. With this strategy, we achieved second place in Task 1 using an ensemble of 3 RoBERTa-based models, one for each annotator. In Task 2, we reached seventh position using the same approach. 
The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva.	es_ES
dc.description.department	Information Technologies (Tecnologías de la Información)	es_ES
dc.description.sponsorship	Project PID2021-123983OB-I00 [NON-CONSPIRA-HATE!], funded by MCIN/AEI/10.13039/501100011033/ and by ERDF/EU.	es_ES
dc.identifier.citation	Cerrejón-Naranjo, M., Guerrero-García, M., Mata-Vázquez, J., & Pachón-Álvarez, V. (2024). I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), Valladolid, Spain, September 24, 2024. CEUR Workshop Proceedings 3756.
dc.identifier.uri	https://hdl.handle.net/10272/24526
dc.language.iso	eng	es_ES
dc.publisher	CEUR-WS
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain	*
dc.rights.accessRights	open access	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject.other	Learning With Disagreement	es_ES
dc.subject.other	Natural Language Processing	es_ES
dc.subject.other	Deep Learning	es_ES
dc.subject.other	Hate Speech Detection	es_ES
dc.subject.other	Data Augmentation	es_ES
dc.subject.other	Transformers	es_ES
dc.subject.other	Hate Speech	es_ES
dc.subject.unesco	33 Technological Sciences	es_ES
dc.title	I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts	es_ES
dc.type	conference output	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	ac76819b-d91a-4158-b947-4a9e827e5e9d
relation.isAuthorOfPublication	47cb4892-3513-4d33-953c-8521bc9cb187
relation.isAuthorOfPublication.latestForDiscovery	ac76819b-d91a-4158-b947-4a9e827e5e9d

Files

Original bundle

Name: DETESTS-Dis2024_paper1.pdf
Size: 392.76 KB
Format: Adobe Portable Document Format

Collections

Conference papers, communications, and posters (Ponencias, comunicaciones y pósteres)