I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts

dc.contributor.author Cerrejón Naranjo, Manuel
dc.contributor.author Guerrero García, Manuel
dc.contributor.author Mata Vázquez, Jacinto
dc.contributor.author Pachón Álvarez, Victoria
dc.date.accessioned 2024-11-27T08:43:29Z
dc.date.available 2024-11-27T08:43:29Z
dc.date.issued 2024
dc.description This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid 19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html). This work presents the approaches developed to detect and identify racial stereotypes in Spanish texts using advanced Natural Language Processing (NLP) and Deep Learning techniques, incorporating Learning with Disagreement to improve robustness. The main contribution of this work is to demonstrate the effectiveness of transformer-based ensemble classifiers at recognizing both explicit and implicit stereotypes. By exploiting the strengths of several models, the proposed method achieves better results than a single model. The results also highlight the importance of selecting suitable hyperparameters during model training. Through rigorous experimentation and evaluation, the optimal combination of hyperparameters was identified. In our experiments, we used a preprocessed and annotated corpus of Spanish texts and applied data augmentation techniques, such as back-translation, to balance the dataset. We also incorporated the "Learning With Disagreement" (LeWiDi) approach, which uses the discrepancies between different models to improve the classification system. The results show significant improvements in F1-Score, underscoring the potential application of these methods to content moderation on social networks and other digital platforms. With this strategy, we achieved second place in Task 1 using an ensemble of 3 RoBERTa-based models, one for each annotator.
In Task 2, we reached seventh position using the same approach. The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva. es_ES
dc.description.abstract This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid 19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html). This paper presents the approaches developed for detecting and identifying racial stereotypes in Spanish texts using advanced Natural Language Processing (NLP) and Deep Learning techniques, incorporating Learning with Disagreement for enhanced robustness. The major contribution of this work is the demonstration of the effectiveness of transformer-based ensemble classifiers in recognizing both explicit and implicit stereotypes. By leveraging the strengths of multiple models, the proposed method achieves better performance than a single model alone. The results also highlight the importance of selecting appropriate hyperparameters during model training. Through rigorous experimentation and evaluation, the optimal hyperparameter combination was identified. In our experiments, we used a preprocessed and annotated corpus of Spanish texts and applied data augmentation techniques, such as back-translation, to balance the dataset. Furthermore, we incorporated the "Learning With Disagreement" (LeWiDi) approach, which uses the discrepancies between different models to improve the classification system. The results obtained demonstrate significant improvements in F1-Score, underscoring the potential application of these methods in moderating content on social media and other digital platforms. With this strategy, we achieved second place in Task 1 using an ensemble of 3 RoBERTa-based models, one for each annotator. In Task 2, we reached seventh position using the same approach.
The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva. es_ES
dc.description.department Tecnologías de la Información es_ES
dc.description.sponsorship Project PID2021-123983OB-I00 [NON-CONSPIRA-HATE!], funded by MCIN/AEI/10.13039/501100011033/ and by ERDF/EU. es_ES
dc.identifier.citation Cerrejón-Naranjo, M., Guerrero-García, M., Mata-Vázquez, J., & Pachón-Álvarez, V. (2024). I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), Valladolid, Spain, September 24, 2024. CEUR Workshop Proceedings 3756
dc.identifier.uri https://hdl.handle.net/10272/24526
dc.language.iso eng es_ES
dc.publisher CEUR-WS
dc.rights Atribución-NoComercial-SinDerivadas 3.0 España *
dc.rights.accessRights open access es_ES
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/es/ *
dc.subject.other Learning With Disagreement es_ES
dc.subject.other Natural Language Processing es_ES
dc.subject.other Deep Learning es_ES
dc.subject.other Hate Speech Detection es_ES
dc.subject.other Data Augmentation es_ES
dc.subject.other Transformers es_ES
dc.subject.other Hate Speech es_ES
dc.subject.unesco 33 Technological Sciences es_ES
dc.title I2C-Huelva at IberLEF-2024 DETESTS-Dis: Learning from Divergence to Identify Explicit and Implicit Racial Stereotypes in Spanish Texts es_ES
dc.type conference output es_ES
dspace.entity.type Publication
relation.isAuthorOfPublication ac76819b-d91a-4158-b947-4a9e827e5e9d
relation.isAuthorOfPublication 47cb4892-3513-4d33-953c-8521bc9cb187
relation.isAuthorOfPublication.latestForDiscovery ac76819b-d91a-4158-b947-4a9e827e5e9d

Files

Original bundle

Name:
DETESTS-Dis2024_paper1.pdf
Size:
392.76 KB
Format:
Adobe Portable Document Format