I2C-UHU at IberLEF-2023 HOMO-MEX task: Ensembling Transformers Models to Identify and Classify Hate Messages Towards the Community LGBTQ+

Authors: Morano Moriña, Antonio; Román Pásaro, Javier; Mata Vázquez, Jacinto; Pachón Álvarez, Victoria

Dates: 2024-11-27; 2024-11-27; 2023

Citation: Morano-Moriña, J; Román-Pásaro, J.; Mata-Vázquez, J., & Pachón-Álvarez, V. (2024). I2C-UHU at IberLEF-2023 HOMO-MEX task: Ensembling Transformers Models to Identify and Classify Hate Messages Towards the Community LGBTQ+. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 26, 2023. CEUR Workshop Proceedings 3496

Handle: https://hdl.handle.net/10272/24524

Note: This paper was presented at the I International Workshop on Conspiracy theories and hate speech online: Comparison of patterns in narratives and social media about Covid 19, immigrants, refugees and LGTBIQ+ people. Universidad de Huelva, July 12-14, 2023 (https://eventos.uhu.es/99642/detail/i-international-workshop-nonconspirahate-project.html).

Abstract: This paper presents the approaches proposed by the I2C Group to address the IberLEF-2023 task HOMO-MEX: Hate speech detection in Online Messages directed tOwards the MEXican spanish speaking LGBTQ+ population. The main contribution is a demonstration of the effectiveness of an ensemble of transformer-based classifiers. Combining multiple models leveraged their individual strengths and improved performance over any single model. The results also underscored the importance of selecting appropriate hyperparameters during model training: through systematic experimentation and evaluation of different hyperparameter combinations, we identified the settings that performed best for the given tasks. For both tasks we tested several models and ensembled the three that achieved the best F1-score on this dataset. For Task 2, we additionally trained an individual binary classifier for each class instead of a single multilabel classifier. The model submitted for Task 1 achieved an F1-score of 83.25%, ranking 6th in the competition. The model for Task 2 achieved an F1-score of 69.60%, ranking 1st in the competition.

Funding: The paper is part of the I+D+i Project titled "Conspiracy Theories and Hate Speech Online: Comparison of Patterns in Narratives and social networks about COVID-19, immigrants, refugees, and LGBTI people [NON-CONSPIRA-HATE!]", PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by "ERDF/EU." (https://eseis.es/investigacion/discursos-de-odio/discursos-odio-tc). We are also grateful for the support of our research group: "Estudios Sociales E Intervención Social" (GrupoESEIS), and the research center "Pensamiento Contemporáneo e Innovación para el Desarrollo Social" (COIDESO), and the Applied Computational Social Science Lab, CISCOA-Lab, at the University of Huelva.

Language: eng

License: Attribution-NonCommercial-NoDerivs 3.0 Spain (http://creativecommons.org/licenses/by-nc-nd/3.0/es/)

Keywords: Deep Learning; Transformers; Ensembler; Hyperparameter; Twitter; LGBT-Phobia; Hate Speech Detection; Natural Language Processing

Type: conference output

Access: open access

Subject: 33 Technological Sciences
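
Illustrative sketch (not part of the original record): the abstract describes ensembling the three best models and, for Task 2, training one binary classifier per class. The record does not say how the individual predictions were combined, so the Python below assumes hard majority voting. The function names, the label names class_a and class_b, and the toy predictions are all hypothetical; in practice each inner list would hold the predictions of one fine-tuned transformer model.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by hard majority voting.

    predictions[m][i] is the label that model m assigns to message i.
    Assumption: voting is used here only for illustration; the record
    does not specify the combination rule.
    """
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must predict the same number of items")
    combined = []
    for i in range(n_items):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # On a tie, the label seen first (from the earliest model) wins.
        combined.append(votes.most_common(1)[0][0])
    return combined


def multilabel_from_binary(
    per_class: Dict[str, Sequence[Sequence[int]]]
) -> List[List[str]]:
    """Build multilabel output from one binary ensemble per class.

    per_class[label][m][i] is 1 if model m says message i has that label.
    """
    voted = {label: majority_vote(preds) for label, preds in per_class.items()}
    n_items = len(next(iter(voted.values())))
    return [
        [label for label in voted if voted[label][i] == 1]
        for i in range(n_items)
    ]


# Toy example: three models, three messages, single-label task.
single_label = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])

# Toy example: two hypothetical classes, three binary models each, two messages.
multi_label = multilabel_from_binary({
    "class_a": [[1, 0], [1, 0], [0, 1]],
    "class_b": [[0, 0], [1, 1], [1, 0]],
})
```

With three voters and binary labels, as in the per-class setup, a tie cannot occur. Averaging class probabilities (soft voting) is a common alternative when the models expose scores.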