RT Dissertation/Thesis
T1 Negation and speculation detection in medical and review texts
T1 Detección de la negación y la especulación en textos médicos y de opinión
A1 Cruz Díaz, Noa Patricia
A2 Universidad de Huelva. Departamento de Tecnologías de la Información
K1 Machine learning
K1 Biomedicine
AB Negation and speculation detection has been an active research area in recent years in the Natural Language Processing community, including some Shared Tasks in relevant conferences. In fact, it constitutes a challenge in which many applications can benefit from identifying this kind of information (e.g., interaction detection, information extraction, sentiment analysis). This thesis aims to contribute to the ongoing research on negation and speculation in the Language Technology community through the development of machine learning systems which determine the speculation and negation cues and resolve their scope (i.e., identify at sentence level which tokens are affected by the cues). It is focused on the two domains in which negation and hedging have drawn the most attention: the biomedical and the review domains. In the first one, the proposed method improves the results to date for the sub-collection of clinical documents of the BioScope corpus. In the second, the novelty of the contribution lies in the fact that, to the best of our knowledge, this is the first system trained and tested on the SFU Review corpus annotated with negative and speculative information. At the same time, this is the first attempt to detect speculation in the review domain. Additionally, due to the tokenization problems that were encountered during the preprocessing of the BioScope corpus and the small number of works in the bibliography which propose solutions for this problem, this thesis closely describes this issue and provides both a comprehensive overview analysis and an evaluation of a set of tokenization tools. This constitutes the first comparative evaluation study of tokenizers in the biomedical domain, which could help Natural Language Processing developers choose the best tokenizer to use.
AB Negation and speculation detection has been an active research area in recent years in the Natural Language Processing community, including some shared tasks at relevant conferences. Indeed, many applications could benefit from the accurate identification of this kind of information (for example, interaction detection, information extraction, sentiment analysis). This thesis aims to contribute to the ongoing research on negation and speculation in the Language Technology community through the development of machine learning systems that determine negation and speculation cues and resolve their linguistic scope. By resolving the linguistic scope, we mean identifying, at the sentence level, the tokens affected by the cues. It focuses on the two domains in which negation and speculation have received the most attention: the biomedical domain and the review domain. In the first, the proposed method improves on the results to date for the sub-collection of clinical documents of the BioScope corpus. In the second, the novelty of the contribution lies in the fact that, to the best of our knowledge, this is the first system trained and evaluated on the Simon Fraser University review collection annotated with negative and speculative information; at the same time, it is the first attempt to detect speculation in this domain. In addition, owing to the tokenization problems encountered during the preprocessing of the BioScope document collection and the small number of studies in the literature that offer solutions to this problem, this thesis describes the issue in depth, providing a comprehensive analysis and carrying out an evaluation of several tokenization tools. This contribution is the first comparative evaluation study of tokenizers in the biomedical domain, which could help Natural Language Processing developers choose the best tokenization tool to use.
PB Universidad de Huelva
YR 2014
FD 2014
LK http://hdl.handle.net/10272/11442
UL http://hdl.handle.net/10272/11442
LA eng
DS Repositorio Institucional de la Universidad de Huelva
RD 30 May 2026