Plagiarism detection in documents. A high-performance monolingual external system based on contextual n-grams
Abstract
This paper presents a monolingual external plagiarism detection system built on three components: a modification of the n-gram concept (the “contextual n-gram”), a new high-performance information retrieval engine based on that concept, and a new strategy (“referential monotony”) for deciding whether plagiarism has occurred and where its boundaries lie. The evaluation results are comparable to those of the top-ranked entry at PAN'09, yet they are obtained at a much lower computational cost than other existing work: a running time of 30 to 45 minutes on a single laptop, without concurrent programming. This makes the approach a very attractive alternative to exploit.
Bibliographic citation
Diego Antonio Rodríguez Torrejón, José Manuel Martín Ramos. Detección de plagio de documentos. Sistema externo monolingüe de altas prestaciones basado en n-gramas contextuales. Procesamiento del lenguaje natural, ISSN 1135-5948, No. 45, 2010, pp. 49-58.













