A Plagiarism Detection Procedure in Three Steps: Selection, Matches and "Squares" / Basile, Chiara; Benedetto, Dario; Caglioti, Emanuele; Cristadoro, Giampaolo; Degli Esposti, Mirko. - 502:(2009), pp. 19-23. (Paper presented at the conference PAN-09 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse and 1st Internat, held in San Sebastian, Spain, on 10 September 2009).
A Plagiarism Detection Procedure in Three Steps: Selection, Matches and "Squares"
BENEDETTO, Dario; CAGLIOTI, Emanuele
2009
Abstract
We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm has three steps: first, the size of the problem is reduced by selecting ten suspected plagiarists using an n-gram distance on properly recoded texts; second, matches are searched for after a T9-like recoding; third, a "joining algorithm" merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed.
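The abstract gives no implementation details, so the following is only a minimal Python sketch of the three ingredients it names, under stated assumptions. The keypad mapping in `t9_recode` is the standard phone layout, `ngram_distance` is a generic relative-frequency dissimilarity, and `join_matches` is a simple greedy merge with a hypothetical `max_gap` threshold. None of these names, parameters, or formulas is taken from the paper, and the sketch uses one recoding for everything although the paper may use different ones for the selection and matching steps.

```python
from collections import Counter

# Standard phone-keypad letter groups (an assumption; the paper's recoding may differ).
_KEYPAD = {"abc": "2", "def": "3", "ghi": "4", "jkl": "5",
           "mno": "6", "pqrs": "7", "tuv": "8", "wxyz": "9"}
_T9 = {ch: digit for letters, digit in _KEYPAD.items() for ch in letters}


def t9_recode(text):
    """Map letters to keypad digits; collapse each run of other characters to one '0'."""
    out = []
    for ch in text.lower():
        code = _T9.get(ch, "0")
        if code == "0" and out and out[-1] == "0":
            continue  # keep a single separator per run of non-letters
        out.append(code)
    return "".join(out)


def ngram_profile(s, n):
    """Count every overlapping character n-gram of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_distance(a, b, n=8):
    """Illustrative dissimilarity in [0, 1]: 0 for equal profiles, 1 for disjoint ones."""
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    ta = sum(pa.values()) or 1  # avoid division by zero for very short texts
    tb = sum(pb.values()) or 1
    grams = set(pa) | set(pb)
    if not grams:
        return 0.0
    total = 0.0
    for g in grams:
        fa, fb = pa[g] / ta, pb[g] / tb  # relative frequencies
        total += ((fa - fb) / (fa + fb)) ** 2
    return total / len(grams)


def join_matches(matches, max_gap=50):
    """Greedily merge matches (susp_pos, src_pos, length) that are close in both texts.

    Returns tuples (susp_start, susp_end, src_start, src_end).
    """
    merged = []
    for susp, src, length in sorted(matches):
        if merged:
            last = merged[-1]
            near_susp = susp - last[1] <= max_gap
            near_src = abs(src - last[3]) <= max_gap
            if near_susp and near_src:
                last[1] = max(last[1], susp + length)
                last[2] = min(last[2], src)
                last[3] = max(last[3], src + length)
                continue
        merged.append([susp, susp + length, src, src + length])
    return [tuple(m) for m in merged]
```

In this sketch, selecting candidates would mean ranking the recoded texts by `ngram_distance` and keeping the ten closest. Exact matches found on the recoded strings would then be passed to `join_matches`, so that several short nearby matches are reported as one longer passage, which is how obfuscated copies could still surface.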