Evaluation of the SHAPD2 Algorithm Efficiency in Plagiarism Detection Task Using PAN Plagiarism Corpus

Authors

Dariusz Ceglarek, Poznan School of Banking, Poland

Abstract

This work presents results of ongoing novel research in natural language processing, focusing on plagiarism detection, semantic networks, and semantic compression. The results demonstrate that semantic compression is a valuable addition to existing methods of plagiarism detection. Applying semantic compression improves the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection 2 (SHAPD2) and of the authors' implementation of the w-shingling algorithm. Experiments were performed on the Clough & Stephenson corpus as well as on the available PAN-PC-10 plagiarism corpus, which is used to evaluate plagiarism detection methods, so the results can be compared with those of other research teams.

Keywords

Plagiarism detection, Longest common subsequence, Semantic compression, Sentence hashing, w-shingling, Intellectual property protection

CS&IT Conference Proceedings
