Authors
Dariusz Ceglarek, Poznan School of Banking, Poland
Abstract
This work presents results of ongoing research in natural language processing, focusing on plagiarism detection, semantic networks and semantic compression. The results demonstrate that semantic compression is a valuable addition to the existing methods used in plagiarism detection. Applying semantic compression boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection 2 (SHAPD2) and of the author's implementation of the w-shingling algorithm. Experiments were performed on the Clough & Stephenson corpus as well as on the available PAN-PC-10 plagiarism corpus, which is used to evaluate plagiarism detection methods, so the results can be compared with those of other research teams.
Keywords
Plagiarism detection, Longest common subsequence, Semantic compression, Sentence hashing, w-shingling, Intellectual property protection