Authors
S. Sendhilkumar, Nachiyar S Nandhini and G.S. Mahalakshmi, Anna University, India
Abstract
Redundancy is one of the most pressing problems in almost every domain today. Novelty detection is the identification of new or unknown data or signals that a machine learning system was not exposed to during training. The problem becomes more acute for research articles. Determining the novel idea proposed in a research paper requires a method that identifies novelty in each section of the article. Because research articles are semi-structured, detecting novel information in them requires more accurate systems. Topic models offer a useful and simple means of processing and analyzing such articles. This work compares the most widely used topic model, Latent Dirichlet Allocation (LDA), with the hierarchical Pachinko Allocation Model (hPAM). The results are promising for hPAM when it is used for document retrieval.
Keywords
Novelty detection, Topic modeling, LDA, hPAM, Novelty score, Concept maps