Authors
Rakhi Chakraborty, Global Institute Of Management and Technology, India
Abstract
Online text documents are growing rapidly in volume with the expansion of the World Wide Web. To manage such a large amount of text, several text mining applications have come into existence. These applications, such as search engines, text categorization, summarization, and topic detection, are based on feature extraction. Extracting keywords or features manually is extremely time-consuming and difficult, so an automated process for extracting them is needed. This paper proposes a new domain keyword extraction technique that includes a new weighting method based on the conventional TF•IDF. Term frequency-inverse document frequency (TF•IDF) is widely used to express a document's feature weights, but it cannot reflect the distribution of terms in the document, and therefore cannot reflect the degree of significance of a term or the differences between categories. The proposed weighting method adds a new weight to the original TF•IDF to express the differences between domains. The extracted features represent the content of the text better and have better distinguishing ability.
Keywords
Text mining, Feature extraction, Weighting method, Term Frequency-Inverse Document Frequency (TF•IDF), Domain keyword extraction.
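
For readers unfamiliar with the baseline named in the abstract, the conventional TF•IDF weight can be sketched as follows. This is only an illustrative sketch of one common variant (relative term frequency multiplied by a natural-log inverse document frequency, with no smoothing). It does not include the additional domain weight proposed in the paper, whose form is not given in the abstract. The function name `tf_idf` and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute conventional TF-IDF weights (illustrative baseline only).

    docs: list of documents, each a list of tokens.
    Returns one dict per document mapping term -> weight.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # TF = relative frequency in the document; IDF = log(N / df).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: two already-tokenized documents.
docs = [["text", "mining", "text"], ["data", "mining"]]
weights = tf_idf(docs)
```

In this toy corpus the term "mining" occurs in every document, so its inverse document frequency is log(1) = 0 and its weight vanishes, while "text" and "data" receive positive weights. The weight depends only on in-document frequency and corpus-wide document counts, which is why the conventional scheme carries no information about categories or domains.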