Authors
V. V. R. Maheswara Rao1 and V. Valli Kumari2, 1Shri Vishnu Engineering College for Women, India and 2Andhra University, India
Abstract
With the continued growth and proliferation of Web services and Web based information systems, the volumes of user data have reached astronomical proportions. Before analyzing such data using web mining techniques, the web log has to be pre processed, integrated and transformed. As the World Wide Web is continuously and rapidly growing, it is necessary for the web miners to utilize intelligent tools in order to find, extract, filter and evaluate the desiredinformation. The data pre-processing stage is the most important phase for investigation of the web user usage behaviour. To do this one must extract the only human user accesses from weblog data which is critical and complex. The web log is incremental in nature, thus conventional data pre-processing techniques were proved to be not suitable. Hence an extensive learning algorithm is required in order to get the desired information.This paper introduces an extensive research frame work capable of pre processing web log data completely and efficiently. The learning algorithm of proposed research frame work can separates human user and search engine accesses intelligently, with less time. In order to create suitable target data, the further essential tasks ofpre-processing Data Cleansing, User Identification, Sessionization and Path Completion are designed collectively. The framework reduces the error rate and improves significant learning performance of the algorithm. The work ensures the goodness of split by using popular measures like Entropy and Gini index. This framework helps to investigate the web user usage behaviour efficiently. The experimental results proving this claim are given in this paper.
Keywords
Web usage mining, intelligent pre-processing system, cleansing, sessionization and path completion.