Authors
V. Annapoorani¹ and A. Vijaya², ¹Paavai Engineering College, India and ²Govt Arts and Science College - Salem, India
Abstract
This paper describes a distinctive tool for clustering legal documents retrieved through web mining, intended to assist lawyers and legal groups during document review. The tool clusters the document collection and generates a set of distinctive word labels for each cluster of documents, entirely autonomously. It combines Principal Direction Divisive Partitioning (PDDP) and K-Means, applied to both the documents and the words under a bag-of-words model. Apart from specifying the original document set, the user provides no input; all processing is done by the tool. In this paper, we arrange civil cases by name in an orderly fashion so that documents retrieved through web mining can be understood quickly. Clustering is a technique that groups and divides data according to their similarities and dissimilarities, and it discovers both the dense and the sparse regions of a data set. Both academic researchers and document researchers rely on this technique. PDDP is unusual in that it works top-down, repeatedly splitting clusters into smaller clusters. Although clustering methods are frequently used, little is known about the characteristics of the available methods or about how they should be deployed. In document research, cluster analysis is also referred to as segmentation, that is, separating particular data from the available data set. In this paper, we discuss two classical problems that arise in this kind of divisive clustering: first, which cluster should be split, and second, how the selected cluster should be split.
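The two problems named in the abstract, which cluster to split and how to split it, can be illustrated with a minimal PDDP sketch. It is not the authors' implementation and rests on several assumptions: documents are unit-length term-frequency vectors (a simple bag-of-words model), the cluster with the largest scatter is chosen for splitting, and the chosen cluster is split by the sign of each document's projection onto its leading principal direction, computed here by power iteration. The function names (`bag_of_words`, `pddp`, `cluster_labels`, and so on) are hypothetical, and labelling a cluster by its highest-weighted centroid terms is only one simple reading of "distinctive word labels".

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Map each document to a unit-length term-frequency vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, count in Counter(d.lower().split()).items():
            v[index[w]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def centroid(vectors, members):
    """Mean vector of the documents whose indices are in `members`."""
    n = len(members)
    return [sum(vectors[i][j] for i in members) / n for j in range(len(vectors[0]))]


def scatter(vectors, members):
    """Sum of squared distances from each member to the cluster centroid."""
    c = centroid(vectors, members)
    return sum(sum((vectors[i][j] - c[j]) ** 2 for j in range(len(c))) for i in members)


def split(vectors, members, iters=100):
    """PDDP split: project the centered members onto the leading principal direction."""
    c = centroid(vectors, members)
    centered = [[vectors[i][j] - c[j] for j in range(len(c))] for i in members]
    # Power iteration on C^T C, started from the longest centered row
    # (assumed not to be orthogonal to the leading direction).
    u = max(centered, key=lambda r: sum(x * x for x in r))
    for _ in range(iters):
        proj = [sum(r[j] * u[j] for j in range(len(u))) for r in centered]  # C u
        u = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(len(c))]  # C^T (C u)
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    left, right = [], []
    for i, r in zip(members, centered):
        side = sum(r[j] * u[j] for j in range(len(u)))
        (left if side >= 0 else right).append(i)
    return left, right


def pddp(vectors, k):
    """Divisive clustering: repeatedly split one cluster until there are k clusters."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        # Problem 1 (which cluster to split): the one with the largest scatter.
        target = max(clusters, key=lambda m: scatter(vectors, m))
        if scatter(vectors, target) <= 1e-12:
            break  # every remaining cluster holds (numerically) identical vectors
        # Problem 2 (how to split): by the sign of the principal-direction projection.
        left, right = split(vectors, target)
        if not left or not right:
            break
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


def cluster_labels(vocab, vectors, members, top=3):
    """Label a cluster with its highest-weighted centroid terms (ties broken alphabetically)."""
    c = centroid(vectors, members)
    ranked = sorted(range(len(vocab)), key=lambda j: (-c[j], vocab[j]))
    return [vocab[j] for j in ranked[:top]]


if __name__ == "__main__":
    docs = [
        "contract breach damages",
        "breach of contract damages claim",
        "divorce custody child",
        "child custody divorce alimony",
    ]
    vocab, vectors = bag_of_words(docs)
    for members in pddp(vectors, k=2):
        print(sorted(members), cluster_labels(vocab, vectors, members))
```

On this toy collection the contract documents and the family-law documents share no terms, so the first principal direction separates them. The abstract pairs PDDP with K-Means; a K-Means pass seeded with the centroids of the PDDP clusters could refine the partition, but that step is left out to keep the sketch short.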
Keywords
PDDP, FORGY’S ALGORITHM, LATENT SEMANTIC INDEXING (LSI) ALGORITHM, LINEAR LEAST SQUARES FIT (LLSF) ALGORITHM.