Authors
Shaukat Ali Shahee and Usha Ananthakumar, Indian Institute of Technology - Bombay, India
Abstract
In many data mining applications, class imbalance arises when examples of one class are overrepresented. Traditional classifiers yield poor accuracy on the minority class because of this imbalance. Further, within-class imbalance, where classes are composed of multiple sub-concepts with different numbers of examples, also affects the performance of the classifier. In this paper, we propose an oversampling technique that handles between-class and within-class imbalance simultaneously and also takes into consideration the generalization ability in data space. The proposed method consists of two steps: performing model-based clustering on each class to identify the sub-concepts, and then computing the separating hyperplane based on equal posterior probability between the classes. The proposed method is tested on 10 publicly available data sets, and the results show that it is statistically superior to other existing oversampling methods.
Keywords
Supervised Learning, Class Imbalance, Oversampling, Posterior Distribution