Authors
Jiawei Zhang 1, Xin Zhang 2, Xinyin Miao 3
1 Senior Investment Analyst, USA; 2 Data Scientist, USA; 3 Senior Data Analyst, USA
Abstract
This paper proposes a novel partial penalty methodology for machine learning models that addresses the class imbalance encountered in credit card fraud detection. Unlike conventional over-sampling or under-sampling, partial penalty directs the model to focus on learning the minority class of the target variable even when the class distribution is extremely imbalanced. We compare the partial penalty approach with over-sampling and under-sampling, and implement it with five classification models: Logistic Regression, Random Forest, kNN, Decision Tree, and Light Gradient Boosting Machine (LightGBM). With LightGBM, the partial penalty approach achieves an F1 score of 88.35% and an AUC of 98.79%, higher than the results reported for over-sampling or under-sampling approaches in similar articles.
Keywords
Partial Penalty, Gradient Boosting, Data Imbalance, Credit Card Fraud Detection, SMOTE