An Innovative Analysis of Partial Penalty on Imbalanced Data in Credit Card Fraud Prediction

Jiawei Zhang¹, Xin Zhang², Xinyin Miao³; ¹ Senior Investment Analyst, USA, ² Data Scientist, USA, ³ Senior Data Analyst, USA

Abstract

This paper proposes an innovative partial penalty methodology for machine learning models to handle the class imbalance that arises in credit card fraud detection. Unlike conventional over-sampling or under-sampling methods, partial penalty directs the model to focus on learning the minority class of the target variable even when the class distribution is extremely imbalanced. Besides comparing the partial penalty approach with over-sampling and under-sampling approaches, we have implemented it with five machine learning classification models: Logistic Regression, Random Forest, k-Nearest Neighbors (kNN), Decision Tree, and Light Gradient Boosting Machine (LightGBM). With LightGBM, the partial penalty approach achieves an F1 score of 88.35% and an AUC of 98.79%, higher than the over-sampling and under-sampling approaches reported in similar studies.
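The abstract does not spell out how the partial penalty is formulated. As a minimal sketch, assuming it behaves like a cost-sensitive loss that up-weights errors on the minority (fraud) class instead of resampling the data, the Python below contrasts a plain and a class-weighted log loss and computes the fraud-class F1 score used for evaluation. The function names and the `minority_weight` parameter are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence


def weighted_log_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      minority_weight: float) -> float:
    """Binary cross-entropy summed over examples and divided by their count,
    where each positive (fraud, label 1) example's loss is multiplied by
    the hypothetical `minority_weight`.

    With minority_weight == 1.0 this is the ordinary log loss; values > 1
    penalize mistakes on the minority class more than on the majority class.
    """
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -minority_weight * math.log(p)  # penalized minority error
        else:
            total += -math.log(1.0 - p)              # ordinary majority error
    return total / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (fraud) class: harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no fraud case detected, so precision/recall give no credit
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: one fraud case among ten transactions, and a model that
# assigns fraud probability 0.1 to every transaction.
y = [1] + [0] * 9
p = [0.1] * 10
print(weighted_log_loss(y, p, 1.0))  # unweighted loss
print(weighted_log_loss(y, p, 9.0))  # minority errors weighted 9x
```

In this toy case, setting `minority_weight=9.0` raises the loss from about 0.33 to about 2.17, so a model trained on it would be pushed toward the fraud class without altering the class distribution. A classifier that labels every transaction as legitimate gets an F1 of 0 on the fraud class despite high accuracy, which illustrates why F1 and AUC are more informative than accuracy here. In scikit-learn, a comparable effect is commonly obtained through the `class_weight` parameter of estimators such as `LogisticRegression`; whether that matches the paper's partial penalty is not stated in the abstract.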

Keywords

Partial Penalty, Gradient Boosting, Data Imbalance, Credit Card Fraud Detection, SMOTE

CS&IT Conference Proceedings
