Light Gradient Boosting Machine (LGBM) for Credit Card Fraud Detection in Financial Institutions
Keywords:
Financial institution, fraud detection, machine learning, credit card fraud

Abstract
The upsurge in fraudulent credit card activity raises alarms daily, causing multi-billion-dollar losses globally and damaging the reputations of both financial institutions and their customers. Scholars have proposed many detection methods across diverse applications, several with near-perfect reported results. However, machine learning (ML) approaches are hampered by the imbalanced class distribution of most credit card fraud datasets: models overfit or underfit and generalize poorly, because the classifier tends to predict only the majority class. Addressing this issue calls for evaluation metrics suited to imbalanced data, combined with class balancing through either re-sampling methods or data augmentation. In this paper, accuracy, Matthews correlation coefficient (MCC), Cohen's Kappa, and F1-score are selected as the evaluation metrics to compare four ML classifiers (LR, RF, Isolation Forest, and the proposed LGBM) and three DL models (MLP, ANN, and CNN) for credit card fraud detection. Two experiments were conducted on an imbalanced Kaggle dataset, using Google Colab's hosted Jupyter notebook environment with the Python programming language for modelling. The first experiment established baselines with LR and RF; RF outperformed LR and was carried into the second experiment on class-balanced models using SMOTE oversampling. The proposed LGBM delivered superior performance on seven of the eleven evaluation-metric instances deployed.
The model achieved an accuracy of 96%, the lowest error rate (0.4%), recall of 95%, prevalence of 47%, Cohen's Kappa of 45%, F1-score of 96%, and MCC of 93%, outperforming the other ML and DL models evaluated.
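The SMOTE oversampling used in the second experiment can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation (which would typically use the imbalanced-learn library); the function name, toy data, and parameters below are illustrative assumptions. It shows the core idea: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest neighbours.

```python
import math
import random

def smote_oversample(minority, n_new, k=3, seed=42):
    """Sketch of SMOTE: create n_new synthetic minority samples by
    interpolating between a random minority point and one of its
    k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy 2-D minority class (e.g. a handful of fraud transactions)
fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_oversample(fraud, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority data already occupies, unlike naive duplication, which only repeats existing points.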
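The metrics chosen for the evaluation (accuracy, recall, F1-score, Cohen's Kappa, and MCC) can all be derived from a binary confusion matrix. The following sketch, with a hypothetical confusion matrix for illustration, shows why Kappa and MCC are preferred for imbalanced data: unlike accuracy, they discount agreement that could arise by always predicting the majority class.

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics suited to imbalanced classification."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's Kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient: balanced even when classes differ in size
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "recall": recall, "f1": f1,
            "kappa": kappa, "mcc": mcc}

# Hypothetical imbalanced test set: 50 frauds among 1000 transactions
m = imbalance_metrics(tp=40, fp=5, fn=10, tn=945)
```

In this example accuracy is 98.5%, yet Kappa and MCC both land near 0.83, reflecting that a trivial always-legitimate classifier would already score 95% accuracy on this split.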
