Credit Card Fraud - Its detection using machine learning

Introduction

Over the past decade, as digitization has advanced, the number of credit card transactions has risen significantly. This has in turn led to an increase in credit card fraud attempts. Every year, credit card fraud causes significant financial damage to individuals, institutions and economies alike. Traditional prevention methods cannot keep pace with the growing volume of transactions, but machine learning models have shown promise in addressing this problem.

Context

However, because so many different machine learning models are available, it can be difficult to decide which one to use. It can also be difficult to understand how a machine learning model arrived at a particular conclusion.

Objectives

The objective of the thesis was to give an overview of the models used for credit card fraud detection, how they perform at that task and how they reach a conclusion.

Methodology

A literature review was conducted to compile a list of the most popular models. The models on this list were then implemented, tested and evaluated. In addition, two less popular models were implemented to see whether their performance was comparable to that of the popular ones. Finally, the dataset attributes that these models found most important for detecting credit card fraud were identified.
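
The evaluation step can be illustrated with a minimal sketch. The abstract does not state which metrics the thesis used, so accuracy, precision, recall and F1 are assumptions here, and the labels and predictions are made-up illustration data, not thesis results. The sketch shows why accuracy alone can mislead when fraud is rare.

```python
# Minimal sketch: evaluation metrics for an imbalanced fraud-detection task.
# All data below is invented for illustration; it is not taken from the thesis.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Assumed toy data: 1000 transactions, 10 of which are fraudulent.
y_true = [1] * 10 + [0] * 990

# Baseline that never flags fraud.
always_legit = [0] * 1000

# Hypothetical model: catches 8 of 10 frauds and raises 5 false alarms.
model_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985

baseline = evaluate(y_true, always_legit)
model = evaluate(y_true, model_pred)

print("baseline:", baseline)
print("model:   ", model)
```

In this toy setup the baseline that never flags fraud is still correct for 990 of 1000 transactions, yet its recall is zero. This is why a comparison of fraud models usually reports recall and precision alongside accuracy.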

Results

The literature review identified Logistic Regression, Decision Tree, Random Forest, Artificial Neural Network, Naïve Bayes, and K-Nearest Neighbor as the most popular models. These models were implemented and their performance was tested. In addition, the XGBoost and AdaBoost models were implemented and tested. The best-performing models were Random Forest, XGBoost, and Decision Tree, while Logistic Regression and Naïve Bayes performed worse. The findings also showed that the dollar amount of a transaction was the most important attribute for the models, followed by the street where the transaction took place, the product category purchased, and the merchant that sold the product.
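
One way to rank attributes by importance is sketched below. The abstract does not say how the thesis measured importance, so permutation importance is used here as an assumed, model-agnostic stand-in: shuffle one attribute and measure how much accuracy drops. The `rule_model` function, the 800 threshold, and the `amount` and `noise` attributes are hypothetical and take the place of a trained classifier and a real dataset.

```python
import random

def rule_model(row):
    """Hypothetical stand-in for a trained classifier: flag large amounts."""
    return 1 if row["amount"] > 800 else 0

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, rng):
    """Accuracy drop when the values of one feature are shuffled across rows."""
    baseline = accuracy(model, rows, labels)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    # Copy each row with the chosen feature replaced by a shuffled value.
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return baseline - accuracy(model, permuted, labels)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption): fraud depends only on the amount.
rows = [{"amount": rng.uniform(0, 1000), "noise": rng.random()}
        for _ in range(1000)]
labels = [1 if r["amount"] > 800 else 0 for r in rows]

importances = {
    feature: permutation_importance(rule_model, rows, labels, feature, rng)
    for feature in ("amount", "noise")
}

for feature, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: accuracy drop {drop:.3f}")
```

Shuffling `amount` lowers accuracy noticeably, while shuffling `noise` changes nothing because the stand-in model never reads it. A ranking like the one reported above could be built the same way by repeating the procedure for every attribute of a trained model.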

Further Research

The study identified several areas for future research. First, the list of models could be expanded to include other potentially useful models. Second, more attention could be given to training and tuning, to determine the upper limits of performance for these models. Third, the list of models could be based on interviews with representatives of the credit card industry, to learn which models are used in practice and how. Similarly, the models could be trained on more real-world datasets.