Efficient Fraud Detection Classification: Class Imbalance and Attribute Correlations


The Frontiers of Society, Science and Technology, 2020, 2(11); doi: 10.25236/FSST.2020.021115.

Author(s)

Difei Liu¹, Ruiyi Sun², Haoyang Ren³

Corresponding Author:

Difei Liu

Affiliation(s)

1 University of Warwick, Coventry, CV4 7AL, UK

2 Australian National University, Canberra, ACT 0200, Australia

3 Beijing Normal University, Zhuhai, China


Abstract

Fraud detection is especially important for protecting cardholders’ information from being stolen by fraudsters. By choosing appropriate algorithms and analyzing the behavioural information of cardholders and banks, we can significantly reduce the probability of transactions being illegally manipulated. To address common problems in fraud analysis, this article focuses on tackling class imbalance and finding attribute correlations. Two fraud detection datasets from Kaggle are used to build classifiers and analyze the impact of different data processing techniques. Through this process, we became familiar with recent findings in fraud detection, learned more about different data processing methods, and implemented several types of classifiers. Our results confirm the importance of handling class imbalance and analyzing attribute correlations.
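To make the two themes of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes random undersampling as the class imbalance technique and the Pearson coefficient as the correlation measure; the abstract names neither. The data is synthetic, and the names `undersample`, `pearson`, `amount`, and `is_fraud` are hypothetical, chosen only for illustration.

```python
import random
from math import sqrt


def undersample(rows, label_key="is_fraud", seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: binary labels encoded as 0 (legitimate) and 1 (fraud).
    """
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((fraud, legit), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Return 0.0 when either sequence is constant (correlation undefined).
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Synthetic, heavily imbalanced toy data: 20 fraud rows out of 1000.
# The "amount" attribute is hypothetical and deliberately tied to the label.
data_rng = random.Random(42)
rows = []
for i in range(1000):
    is_fraud = 1 if i < 20 else 0
    amount = data_rng.gauss(500, 50) if is_fraud else data_rng.gauss(100, 50)
    rows.append({"amount": amount, "is_fraud": is_fraud})

balanced = undersample(rows)
corr = pearson(
    [r["amount"] for r in balanced],
    [r["is_fraud"] for r in balanced],
)
print(len(rows), len(balanced), round(corr, 3))
```

Computing the correlation on the balanced sample rather than the raw data is a design choice in this sketch: with only 2% of rows labelled as fraud, the label has so little variance that even a strong relationship can look weak.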

Keywords

Fraud detection, Class imbalance, Attribute correlations, Classification algorithms

Cite This Paper

Difei Liu, Ruiyi Sun, Haoyang Ren. Efficient Fraud Detection Classification: Class Imbalance and Attribute Correlations. The Frontiers of Society, Science and Technology (2020) Vol. 2 Issue 11: 96-103. https://doi.org/10.25236/FSST.2020.021115.
