Credit Card Fraud Detection Based on Random Forest Model

Academic Journal of Computing & Information Science, 2022, 5(13); doi: 10.25236/AJCIS.2022.051309.

Author(s)

Peilin Li

Corresponding Author:

Peilin Li

Affiliation(s)

College of Arts and Sciences, The Ohio State University, Columbus, Ohio, 43210, United States

Abstract

This paper uses a random forest classifier to detect credit card fraud, one of the main problems facing the financial industry. Building a fraud detection system requires a sufficiently large sample of transactions. The dataset used here contains 284,807 credit card transactions, of which 492 (about 0.17%) are fraudulent, and its features have already been transformed with PCA. Because the dataset is large and highly imbalanced, this paper compresses it and applies the synthetic minority over-sampling technique (SMOTE) to address the class imbalance. A random forest is then used as the classification model for the fraud detection system.
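The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `smote`, its parameters, and the toy `fraud` points are assumptions made for illustration, and only the core SMOTE idea is shown (each synthetic sample is a random interpolation between a minority sample and one of its k nearest minority neighbors). In practice a library implementation such as `imblearn.over_sampling.SMOTE`, followed by a classifier such as scikit-learn's `RandomForestClassifier`, would more likely be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority neighbors.

    Hypothetical helper for illustration; not the paper's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Class imbalance reported in the abstract: 492 frauds in 284,807 transactions.
fraud_rate = 492 / 284807

# Toy two-feature "fraud" points (made-up values, not taken from the dataset).
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(fraud, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing fraud samples, it stays inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class ratio.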

Keywords

Credit Card Fraud, Random Forest, SMOTE, Prediction Model, Data Science

Cite This Paper

Peilin Li. Credit Card Fraud Detection Based on Random Forest Model. Academic Journal of Computing & Information Science (2022), Vol. 5, Issue 13: 55-61. https://doi.org/10.25236/AJCIS.2022.051309.
