
Academic Journal of Business & Management, 2021, 3(8); doi: 10.25236/AJBM.2021.030810.

Financial Credit Default Forecast Based on Big Data Analysis


Huihui Jin1, Longyin Luo1, Xinyi Wang2, Xiaoqi Zhu3, Lian Qian4, Zhice Zhang4

Corresponding Author:
Huihui Jin

1Wenzhou-Kean University, Wenzhou, Zhejiang Province, China 

2Sichuan University, Chengdu, Sichuan, China

3Australian National University, ACT, Australia

4The Affiliated High School to Hangzhou Normal University, Hangzhou, Zhejiang, China

These authors contributed equally to this work


Effectively evaluating a borrower's potential default risk and estimating the probability of default before a loan is issued is the foundation of credit risk management in modern financial institutions. This paper analyzes historical loan data from banks and other financial institutions, drawing on ideas from imbalanced data classification, and uses machine learning algorithms such as random forest, logistic regression, decision tree, and neural network to build loan default prediction models. The experimental results show that the neural network and random forest algorithms outperform the decision tree and logistic regression classifiers in prediction performance. In addition, ranking feature importance with the random forest algorithm identifies the features that have the greatest impact on default, which supports more effective judgments of loan risk in the financial field.
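
The abstract does not say which imbalanced-data technique the authors apply, so the following is only a minimal sketch of why class imbalance matters in default prediction, with random oversampling swapped in as one common remedy. The toy data, the 95/5 class split, and the helper names (random_oversample, recall, accuracy) are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class has the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra copies (with replacement) only for classes below the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives (defaults) that the predictions recover."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 95 repaid loans (label 0) and 5 defaults (label 1).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A model that always predicts "no default" looks accurate but finds no defaulters.
always_repaid = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_repaid)
baseline_recall = recall(labels, always_repaid)

# Rebalance the training data before fitting a classifier.
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In practice the rebalanced rows would be passed to whichever classifier is being compared, and resampling would be applied to the training split only so that the evaluation data keeps its original class ratio.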


Keywords: Random Forest, Bank Credit, Loan Default Prediction, Data Mining

Cite This Paper

Huihui Jin, Longyin Luo, Xinyi Wang, Xiaoqi Zhu, Lian Qian, Zhice Zhang. Financial Credit Default Forecast Based on Big Data Analysis. Academic Journal of Business & Management (2021) Vol. 3, Issue 8: 51-56. https://doi.org/10.25236/AJBM.2021.030810.

