Academic Journal of Computing & Information Science, 2025, 8(4); doi: 10.25236/AJCIS.2025.080407.
Zhongyuan Xu
Goizueta Business School, Master of Finance, Emory University, Atlanta, United States
Existing credit scoring systems that rely on traditional financial data (such as credit records and bank statements) often suffer from missing data, limited coverage, and poor timeliness, which makes it difficult to accurately characterize the credit risk of new user groups. To address this, this paper combines artificial intelligence with the Internet of Things to construct an unstructured, multidimensional feature system from the alternative data that users generate on mobile devices and in the Internet environment. Principal component analysis (PCA) is used to reduce the dimensionality of the high-dimensional alternative data and to extract its main axes of information. The principal component variables are then used as inputs to credit scoring models based on logistic regression and random forest. The experimental results show that, after PCA is applied, credit risk identification accuracy decreases only slightly, from 84% with the original standardized features to 82%; the AUC (Area Under the Curve) decreases from 0.87 to 0.85; and the K-S (Kolmogorov-Smirnov) value decreases from 0.52 to 0.48. These results support the feasibility and effectiveness of combining alternative data with machine learning in credit assessment. In addition, PCA retains the first two principal components, which explain 56.3% and 28.7% of the data variance, respectively, for a cumulative contribution rate of 85%. This reduces data redundancy and improves model training efficiency at a small cost in predictive performance.
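The paper does not publish its implementation. As an illustration only, the following Python sketch shows one common way to carry out the PCA step described in the abstract: z-score the features, eigendecompose their covariance (equivalently, the correlation matrix), and keep the fewest leading components whose cumulative contribution rate reaches a threshold (0.85 here, mirroring the 85% figure). The helper names `pca_fit` and `pca_transform`, the NumPy-based eigendecomposition, and the synthetic two-factor data are assumptions for demonstration, not the author's method or dataset.

```python
import numpy as np


def pca_fit(X, threshold=0.85):
    """Hypothetical helper: standardize X, eigendecompose the covariance of the
    standardized data, and keep the fewest leading components whose cumulative
    explained-variance ratio (contribution rate) reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # avoid dividing by zero on constant columns
    Z = (X - mean) / std                     # standardized features
    cov = np.cov(Z, rowvar=False)            # sample covariance of standardized data
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # per-component contribution rate
    cumulative = np.cumsum(ratio)
    # first index where the cumulative rate reaches the threshold, as a count
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))
    return {
        "mean": mean,
        "std": std,
        "components": eigvecs[:, :k],        # loading vectors of the retained components
        "ratio": ratio[:k],
        "cumulative": float(cumulative[k - 1]),
    }


def pca_transform(model, X):
    """Project rows onto the retained components using the training statistics."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    return Z @ model["components"]


# Synthetic stand-in for alternative-data features (NOT the paper's data):
# 6 observed columns driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 6))

model = pca_fit(X, threshold=0.85)
scores = pca_transform(model, X)   # principal component variables for a downstream model
print(scores.shape[1], model["ratio"].round(3), round(model["cumulative"], 3))
```

The resulting `scores` matrix would play the role of the principal component variables fed to the logistic regression and random forest models. Reusing the training mean, standard deviation, and loadings in `pca_transform` keeps validation or test data on the same axes as the training data.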
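The abstract evaluates the models by accuracy, AUC, and the K-S value. As a sketch under stated assumptions, the two ranking metrics can be computed from predicted default scores with the Python standard library alone. K-S is taken here as the maximum gap between the true positive rate and the false positive rate over all score thresholds, and AUC as the probability that a randomly chosen defaulter (label 1) receives a higher score than a randomly chosen non-defaulter (label 0), with ties counted as one half. The function names `ks_statistic` and `auc_score` and the eight toy scores are hypothetical and are not taken from the paper.

```python
def ks_statistic(labels, scores):
    """K-S value: max over thresholds of |TPR - FPR|, i.e. the largest gap between
    the empirical score distributions of positives (1) and negatives (0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("labels must contain both classes")
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # consume all tied scores together so a threshold never splits a tie
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / pos - fp / neg))
    return best


def auc_score(labels, scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("labels must contain both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not results from the paper)
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(ks_statistic(labels, scores), auc_score(labels, scores))  # → 0.5 0.75
```

The pairwise AUC loop is quadratic and meant only to make the definition explicit; for large samples a rank-based computation gives the same value more efficiently.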
Alternative Data Source; Credit Score; Principal Component Analysis; Machine Learning; Dimension Reduction Technology
Zhongyuan Xu. Credit Scoring Using Alternative Data Sources: A Machine Learning Approach. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 4: 56-64. https://doi.org/10.25236/AJCIS.2025.080407.