Semi-Supervised Feature Selection with Bias Mitigation for SME Credit Assessment Using Alternative Data
Abstract
Credit scoring for small and medium enterprises (SMEs) faces a fundamental challenge: assessing creditworthiness when traditional financial data is unavailable. This paper presents a semi-supervised feature selection framework that addresses this challenge by leveraging alternative data sources, ranging from transaction patterns to behavioral signals. We develop a graph-based approach that reduces the requirement for labeled data by 70% while improving the area under the curve (AUC) from 0.836 to 0.871 relative to the best supervised baseline, an absolute gain of 0.035 (3.5 percentage points, about 4.2% relative). The framework integrates bias mitigation techniques that reduce the approval-rate gap by 78.9% while keeping default rates stable across groups and without compromising model performance. Experiments on 111,579 SME loan applications across three geographic regions demonstrate that the approach scales efficiently, with O(n log n) complexity, and can process 500,000 applications in approximately 131 minutes. The practical implications are significant: financial institutions can assess credit risk for businesses previously considered "unscorable" because they lack a traditional credit history. The framework thereby facilitates broader access to capital for millions of SMEs, particularly in developing economies where formal financial records are limited.
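To make the fairness figure concrete, the sketch below shows one common way an approval-rate gap and its relative reduction can be computed. The abstract does not define the metric, so this assumes the gap is the difference between the highest and lowest group approval rates. The function names `approval_rate_gap` and `relative_reduction`, the group labels, and the toy decisions are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def approval_rate_gap(decisions, groups):
    """Difference between the highest and lowest group approval rates.

    decisions: iterable of 0/1 values (1 = approved).
    groups: iterable of group labels, aligned with decisions.
    Assumed definition; the paper may use a different one.
    """
    approved = defaultdict(int)
    total = defaultdict(int)
    for decision, group in zip(decisions, groups):
        total[group] += 1
        approved[group] += int(decision)
    rates = [approved[g] / total[g] for g in total]
    return max(rates) - min(rates)


def relative_reduction(before, after):
    """Fraction by which a gap shrinks, e.g. 0.789 for a 78.9% reduction."""
    return (before - after) / before


# Hypothetical toy data: two groups of ten applicants each.
groups = ["A"] * 10 + ["B"] * 10
baseline = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6   # approval rates: A 0.8, B 0.4
mitigated = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4  # approval rates: A 0.7, B 0.6

gap_before = approval_rate_gap(baseline, groups)
gap_after = approval_rate_gap(mitigated, groups)
reduction = relative_reduction(gap_before, gap_after)

print(f"gap before: {gap_before:.2f}, gap after: {gap_after:.2f}")
print(f"relative reduction: {reduction:.1%}")
```

In this toy case the gap shrinks from about 0.4 to about 0.1, a relative reduction of roughly 75%. The 78.9% reported in the abstract would be read the same way under this assumed definition.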