Abstract:
Credit scoring, which classifies borrowers as good or bad, is one of the key risk-management tasks at banks and credit institutions. A reliable credit scoring model must correctly identify the bad class, which is often hard to achieve when the numbers of good and bad borrowers differ greatly. A credit scoring model should also reveal which borrower characteristics matter most for predicting the probability of default. This paper proposes a credit scoring model called SMOTE-Lasso-Logistic, which combines the SMOTE resampling technique with the Lasso method applied to logistic regression. The SMOTE-Lasso-Logistic model addresses both issues and achieves higher classification performance than traditional approaches such as logistic regression and decision trees.
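The abstract combines two ingredients: SMOTE, which oversamples the bad (minority) class by interpolating between nearby bad borrowers, and Lasso (L1-penalised) logistic regression, which shrinks the coefficients of unimportant characteristics to exactly zero. The NumPy sketch below shows one way the two can be chained. It is not the authors' implementation. The simulated data, the function names (`smote`, `lasso_logistic`, `simulate`, `smote_lasso_logistic`), the label coding (`y == 1` for bad), the penalty `lam = 0.05`, the step size and the proximal-gradient (ISTA) solver are all illustrative assumptions.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority rows in the spirit of SMOTE.

    Each synthetic row lies on the segment joining a randomly chosen
    minority row and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances inside the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def lasso_logistic(X, y, lam, lr=0.5, n_iter=3000):
    """L1-penalised logistic regression fitted by proximal gradient (ISTA).

    Minimises mean log-loss + lam * sum(|beta_j|); the intercept is unpenalised.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ beta + b0, -30.0, 30.0)
        resid = 1.0 / (1.0 + np.exp(-z)) - y  # predicted prob minus label
        b0 -= lr * resid.mean()
        beta = beta - lr * (X.T @ resid) / n
        # Soft-thresholding is the proximal step for the L1 penalty;
        # it sets small coefficients exactly to zero.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return b0, beta


def simulate(n=1000, seed=0):
    """Hypothetical imbalanced data: only features 0 and 1 affect default."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 6))
    logit = -3.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return X, y


def smote_lasso_logistic(X, y, lam=0.05, seed=0):
    """Oversample the bad class (assumed y == 1, the minority) to parity,
    then fit the Lasso-logistic model on the balanced data."""
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    X_syn = smote(X_min, n_new, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(n_new)])
    b0, beta = lasso_logistic(X_bal, y_bal, lam)
    return X_syn, y_bal, b0, beta


if __name__ == "__main__":
    X, y = simulate()
    X_syn, y_bal, b0, beta = smote_lasso_logistic(X, y)
    print("bad share before:", round(y.mean(), 3), "after:", y_bal.mean())
    print("coefficients:", np.round(beta, 3))
```

Oversampling should touch only the training data; evaluating on held-out rows that contain no synthetic points keeps the bad-class detection rate honest. In practice the penalty would be tuned by cross-validation rather than fixed as above.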