ISSN 2615-9813
ISSN (former number) 1859-3682

NO. 189 | DECEMBER 2021

A SMOTE-Lasso-Logistic Credit Scoring Model

Bùi Thị Thiện Mỹ

Abstract:

Credit scoring, which classifies customers as good or bad, is one of the key risk-management tasks at banks and credit institutions. A reliable credit scoring model must correctly identify the bad customers. This is usually hard to achieve when the two classes differ greatly in size. In addition, a credit scoring model should indicate which customer characteristics matter for predicting default. This paper proposes a credit scoring model called SMOTE-Lasso-Logistic. By combining the SMOTE resampling technique with the Lasso method on a Logistic regression model, SMOTE-Lasso-Logistic can address both problems while achieving higher classification performance than traditional approaches such as Logistic regression and Classification tree models.
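
The resampling step can be illustrated with a short sketch. The abstract does not give the paper's implementation, so what follows is only a simplified reading of the SMOTE idea: a new minority-class point is placed at a random position on the segment between an existing minority point and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; the original algorithm loops over every minority sample, whereas this sketch draws base points at random.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len - 1 neighbours
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k minority samples closest to the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical "bad customer" feature vectors, made up for illustration.
bad = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(bad, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region already occupied by the minority class. In practice the synthetic points would be added to the training set only, never to the test set.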



A SMOTE-Lasso-Logistic Credit Scoring Model


Credit scoring, which classifies borrowers as good or bad, is one of the important risk-management tasks at banks and credit institutions. A reliable credit scoring model must correctly detect the bad class. This is usually difficult when the numbers of good and bad borrowers differ greatly. Besides, a credit scoring model should point out the borrower characteristics that are significant for predicting the probability of default. The paper proposes a credit scoring model called SMOTE-Lasso-Logistic. By combining the SMOTE resampling technique with the Lasso method on Logistic regression, the SMOTE-Lasso-Logistic model can address these issues and achieve higher classification performance than traditional approaches such as Logistic regression and Decision tree models.
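
To make the variable-selection part concrete, here is a minimal sketch of Logistic regression with a Lasso (L1) penalty, fitted by proximal gradient descent. It is not the paper's code: the function names, the penalty weight `lam`, the step size, and the toy data are all assumptions chosen for illustration, and in the setting described above the fit would presumably run on a training set that has already been rebalanced.

```python
import math


def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink a weight towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.1, step=0.1, n_iter=2000):
    """Minimise mean log-loss + lam * sum(|w_j|); the intercept is unpenalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Prediction error (predicted probability minus label) per sample.
        errors = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_b = sum(errors) / n
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / n for j in range(p)]
        b -= step * grad_b
        # Gradient step on the log-loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1], [-0.5, -0.3],
     [0.5, 0.2], [1.0, -0.1], [1.5, 0.3], [2.0, -0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_lasso_logistic(X, y, lam=0.1)
print(w, b)
```

On this toy data the penalty keeps the weight of the noise feature at exactly zero while the informative feature keeps a positive weight, which is the sense in which a Lasso-penalised model points out the characteristics that matter. A larger `lam` removes more features at the cost of fit.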