
ISSN

ISSN	2615-9813
ISSN (old)	1859-3682

Issue 221 | August 2024

Applying Deep Learning and Optical Character Recognition to the Digitization of Financial Statement Documents

Nguyễn Quang Học, Đặng Thiên Vũ, Trần Thị Minh Hiền, Lê Hoành Sử

25/08/2024

Keywords: artificial intelligence; deep learning; optical character recognition; financial statements; document digitization.

Abstract:

Amid the global digital revolution, manually keying in the tables of financial statements has become outdated and no longer meets today's time and cost requirements. To overcome this limitation, the study proposes a method that automatically recognizes tables in financial statements from images using the PaddleOCR tool. It relies on the deep learning models and optical character recognition (OCR) technology of this open-source tool to carry out table detection, text detection and recognition, and prediction of table structure and cell coordinates, and finally to reconstruct an equivalent table as an Excel or HTML file. In experiments compared against the actual tables, the average accuracy in reconstructing table structure and recognizing the important content columns is 95% for fully bordered tables and 83% for tables with few borders. These encouraging results confirm the tool's applicability to digitizing documents that contain tables and to reducing the time spent on data entry.
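The last stage of the pipeline described above, rebuilding a table from predicted cell coordinates and recognized text, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code or PaddleOCR's implementation: the `Box`, `Cell`, and `TextLine` types, the centre-point matching rule, and the sample coordinates are all hypothetical, and the detection and recognition models are assumed to have already produced their outputs.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Box:
    """Axis-aligned rectangle in image pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def contains(self, x, y):
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass
class Cell:
    """A predicted table cell: grid position plus its bounding box."""
    row: int
    col: int
    box: Box


@dataclass
class TextLine:
    """A recognized text line: OCR string plus its bounding box."""
    text: str
    box: Box


def assign_text_to_cells(cells, lines):
    """Attach each text line to the first cell whose box contains its centre."""
    contents = {(c.row, c.col): [] for c in cells}
    # Read lines top-to-bottom, then left-to-right, so multi-line cells stay ordered.
    for line in sorted(lines, key=lambda l: (l.box.y1, l.box.x1)):
        cx, cy = line.box.center()
        for cell in cells:
            if cell.box.contains(cx, cy):
                contents[(cell.row, cell.col)].append(line.text)
                break
    return {key: " ".join(parts) for key, parts in contents.items()}


def to_html(contents):
    """Serialize a {(row, col): text} mapping as a plain HTML table."""
    n_rows = max(r for r, _ in contents) + 1
    n_cols = max(c for _, c in contents) + 1
    rows = []
    for r in range(n_rows):
        tds = "".join(
            f"<td>{escape(contents.get((r, c), ''))}</td>" for c in range(n_cols)
        )
        rows.append(f"<tr>{tds}</tr>")
    return "<table>" + "".join(rows) + "</table>"


# Made-up sample: a 2x2 table with 100x20 pixel cells.
cells = [
    Cell(0, 0, Box(0, 0, 100, 20)),
    Cell(0, 1, Box(100, 0, 200, 20)),
    Cell(1, 0, Box(0, 20, 100, 40)),
    Cell(1, 1, Box(100, 20, 200, 40)),
]
lines = [
    TextLine("Revenue", Box(5, 4, 60, 16)),
    TextLine("1,200", Box(150, 4, 195, 16)),
    TextLine("Cost", Box(5, 24, 40, 36)),
    TextLine("800", Box(160, 24, 195, 36)),
]

html_table = to_html(assign_text_to_cells(cells, lines))
print(html_table)
```

Matching on the centre of each text box is a deliberately simple choice. It tolerates text boxes that slightly overflow a cell, but it does not handle merged cells or text that straddles a border, which is where tables with few borders tend to be harder.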


Application of Deep Learning and Optical Character Recognition in Digitizing Financial Statements

Keywords: AI; deep learning; optical character recognition; financial statements; digitization.

Abstract:

The digital revolution is fundamentally altering our interaction with data. Traditional methods like manual data entry for digitizing tables in financial statements are becoming obsolete, failing to meet the standards of cost efficiency and time effectiveness in reporting. To address this challenge, this paper proposes a method centered on leveraging PaddleOCR to automatically recognize tables within images extracted from financial reports. Our approach harnesses deep learning models and optical character recognition (OCR) technology embedded within this open-source tool. The process involves detecting tables, detecting and recognizing text, predicting table structures, and ultimately reconstructing them into HTML format and Excel files. Through experimentation and comparison with actual tables, our study achieves an average TEDS score of 95% for regular tables with full borders and 83% for borderless tables. These promising results underscore the tool's viability in digitizing documents containing tables, thereby streamlining data entry processes. Furthermore, this outcome marks a significant milestone toward the broader goal of complete digitization through robotic process automation (RPA).
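For readers unfamiliar with the metric, TEDS (Tree-Edit-Distance-based Similarity) is commonly defined in the image-based table recognition literature as follows. The abstract does not state which variant or normalization the authors used, so this is given only as the usual definition, with the predicted and ground-truth tables treated as HTML trees $T_a$ and $T_b$:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $\mathrm{EditDist}$ is the tree edit distance between the two trees and $|T|$ is the number of nodes in $T$, so a score of 1 means the two tables are identical in both structure and cell content.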

DOI: https://doi.org/10.63065/ajeb.vn.2024.221.101386
