Optical character recognition system with natural language processing for data recovery on scanned old academic card reports
DOI: https://doi.org/10.4025/actascitechnol.v47i1.69814
Keywords: accuracy; image processing; chat-GPT; digital records; preservation.
Abstract
In the digital age, preserving and effectively retrieving historical academic records poses a significant challenge, especially when these documents exist only in deteriorated physical formats. We propose an approach to recover data from scanned grade-record documents that uses image processing and Natural Language Processing (NLP) to improve the accuracy of Optical Character Recognition (OCR), which is essential for preserving these records digitally. Our three-step methodology first improves the quality of the scanned image, then extracts text from the old physical grade cards using OCR and NLP techniques, and finally corrects the extracted data using ChatGPT and prepares it for upload. The results are promising: a Character Error Rate (CER) of 2.15% and a Word Error Rate (WER) of 7.05%, demonstrating the high accuracy of the OCR system and its ability to extract text precisely from scanned documents. These low error rates, achieved through the combination of pre-processing and post-processing techniques with an advanced OCR tool, underscore the potential of this approach for effectively extracting information from scanned documents.
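The CER and WER figures above are conventionally defined as the Levenshtein edit distance between the OCR output and a ground-truth transcription, that is, the minimum number of substitutions S, deletions D and insertions I, divided by the number N of characters (CER) or words (WER) in the reference: CER = (S + D + I) / N. The abstract does not state which tool or aggregation method the authors used, so the following is only a minimal sketch of that standard definition. The function names and the sample grade-card line are hypothetical.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions
    needed to turn `ref` into `hyp` (works on strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical grade-card line and a hypothetical noisy OCR reading of it.
    ref = "Calculus I 8.5 approved"
    hyp = "Calcu1us I 8.5 aproved"
    print(f"CER = {cer(ref, hyp):.2%}")  # 2 edits / 23 chars -> CER = 8.70%
    print(f"WER = {wer(ref, hyp):.2%}")  # 2 edits / 4 words  -> WER = 50.00%
```

The example shows why WER is typically higher than CER, as in the reported 7.05% versus 2.15%: a single wrong character makes the whole word count as an error. When evaluating a corpus, a common choice is to sum the edits and reference lengths over all documents before dividing, rather than averaging per-document rates; the abstract does not say which convention was used.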
License
DECLARATION OF ORIGINALITY AND COPYRIGHTS
I declare that the current article is original and has not been submitted for publication, in part or in whole, to any other national or international journal.
The copyrights belong exclusively to the authors. Published content is licensed under Creative Commons Attribution 4.0 (CC BY 4.0) guidelines, which allows sharing (copy and distribution of the material in any medium or format) and adaptation (remix, transform, and build upon the material) for any purpose, even commercially, under the terms of attribution.
