Verifying and classifying scanned materials is an urgent task in automating an organization's document flow when a wide variety of documents arrives from a large number of counterparties. The article analyzes existing methods for solving this problem and summarizes their main characteristics. The purpose of the study is to develop a software module that classifies documents in real time with an accuracy of at least 97%, which is relevant to electronic document management in large and medium-sized companies. A solution based on a convolutional neural network (CNN) is described. The module takes a PDF file of the scanned document as input and produces an XML file with the document class as output. To improve the accuracy and speed of the program, two tasks were solved: encoding the input signal for the neural network and determining the network's structure. The stages of processing scanned documents and the architecture of the developed neural network are described. The proposed method classifies pages with high accuracy on a small dataset. The program was tested on a dataset of 9628 pages with 22 possible classes and achieved an accuracy of 99.1%. Classifying a single page takes 2 ms on a GeForce GTX 780 Ti, excluding file reading and copying to the GPU; the total classification time per page is approximately 22.3 ms.
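The module's output contract and the reported timings can be illustrated with a minimal Python sketch. The abstract does not specify the XML schema, so the element and attribute names (`document`, `page`, `source`, `number`), the file name `scan.pdf`, and the class labels are hypothetical; only the 2 ms and 22.3 ms figures come from the abstract.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def build_result_xml(source_pdf: str, page_classes: Sequence[str]) -> str:
    """Serialize per-page class labels to XML (hypothetical schema)."""
    root = ET.Element("document", attrib={"source": source_pdf})
    for number, label in enumerate(page_classes, start=1):
        page = ET.SubElement(root, "page", attrib={"number": str(number)})
        page.text = label
    return ET.tostring(root, encoding="unicode")


def pages_per_second(total_ms_per_page: float) -> float:
    """Convert a per-page latency in milliseconds to a throughput."""
    return 1000.0 / total_ms_per_page


# Timings reported in the abstract.
INFERENCE_MS = 2.0   # network forward pass only
TOTAL_MS = 22.3      # including file reading and copying to the GPU

# Time spent outside the network itself (reading, copying).
overhead_ms = TOTAL_MS - INFERENCE_MS

if __name__ == "__main__":
    # Hypothetical labels for a two-page scan.
    print(build_result_xml("scan.pdf", ["invoice", "contract"]))
    print(f"overhead per page: {overhead_ms:.1f} ms")
    print(f"throughput: {pages_per_second(TOTAL_MS):.1f} pages/s")
```

Under these figures, roughly 20 ms of the per-page time is spent on input handling rather than on the network, which suggests that file reading and the GPU transfer, not inference, bound the throughput.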
Translated title of the contribution: CLASSIFICATION OF SCANNED DOCUMENTS USING A CONVOLUTIONAL NEURAL NETWORK
Original language: Russian
Pages (from-to): 45-49
Number of pages: 5
Journal: Современные наукоемкие технологии
Issue number: 6-1
Publication status: Published - 2021

    GRNTI

  • 28.23.00

    Level of Research Output

  • VAK List

ID: 22846309