December 2024
Reporter of the Priazovskyi State Technical University. Section: Technical sciences
This paper analyzes the most effective existing optical character recognition methods built on deep neural networks. The analysis shows that the modern architectures with the best recognition accuracy all reach a fixed accuracy limit, and that each of them contains a multilayer perceptron in its structure. To improve recognition performance, the paper proposes the Kolmogorov-Arnold network as an alternative to multilayer perceptron based networks. The proposed model is a two-component transformer: a vision transformer serves as the encoder and a language transformer as the decoder. In both the encoder and the decoder, the Kolmogorov-Arnold network replaces the multilayer perceptron based feedforward network. The results of existing neural networks are improved through transfer learning, with group rational functions serving as the main learnable elements of the Kolmogorov-Arnold network. The model was trained on images of text lines from three writing systems, alphabetic, abugida and logographic, represented by the English, Devanagari and Chinese scripts respectively. In the experiments, the model with the Kolmogorov-Arnold network achieved high character recognition rates on the Chinese and Devanagari data sets but low rates on the English script. The results indicate new possibilities for increasing the reliability and efficiency of modern handwriting recognition systems.
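To make the central substitution concrete, the following is a minimal sketch of a feedforward block whose learnable elements are group rational functions, where channels are split into groups and each group shares one rational function. The abstract does not give the exact parameterization, so the form P(x) / (1 + |Q(x)|), the ordering (rational activation first, linear map second), and all names (`rational`, `group_rational`, `kan_feed_forward`) are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# (numerator coefficients, denominator coefficients) for one group
Coeffs = Tuple[Sequence[float], Sequence[float]]


def rational(x: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Evaluate P(x) / (1 + |Q(x)|), an assumed pole-free rational form.

    P(x) = a[0] + a[1]*x + a[2]*x^2 + ...
    Q(x) = b[0]*x + b[1]*x^2 + ...
    The denominator is always >= 1, so the function has no poles.
    """
    numerator = sum(c * x ** i for i, c in enumerate(a))
    denominator = 1.0 + abs(sum(c * x ** (i + 1) for i, c in enumerate(b)))
    return numerator / denominator


def group_rational(x: Sequence[float], coeffs: Sequence[Coeffs]) -> List[float]:
    """Apply one shared rational function to each contiguous group of channels."""
    groups = len(coeffs)
    if groups == 0 or len(x) % groups != 0:
        raise ValueError("channel count must be a multiple of the group count")
    size = len(x) // groups
    return [rational(v, *coeffs[i // size]) for i, v in enumerate(x)]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Plain affine map: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def kan_feed_forward(x: Sequence[float], coeffs: Sequence[Coeffs],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> List[float]:
    """Hypothetical KAN-style block standing in for the MLP feedforward:
    learnable group rational activation, then a linear projection."""
    return linear(group_rational(x, coeffs), weight, bias)
```

In a transformer layer, a block of this kind would take the place of the usual two-layer perceptron that follows attention. In a trained model the coefficients in `coeffs` would be learned together with `weight` and `bias`; here they are passed in explicitly to keep the sketch dependency-free.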