April 2025
Proceedings of the AAAI Conference on Artificial Intelligence
A central challenge in handwritten mathematical expression recognition (HMER) is the complex structure of the expressions, which is directly tied to the spatial positions of the symbols. Existing HMER methods typically either employ attention mechanisms in the decoder to perceive symbol positions implicitly, or employ symbol counting and tree-based strategies to model the spatial relations between symbols. However, these methods still cannot effectively capture the structural information of formulas, which negatively impacts symbol decoding in HMER. To address this problem and enhance HMER performance, this paper proposes a novel auxiliary task, namely predicting the symbol spatial distribution map of handwritten expression images. On this basis, this paper designs a symbol spatial-aware network (SSAN) for this task, which is jointly optimized with the HMER model. Specifically, considering the similarity of symbol spatial positions between handwritten mathematical expression images and their corresponding printed templates, we obtain the symbol spatial distribution map by first generating printed templates from the LaTeX ground truth of the handwritten formula images and then replacing the connected components of the printed templates with 2D Gaussian distribution maps of the same size. Meanwhile, because the symbol spatial positions of handwritten and printed formula images are only loosely aligned, and because similar symbols are prone to misclassification, we further propose a coarse-to-fine alignment strategy and an attention-guided symbol masking strategy in SSAN to tackle these issues. Extensive experiments demonstrate that SSAN significantly improves the recognition performance of HMER models, and that the proposed auxiliary task is more effective in enhancing HMER performance than existing auxiliary tasks.
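The map construction described above (replacing each connected component of a printed template with a 2D Gaussian of the same size) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of 8-connectivity, the choice of tying the Gaussian's standard deviation to a quarter of the component's height and width, and the use of a pixel-wise maximum where patches overlap are all assumptions made for illustration. The template is assumed to be already rendered and binarized (1 for ink, 0 for background).

```python
import math
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def gaussian_patch(height, width):
    """2D Gaussian with peak 1.0 at the patch centre.

    Assumption: sigma is a quarter of the patch extent along each axis.
    """
    cy, cx = (height - 1) / 2, (width - 1) / 2
    sy, sx = max(height / 4, 0.5), max(width / 4, 0.5)
    return [
        [math.exp(-((y - cy) ** 2 / (2 * sy ** 2) + (x - cx) ** 2 / (2 * sx ** 2)))
         for x in range(width)]
        for y in range(height)
    ]


def spatial_distribution_map(binary):
    """Replace each connected component with a same-size Gaussian patch."""
    h, w = len(binary), len(binary[0])
    out = [[0.0] * w for _ in range(h)]
    for top, left, bottom, right in connected_components(binary):
        patch = gaussian_patch(bottom - top + 1, right - left + 1)
        for y, row in enumerate(patch):
            for x, value in enumerate(row):
                # Assumption: overlapping patches are merged by pixel-wise maximum.
                out[top + y][left + x] = max(out[top + y][left + x], value)
    return out


# Toy binarized "printed template": a 3x3 blob and an isolated single pixel.
template = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
dist_map = spatial_distribution_map(template)
```

In this sketch each component's bounding box receives a Gaussian that peaks at the box centre and decays toward its edges, so the resulting map encodes where symbols sit rather than what they look like. Note that a connected component is not always a whole symbol (a printed "i" has two), which is consistent with the abstract's wording in terms of connected components.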