Xiaowu Zhang’s research while affiliated with University of Science and Technology Beijing and other places


Publications (3)


Unveiling the Impact of Multimodal Features on Chinese Spelling Correction: From Analysis to Design
  • Preprint

April 2025

Xiaowu Zhang · Hongfei Zhao · Jingyi Hou · Zhijie Liu

The Chinese Spelling Correction (CSC) task focuses on detecting and correcting spelling errors in sentences. Current research primarily explores two approaches: traditional multimodal pre-trained models and large language models (LLMs). However, LLMs face limitations in CSC, particularly over-correction, making them suboptimal for this task. While existing studies have investigated the use of phonetic and graphemic information in multimodal CSC models, effectively leveraging these features to enhance correction performance remains a challenge. To address this, we propose the Multimodal Analysis for Character Usage (MACU) experiment, identifying potential improvements for multimodal correction. Based on empirical findings, we introduce NamBert, a novel multimodal model for Chinese spelling correction. Experiments on benchmark datasets demonstrate NamBert's superiority over SOTA methods. We also conduct a comprehensive comparison between NamBert and LLMs, systematically evaluating their strengths and limitations in CSC. Our code and model are available at https://github.com/iioSnail/NamBert.
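
The abstract does not spell out how the phonetic and graphemic signals are combined, but a common pattern in multimodal CSC models is to fuse pinyin and glyph embeddings with the semantic representation from a BERT-style encoder through a learned gate. The sketch below illustrates that pattern; all names (MultimodalFusion, pinyin_proj, glyph_proj) are hypothetical and are not taken from NamBert's actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Hypothetical sketch: fuse semantic, phonetic, and graphemic
    features per character before the correction classifier.
    Names and dimensions are illustrative, not NamBert's real API."""

    def __init__(self, hidden=768, pinyin_dim=64, glyph_dim=64, vocab_size=21128):
        super().__init__()
        self.pinyin_proj = nn.Linear(pinyin_dim, hidden)  # phonetic branch
        self.glyph_proj = nn.Linear(glyph_dim, hidden)    # graphemic branch
        self.gate = nn.Linear(3 * hidden, 3)              # per-character mixing weights
        self.classifier = nn.Linear(hidden, vocab_size)   # predict the corrected character

    def forward(self, semantic, pinyin, glyph):
        # semantic: (B, L, hidden) from a BERT-style encoder
        # pinyin:   (B, L, pinyin_dim); glyph: (B, L, glyph_dim)
        p = self.pinyin_proj(pinyin)
        g = self.glyph_proj(glyph)
        weights = torch.softmax(self.gate(torch.cat([semantic, p, g], dim=-1)), dim=-1)
        fused = (weights[..., 0:1] * semantic
                 + weights[..., 1:2] * p
                 + weights[..., 2:3] * g)
        return self.classifier(fused)  # (B, L, vocab_size) logits
```

A per-character gate of this kind lets a model lean on phonetic evidence for homophone errors and graphemic evidence for visually similar characters, which is the kind of feature usage the MACU analysis examines.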


LMCBert: An Automatic Academic Paper Rating Model Based on Large Language Models and Contrastive Learning

March 2025 · 16 Reads · IEEE Transactions on Cybernetics

Chuanbin Liu · Xiaowu Zhang · Hongfei Zhao · [...]

The acceptance of academic papers involves a complex peer-review process that requires substantial human and material resources and is susceptible to biases. With advancements in deep learning technologies, researchers have explored automated approaches for assessing paper acceptance. Existing automatic academic paper rating (AAPR) methods primarily rely on the full content of papers to estimate acceptance probabilities. However, these methods are often inefficient and introduce redundant or irrelevant information. Additionally, while BERT can capture general semantic representations through pretraining on large-scale corpora, its performance on the AAPR task remains suboptimal due to discrepancies between its pretraining corpus and academic texts. To address these issues, this study proposes LMCBert, a model that integrates large language models (LLMs) with momentum contrastive learning (MoCo). LMCBert utilizes LLMs to extract the core semantic content of papers, reducing redundancy and improving the understanding of academic texts. Furthermore, it incorporates MoCo to optimize BERT training, enhancing the differentiation of semantic representations and improving the accuracy of paper acceptance predictions. Empirical evaluations demonstrate that LMCBert achieves effective performance on the evaluation dataset, supporting the validity of the proposed approach. The code and data used in this article are publicly available at https://github.com/iioSnail/LMCBert.
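
MoCo itself is a published technique (momentum contrast), so its core mechanics can be sketched independently of LMCBert: a slowly updated key encoder produces stable representations, a queue supplies negatives, and an InfoNCE loss pulls matched pairs together. The sketch below shows those two pieces under stated assumptions; how LMCBert wires them into BERT training is not described in the abstract, and the function names here are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    """MoCo-style update: the key encoder trails the query encoder,
    keeping queued negatives consistent across training steps."""
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.data.mul_(m).add_(q.data, alpha=1.0 - m)

def moco_loss(q, k, queue, temperature=0.07):
    """InfoNCE with one positive key per query plus queued negatives.
    q, k: (B, D) L2-normalized embeddings; queue: (D, K) negatives."""
    l_pos = torch.einsum("bd,bd->b", q, k).unsqueeze(-1)  # (B, 1) positive logits
    l_neg = torch.einsum("bd,dk->bk", q, queue)           # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```

The queue is what makes the approach attractive here: it supplies many negative papers per step without a huge batch, which plausibly helps separate the semantic representations of accepted and rejected papers.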


Research on Domain-Specific Chinese Spelling Correction Method Based on Plugin Extension Modules

November 2024

This paper proposes a Chinese spelling correction method based on plugin extension modules, aimed at addressing the limitations of existing models in handling domain-specific texts. Traditional Chinese spelling correction models are typically trained on general-domain datasets, resulting in poor performance when encountering specialized terminology in domain-specific texts. To address this issue, we design an extension module that learns the features of domain-specific terminology, thereby enhancing the model's correction capabilities within specific domains. This extension module can provide domain knowledge to the model without compromising its general spelling correction performance, thus improving its accuracy in specialized fields. Experimental results demonstrate that after integrating extension modules for medical, legal, and official document domains, the model's correction performance is significantly improved compared to the baseline model without any extension modules.
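
The abstract describes plugins that add domain knowledge "without compromising its general spelling correction performance", which matches the behavior of bottleneck adapters: small residual modules trained on domain text while the general-domain backbone stays frozen. The sketch below assumes that reading; DomainPlugin and its dimensions are hypothetical, not the paper's actual module.

```python
import torch.nn as nn

class DomainPlugin(nn.Module):
    """Hypothetical bottleneck adapter: a small residual module that
    injects domain knowledge while the backbone stays frozen."""

    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden)
        # Zero-init the up-projection so the plugin starts as an identity
        # map and cannot degrade general correction at initialization.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        # h: (B, L, hidden) hidden states from the frozen backbone
        return h + self.up(self.act(self.down(h)))

# Usage sketch: freeze the backbone, train only the plugin on domain text,
# e.g. medical, legal, or official-document corpora as in the paper.
# for p in backbone.parameters():
#     p.requires_grad = False
```

Training one such plugin per domain, with the backbone untouched, would explain how a single base model can be extended to medical, legal, and official-document text without losing its general performance.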