Liu, C.-C. et al. (Eds.) (2014). Proceedings of the 22nd International Conference on Computers in
Education. Japan: Asia-Pacific Society for Computers in Education
Overview of Grammatical Error Diagnosis for
Learning Chinese as a Foreign Language
Liang-Chih YU a,b, Lung-Hao LEE c,d,* & Li-Ping CHANG e
a Department of Information Management, Yuen-Ze University, Taiwan
b Innovation Center for Big Data and Digital Convergence, Yuen-Ze University, Taiwan
c Information Technology Center, National Taiwan Normal University, Taiwan
d Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
e Mandarin Training Center, National Taiwan Normal University, Taiwan
Abstract: We organize a shared task on grammatical error diagnosis for learning Chinese as a
Foreign Language (CFL) in the ICCE-2014 workshop on Natural Language Processing
Techniques for Educational Applications (NLPTEA). In this paper, we describe all aspects of
this shared task, including the task description, data preparation, evaluation metrics, and testing
results. The aim is that, through such evaluation campaigns, more advanced computer-assisted
Chinese learning techniques will emerge.
Keywords: Computer-assisted language learning, shared task, Mandarin Chinese
1. Introduction
China’s growing global influence has prompted a surge of interest in learning Chinese as a foreign
language (CFL), and this trend is expected to continue. However, whereas many computer-assisted
learning tools have been developed for use by students of English as a Foreign Language (EFL),
support for CFL learners is relatively sparse, especially in terms of tools designed to automatically
detect and correct Chinese grammatical errors. For example, while Microsoft Word has integrated
robust English spelling and grammar checking functions for years, such tools for Chinese are still
quite primitive.
In contrast to the plethora of research related to EFL learning, relatively few studies have
focused on computer-assisted language learning for CFL learners. Relative position and parse
template language models have been adopted to detect errors in Chinese sentences written by US
learners (Wu et al. 2010). Machine learning models have been applied to detect word-ordering errors
in Chinese sentences from the HSK dynamic composition corpus (Yu and Chen, 2012). A ranking
SVM-based model has been further explored to rank the candidates and suggest proper corrections
of word-ordering errors (Cheng et al. 2014). A penalized probabilistic First-Order Inductive Learning
(pFOIL) algorithm has been proposed for grammatical error diagnosis (Chang et al. 2012). A linguistic
rule-based approach has been presented to detect grammatical errors made by CFL learners (Lee et al.
2013). A sentence judgment system has been implemented to integrate rule-based linguistic analysis
and n-gram statistical learning for detecting grammatical errors (Lee et al. 2014). The SIGHAN 2013
bake-off on Chinese spelling check focused on developing automatic checkers to detect and
correct spelling errors (Wu et al. 2013). In summary, human language technologies for Chinese
learning have attracted more attention in recent years.
In the ICCE-2014 workshop on Natural Language Processing Techniques for Educational
Applications (NLPTEA), we organize a shared task on Chinese grammatical error diagnosis that
provides an evaluation platform for developing and implementing computer-assisted learning tools.
The data sets in our task are collected from essays written by learners of Chinese as a Foreign
Language (CFL). Given a sentence with or without one of the defined grammatical errors, i.e.,
redundant word, missing word, word disorder, or word selection, the developed system should
indicate whether the sentence contains a grammatical error and, if so, point out which of the defined
error types it is. The hope is that, through such evaluation campaigns, more advanced techniques for
detecting Chinese grammatical errors will emerge.
We give an overview of the shared task on grammatical error diagnosis for learning Chinese
as a foreign language. The rest of this article is organized as follows. Section 2 details the designed
task. Section 3 introduces the data sets provided in this evaluation. Section 4 describes the evaluation
metrics. Section 5 presents the results of the participants’ approaches for performance comparison.
Finally, Section 6 concludes this paper with the findings and future research directions.
2. Shared Task Description
The goal of this shared task is to develop computer-assisted tools that detect several kinds of
grammatical errors, namely redundant word, missing word, word disorder, and word selection. Each
input sentence is given a unique sentence number SID and contains at most one of the defined error
types. The developed tool should indicate which kind of error, if any, is embedded in the given
sentence. If the input sentence contains no grammatical errors, the tool should return “SID, Correct”.
If the input sentence contains a defined grammatical error, the output format should be
“SID, error_type”. We simplify the task so that at most one error type occurs in a given sentence.
Examples are shown as follows. In example 1, the character “被” is a redundant word. There is a
missing word in example 2, and its correct usage is shown in example 3. The sentence in example 4
has a word disorder error, i.e., the word “很早” should precede the word “起床”. The word “一個” in
example 5 is an incorrect word selection; the correct word should be “一件”.
• Example 1
Input: (sid=B2-1447-6) 希望沒有人再被食物中毒
Output: B2-1447-6, Redundant
• Example 2
Input: (sid=C1-1876-2) 對社會國家不同的影響
Output: C1-1876-2, Missing
• Example 3
Input: (sid=C1-1876-2) 對社會國家有不同的影響
Output: C1-1876-2, Correct
• Example 4
Input: (sid=A2-0775-2) 我起床很早
Output: A2-0775-2, Disorder
• Example 5
Input: (sid=B1-0110-2) 我會穿著一個黃色的襯衫
Output: B1-0110-2, Selection
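The “SID, label” output format used in the examples above can be produced and read back with a few lines of Python. This is only a minimal sketch: the names VALID_LABELS, format_result, and parse_result are hypothetical and not part of the official evaluation tool, and the case-insensitive handling of labels is an assumption made because Section 4 writes “correct” in lower case.

```python
from typing import Tuple

# Labels allowed by the task definition: "Correct" plus the four error types.
VALID_LABELS = {"Correct", "Redundant", "Missing", "Disorder", "Selection"}


def format_result(sid: str, label: str) -> str:
    """Build one output line in the "SID, label" format."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return f"{sid}, {label}"


def parse_result(line: str) -> Tuple[str, str]:
    """Split one "SID, label" line into its two fields.

    Assumption: labels are matched case-insensitively, so "correct"
    is normalized to "Correct".
    """
    sid, label = (part.strip() for part in line.split(",", 1))
    label = label.capitalize()
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return sid, label


print(format_result("B2-1447-6", "Redundant"))
print(parse_result("C1-1876-2, Missing"))
```

Validating the label on both sides keeps malformed lines out of a submission, which is the kind of check the dryrun phase is meant to exercise.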
3. Data Sets
The Mandarin Training Center (MTC) of National Taiwan Normal University (NTNU) was founded in
1956 for teaching Chinese as a foreign language. Currently, MTC is the most renowned Chinese
language center in Taiwan, with around 1,700 CFL learners from more than 70 countries enrolling
each academic quarter. The learner corpus used in our task is collected from the computer-based
writing Test of Chinese as a Foreign Language (TOCFL). The writing test is designed according to
the six proficiency levels of the Common European Framework of Reference (CEFR). Test takers
have to complete two different tasks for each level. For example, candidates at the A2 (Waystage)
level are asked to write a note and to describe a story after looking at four pictures. All
candidates are asked to complete the writing tasks online.
We further ask the annotators to label the grammatical errors in the CFL learners’ written
sentences and provide their correct usage. Our prepared data is further divided into three distinct sets.
(1) Training set: 1,506 CFL learners’ writings are collected, in which 5,607 grammatical errors are
annotated. Each CFL learner’s writing is represented in the SGML format shown in Figure 1. The title
attribute is used to describe the topic of the writing test. There is only one grammatical error in an
annotated sentence. The error types are also indicated along with their corresponding correct usages.
All sentences in this set can be used to train the developed grammatical error detection tool. (2) Dryrun
set: A total of 33 sentences are given for participants to familiarize themselves with the final testing
process. Each participant can submit several runs generated using different models with different
parameter settings. In addition to making sure the submitted results can be correctly evaluated,
participants can fine-tune their developed models in the dryrun phase. The purpose of the dryrun is
output format validation only: whatever performance is achieved will not be included
in our official evaluation. (3) Test set: In total, there are 1,750 testing sentences. Half of these
instances contain no grammatical errors. The other half contain one grammatical error
per sentence. The numbers of redundant, missing, disorder, and selection errors are 279, 350, 120,
and 126, respectively. The distribution is the same as that of our training set. The policy of our
evaluation is an open test. In addition to our provided data sets, registered research teams can employ
any linguistic and computational resources to detect grammatical errors in the sentences.
<ESSAY title="寫給即將初次見面的筆友的一封信">
<SENTENCE id="B1-0112-1">我的計畫是十點早上在古亭捷運站</SENTENCE>
<SENTENCE id="B1-0112-2">頭會戴著藍色的帽子</SENTENCE>
<MISTAKE id="B1-0112-1">
<MISTAKE id="B1-0112-2">
Figure 1. An essay represented in SGML format.
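As a rough illustration of how the sentence part of such a file might be read, the sketch below extracts the essay title and the SENTENCE elements with regular expressions. It is an assumption-laden example rather than the organizers' tooling: the function names are hypothetical, only the ESSAY and SENTENCE tags visible in Figure 1 are handled, and the contents of the MISTAKE elements are deliberately ignored because they are not shown in the figure.

```python
import re

# Patterns for the two tag types that are fully visible in Figure 1.
TITLE_RE = re.compile(r'<ESSAY title="([^"]*)">')
SENTENCE_RE = re.compile(r'<SENTENCE id="([^"]+)">(.*?)</SENTENCE>', re.DOTALL)


def read_title(sgml_text):
    """Return the essay title, or None if no ESSAY tag is found."""
    match = TITLE_RE.search(sgml_text)
    return match.group(1) if match else None


def read_sentences(sgml_text):
    """Return a dict mapping each sentence id to its text."""
    return {sid: text.strip() for sid, text in SENTENCE_RE.findall(sgml_text)}


sample = '''<ESSAY title="寫給即將初次見面的筆友的一封信">
<SENTENCE id="B1-0112-1">我的計畫是十點早上在古亭捷運站</SENTENCE>
<SENTENCE id="B1-0112-2">頭會戴著藍色的帽子</SENTENCE>'''

print(read_title(sample))
for sid, text in read_sentences(sample).items():
    print(sid, text)
```

Regular expressions are used here only because the fragment is small and regular; a full SGML parser would be the safer choice for the complete corpus.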
4. Performance Metrics
Table 1 shows the confusion matrix used for performance evaluation. In the matrix, True Positive
(TP) is the number of sentences with grammatical errors that are correctly proposed by the developed
tool; False Positive (FP) is the number of sentences without grammatical errors that are incorrectly
proposed; True Negative (TN) is the number of sentences without grammatical errors that are
identified correctly; False Negative (FN) is the number of sentences with grammatical errors that are
incorrectly regarded as correct sentences.
Table 1: The confusion matrix for evaluation.
Confusion Matrix                  System Result                System Result
                                  (With grammatical errors)    (Without grammatical errors)
Gold standard
(With grammatical errors)         True Positive (TP)           False Negative (FN)
Gold standard
(Without grammatical errors)      False Positive (FP)          True Negative (TN)
The criteria for judging correctness are distinguished into two levels. (1) Detection level: all
error types are regarded as incorrect. Binary classification of a testing instance, i.e., correct or
incorrect, should be completely identical with the gold standard. (2) Identification level: this level
could be considered as a multi-class categorization problem. In addition to correct instances, all error
types should be clearly identified, i.e., Redundant, Missing, Disorder, and Selection. The following
metrics are measured in both levels with the help of the confusion matrix.
• False Positive Rate (FPR) = FP / (FP + TN)
• Accuracy = (TP + TN) / (TP + FP + TN + FN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F1 = 2 * Precision * Recall / (Precision + Recall)
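The five formulas above translate directly into code. The following is a minimal sketch, not the official scoring script: the function name compute_metrics is hypothetical, and returning 0.0 when a denominator is zero is an assumption made so that degenerate inputs do not raise an error.

```python
def compute_metrics(tp, fp, tn, fn):
    """Compute FPR, accuracy, precision, recall and F1 from confusion counts."""

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "fpr": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Example counts: 4 true positives, 2 false positives, 2 true negatives,
# and no false negatives.
for name, value in compute_metrics(tp=4, fp=2, tn=2, fn=0).items():
    print(f"{name}: {value:.4f}")
```

With these counts the false positive rate is 2/4, the accuracy 6/8, the precision 4/6, and the recall 4/4.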
For example, given 8 testing inputs with the gold standards “A2-0802-4, correct”,
“A2-3344-1, Selection”, “B1-3419-8, Missing”, “B1-3520-3, correct”, “B2-1918-7, correct”,
“B2-2231-4, Disorder”, “C1-1744-1, Redundant”, and “C1-1873-7, correct”, the system may output
the results “A2-0802-4, Disorder”, “A2-3344-1, Redundant”, “B1-3419-8, Selection”,
“B1-3520-3, correct”, “B2-1918-7, correct”, “B2-2231-4, Disorder”, “C1-1744-1, Redundant”, and
“C1-1873-7, Missing”. The evaluation tool will then yield the following performance.
• False Positive Rate (FPR) = 0.5 (= 2/4).
Notes: {“A2-0802-4, Disorder”, “C1-1873-7, Missing”} / {“A2-0802-4, correct”, “B1-3520-3,
correct”, “B2-1918-7, correct”, “C1-1873-7, correct”}
• Detection Accuracy = 0.75 (= 6/8).
Notes: {“A2-3344-1, Redundant”, “B1-3419-8, Selection”, “B1-3520-3, correct”, “B2-1918-7,
correct”, “B2-2231-4, Disorder”, “C1-1744-1, Redundant”} / {“A2-0802-4, correct”, “A2-3344-1,
Selection”, “B1-3419-8, Missing”, “B1-3520-3, correct”, “B2-1918-7, correct”, “B2-2231-4,
Disorder”, “C1-1744-1, Redundant”, “C1-1873-7, correct”}
• Detection Precision = 0.67 (= 4/6).
Notes: {“A2-3344-1, Redundant”, “B1-3419-8, Selection”, “B2-2231-4, Disorder”, “C1-1744-1,
Redundant”} / {“A2-0802-4, Disorder”, “A2-3344-1, Redundant”, “B1-3419-8, Selection”,
“B2-2231-4, Disorder”, “C1-1744-1, Redundant”, “C1-1873-7, Missing”}
• Detection Recall = 1 (= 4/4).
Notes: {“A2-3344-1, Redundant”, “B1-3419-8, Selection”, “B2-2231-4, Disorder”, “C1-1744-1,
Redundant”} / {“A2-3344-1, Selection”, “B1-3419-8, Missing”, “B2-2231-4, Disorder”,
“C1-1744-1, Redundant”}
• Detection F1 = 0.80 (= 2 * 0.67 * 1 / (0.67 + 1))
• Identification Accuracy = 0.5 (= 4/8).
Notes: {“B1-3520-3, correct”, “B2-1918-7, correct”, “B2-2231-4, Disorder”, “C1-1744-1,
Redundant”} / {“A2-0802-4, correct”, “A2-3344-1, Selection”, “B1-3419-8, Missing”,
“B1-3520-3, correct”, “B2-1918-7, correct”, “B2-2231-4, Disorder”, “C1-1744-1, Redundant”,
“C1-1873-7, correct”}
• Identification Precision = 0.33 (= 2/6).
Notes: {“B2-2231-4, Disorder”, “C1-1744-1, Redundant”} / {“A2-0802-4, Disorder”,
“A2-3344-1, Redundant”, “B1-3419-8, Selection”, “B2-2231-4, Disorder”, “C1-1744-1,
Redundant”, “C1-1873-7, Missing”}
• Identification Recall = 0.5 (= 2/4).
Notes: {“B2-2231-4, Disorder”, “C1-1744-1, Redundant”} / {“A2-3344-1, Selection”,
“B1-3419-8, Missing”, “B2-2231-4, Disorder”, “C1-1744-1, Redundant”}
• Identification F1 = 0.40 (= 2 * 0.33 * 0.5 / (0.33 + 0.5))
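The worked example can be reproduced with a short script. This is a sketch of one way to score both levels, inferred from the example itself rather than taken from the official evaluation tool; the function name evaluate and the dictionary layout are hypothetical. Precision and recall are kept unrounded, so both F1 values are computed from exact fractions instead of the two-decimal figures shown in the text.

```python
def evaluate(gold, system):
    """Score system labels against gold labels at both levels.

    gold and system map sentence ids to "correct" or an error type.
    """
    sids = list(gold)
    gold_err = [s for s in sids if gold[s] != "correct"]
    gold_ok = [s for s in sids if gold[s] == "correct"]
    sys_err = [s for s in sids if system[s] != "correct"]

    def f1(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    # Detection level: only "has an error" versus "correct" matters.
    det_hits = [s for s in gold_err if system[s] != "correct"]
    det_right = [s for s in sids
                 if (gold[s] == "correct") == (system[s] == "correct")]
    det_p = len(det_hits) / len(sys_err) if sys_err else 0.0
    det_r = len(det_hits) / len(gold_err) if gold_err else 0.0

    # Identification level: the error type must match as well.
    id_hits = [s for s in gold_err if system[s] == gold[s]]
    id_right = [s for s in sids if system[s] == gold[s]]
    id_p = len(id_hits) / len(sys_err) if sys_err else 0.0
    id_r = len(id_hits) / len(gold_err) if gold_err else 0.0

    false_alarms = [s for s in gold_ok if system[s] != "correct"]
    return {
        "fpr": len(false_alarms) / len(gold_ok) if gold_ok else 0.0,
        "det_accuracy": len(det_right) / len(sids),
        "det_precision": det_p, "det_recall": det_r, "det_f1": f1(det_p, det_r),
        "id_accuracy": len(id_right) / len(sids),
        "id_precision": id_p, "id_recall": id_r, "id_f1": f1(id_p, id_r),
    }


gold = {"A2-0802-4": "correct", "A2-3344-1": "Selection",
        "B1-3419-8": "Missing", "B1-3520-3": "correct",
        "B2-1918-7": "correct", "B2-2231-4": "Disorder",
        "C1-1744-1": "Redundant", "C1-1873-7": "correct"}
system = {"A2-0802-4": "Disorder", "A2-3344-1": "Redundant",
          "B1-3419-8": "Selection", "B1-3520-3": "correct",
          "B2-1918-7": "correct", "B2-2231-4": "Disorder",
          "C1-1744-1": "Redundant", "C1-1873-7": "Missing"}

for name, value in evaluate(gold, system).items():
    print(f"{name}: {value:.4f}")
```

Because the precision values 4/6 and 2/6 are not rounded before the F1 computation, the script yields a detection F1 of 0.8 and an identification F1 of 0.4.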
5. Evaluation Results
Table 2 shows the participating teams and their testing submission statistics. Our shared task attracted
13 research teams. Four teams come from Taiwan, i.e., AS, KUAS & NTNU, NCYU, and
NTOU. Three teams come from China, i.e., HITSZ, PKU, and PolyU. The remaining 6 teams are
CIRU from the United States of America, MU from New Zealand, SPBU from Russia, TMU from Japan,
UL from the United Kingdom, and UDS from Germany. Among the 13 registered teams, 6 teams submitted
their testing results. In total, we received 13 runs in the formal testing phase.
Table 2: Result submission statistics of all participants.
Participants (Ordered by abbreviations of names)
Academia Sinica (AS)
Confucius Institute of Rutgers University (CIRU)
Harbin Institute of Technology Shenzhen Graduate School (HITSZ)
National Kaohsiung University of Applied Sciences
& National Taiwan Normal University (KUAS & NTNU)
Massey University (MU)
National Chiayi University (NCYU)
National Taiwan Ocean University (NTOU)
Peking University (PKU)
The Hong Kong Polytechnic University (PolyU)
Saint Petersburg State University (SPBU)
Tokyo Metropolitan University (TMU)
Saarland University (UDS)
University of Leeds (UL)
Table 3: Testing results of our shared task (detection level and identification level).
Table 3 shows the testing results of our shared task. In addition to achieving promising
detection of grammatical errors, reducing the false positive rate, i.e., the percentage of
correct sentences that are incorrectly reported as containing grammatical errors, is also important.
Two research teams, NCYU and TMU, achieved relatively low false positive rates.
Detection level evaluations are designed to determine whether a sentence contains grammatical
errors or not. Accuracy is usually adopted to evaluate the performance, but it is affected by the
distribution of testing instances. The baseline can be achieved easily by always guessing that a
sentence has no errors, which gives an accuracy of 0.5 in this evaluation. Some systems, i.e., CIRU,
KUAS & NTNU, TMU, and UDS, performed slightly better than the baseline. Registered teams may
send different runs aimed at optimizing either recall or precision. This led us to adopt the F1 score to
reflect the tradeoff between precision and recall. In the testing results, CIRU achieved the best
detection of grammatical errors, with the best F1 score of 0.6884. For
identification level evaluations, the systems need to identify the error types in the given sentences.
The CIRU team achieved the best accuracy at this level, 0.4589. Most systems
could not effectively identify the grammatical errors in the input sentences. Our testing
results indicate that the system developed by CIRU also achieved the best identification F1, 0.4333.
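The baseline argument can be checked numerically. The sketch below scores a hypothetical system that labels every test sentence as correct, using the test set sizes reported in Section 3 (875 correct sentences, half of 1,750, plus 279, 350, 120, and 126 sentences for the four error types). The function name is hypothetical, and reporting undefined ratios as 0.0 is an assumption.

```python
# Test set composition as reported in Section 3.
GOLD_COUNTS = {"Correct": 875, "Redundant": 279, "Missing": 350,
               "Disorder": 120, "Selection": 126}


def always_correct_baseline(gold_counts):
    """Detection-level scores for a system that never reports an error."""
    total = sum(gold_counts.values())
    with_errors = total - gold_counts["Correct"]
    # The system flags nothing, so there are no positives of either kind.
    tp, fp = 0, 0
    tn, fn = gold_counts["Correct"], with_errors
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined, reported as 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "recall": recall,
        "f1": f1,
    }


print(always_correct_baseline(GOLD_COUNTS))
```

The baseline reaches an accuracy of 0.5 with a false positive rate of 0 while finding no errors at all, so its recall and F1 are 0. This is why accuracy alone is a weak indicator on this test set and F1 is the more informative measure.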
In summary, developing a computer-assisted learning tool for grammatical error diagnosis is a
difficult task, especially for learning Chinese as a foreign language, since only the
target sentences are available, without the help of their context. None of the systems performed
particularly well in our testing results. In general, this research problem still has a long way to go.
6. Conclusions and Future Work
This paper gives an overview of the NLPTEA 2014 shared task on grammatical error diagnosis for
learning Chinese as a foreign language. We introduce the task design, data preparation
details, evaluation metrics, and the results of the performance evaluation. This task also encourages
researchers to propose various ideas and implementations in search of possible breakthroughs.
However well their implementations perform, they contribute to the community by showing
which ideas or approaches are promising or impractical, as verified in this shared
task. Their reports in the proceedings will reveal the details of these various approaches and contribute
to our knowledge about computer-assisted Chinese learning.
All data sets, their accompanying gold standards, and the evaluation tool are publicly available
for research purposes. We hope our provided data can
serve as a benchmark to help develop better Chinese learning tools. This shared task also motivates
us to build more language resources in the future to help improve the state-of-the-art techniques.
Acknowledgements
This research was supported by the Ministry of Science and Technology, under the grants MOST
102-2221-E-155-029-MY3, 103-2221-E-003-013-MY3, and 103-2911-I-003-301, and by the “Aim for the
Top University Project” and “Center of Learning Technology for Chinese” of National Taiwan
Normal University, sponsored by the Ministry of Education, Taiwan.
References
Chang, R.-Y., Wu, C.-H., & Prasetyo, P. K. (2012). Error diagnosis of Chinese sentences using inductive
learning algorithm and decomposition-based testing mechanism. ACM Transactions on Asian Language
Information Processing, 11(1), article 3.
Cheng, S.-M., Yu, C.-H., & Chen, H.-H. (2014). Chinese word ordering errors detection and correction for
non-native Chinese language learners. Proceedings of COLING’14 (pp. 279-289), Dublin, Ireland: ACL
Anthology.
Lee, L.-H., Chang, L.-P., Lee, K.-C., Tseng, Y.-H., & Chen, H.-H. (2013). Linguistic rules based Chinese error
detection for second language learning. Proceedings of ICCE’13 (pp. 27-29), Bali, Indonesia: Asia-Pacific
Society for Computers in Education.
Lee, L.-H., Yu, L.-C., Lee, K.-C., Tseng, Y.-H., Chang, L.-P., & Chen, H.-H. (2014). A sentence judgment
system for grammatical error detection. Proceedings of COLING’14 (pp. 23-29), Dublin, Ireland: ACL
Anthology.
Wu, C.-H., Liu, C.-H., Harris, M., & Yu, L.-C. (2010). Sentence correction incorporating relative position and
parse template language models. IEEE Transactions on Audio, Speech, and Language Processing, 18(6),
Wu, S.-H., Liu, C.-L., & Lee, L.-H. (2013). Chinese spelling check evaluation at SIGHAN bake-off 2013.
Proceedings of SIGHAN’13 (pp. 35-42), Nagoya, Japan: ACL Anthology.
Yu, C.-H. & Chen, H.-H. (2012). Detecting word ordering errors in Chinese sentences for learning Chinese as a
foreign language. Proceedings of COLING’12 (pp. 3003-3017), Mumbai, India: ACL Anthology.