Chen, W. et al. (Eds.) (2017). Proceedings of the 25th International Conference on Computers in Education.
New Zealand: Asia-Pacific Society for Computers in Education
Chinese Grammatical Error Detection Using a
CNN-LSTM Model
Lung-Hao LEEa, Bo-Lin LINb,c, Liang-Chih YUb,c & Yuen-Hsien TSENGa*
aGraduate Institute of Library and Information Studies, National Taiwan Normal University, Taiwan
bDepartment of Information Management, Yuan Ze University, Taiwan
cInnovation Center for Big Data and Digital Convergence, Yuan Ze University, Taiwan
*samtseng@ntnu.edu.tw
Abstract: In this paper, we propose a Convolutional Neural Network with Long Short-Term
Memory (CNN-LSTM) model for Chinese grammatical error detection. The TOCFL learner
corpus is adopted to measure system performance in indicating whether a sentence contains
errors or not. Our model performs better than other neural network based methods in terms of
accuracy for identifying erroneous sentences written by Chinese language learners.
Keywords: Grammatical error diagnosis, deep neural networks, Chinese as a foreign language
1. Introduction
Learners of Chinese as a foreign language usually make various kinds of grammatical errors during the second
language acquisition process (Lee et al., 2016a). Automated grammatical error detection and correction
are emerging as important research directions, and a number of competitions have been organized to
encourage innovation (Leacock et al., 2014). Recently, the Natural Language Processing Techniques
for Educational Applications (NLPTEA) workshops have hosted a series of shared tasks for Chinese
grammatical error diagnosis (Yu et al., 2014; Lee et al., 2015; Lee et al., 2016b). All of these activities
attracted global participation and advanced research development.
Language models have been adopted to detect various types of Chinese errors written by US
learners (Wu et al., 2010). A probabilistic inductive learning algorithm has been proposed to diagnose
Chinese grammatical errors (Chang et al., 2012). Linguistic rules have been manually constructed to
detect Chinese erroneous sentences (Lee et al., 2013). Support Vector Machine based classifiers have
been used to explore useful features for detecting word-ordering errors in Chinese sentences (Yu and
Chen, 2012). A sentence judgment system has been developed to detect grammatical errors in Chinese
sentences using both n-gram statistical analysis and rule-based linguistic analysis (Lee et al., 2014).
Gated recurrent neural network models have been explored to select the best prepositions for Chinese
grammatical error diagnosis (Huang et al., 2016). In recent NLPTEA workshops (Lee et al., 2015; Lee
et al., 2016b), neural approaches have been explored for identifying Chinese grammatical errors. This
observation motivates us to explore neural networks to detect errors written by Chinese learners.
This study describes our proposed Convolutional Neural Network with Long Short-Term
Memory (CNN-LSTM) model, a kind of deep neural network, for Chinese grammatical error detection.
The TOCFL learner corpus is used to evaluate and compare performance. Error detection systems that
indicate grammatical errors in a given sentence are useful to learners for computer-assisted language
learning.
2. Convolutional Neural Network with Long Short-Term Memory (CNN-LSTM)
Figure 1 shows our Convolutional Neural Network with Long Short-Term Memory
(CNN-LSTM) architecture for Chinese grammatical error detection. An input sentence is
represented as a sequence of words. Each word refers to a row looked up in a word embedding
matrix generated by Word2Vec (Mikolov et al., 2013). A single convolution layer is
adopted. We use convolutions over the sentence matrix to extract features. The full
convolutions are obtained by sliding the filters over the whole matrix. Each filter performs the
convolution operations on the sentence matrix and generates a feature map. A pooling layer is
then used to subsample the features in each map. We apply the max operation to reduce the
dimensionality while keeping the most salient features. To capture long-distance dependencies
across features, an LSTM is used in the sequential layer for vector composition. After the LSTM
memory cells sequentially traverse all feature vectors, the last state of the sequential
layer is regarded as the input for neural computing. The final softmax layer then receives the
computing results and uses them to classify the sentence.
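The forward pass described above can be sketched in NumPy as follows. This is an illustrative sketch only, not the authors' Theano implementation; the toy dimensions (7 words, 4-dimensional embeddings, 5 filters, pooling width 2, LSTM size 6) and random weights are stand-ins for the trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_words, emb_dim = 7, 4        # toy sizes (the paper uses 300-dim embeddings)
n_filters, filt_len = 5, 3     # the paper uses 300 filters of length 3

# Sentence matrix: one embedding row per word, looked up from Word2Vec.
sent = rng.standard_normal((n_words, emb_dim))

# Convolution layer: slide each filter over the sentence matrix;
# each position yields one feature vector of filter activations.
filters = rng.standard_normal((n_filters, filt_len, emb_dim))
n_steps = n_words - filt_len + 1
maps = np.array([[(sent[t:t + filt_len] * f).sum() for f in filters]
                 for t in range(n_steps)])       # (n_steps, n_filters)

# Max pooling over pairs of positions keeps the most salient features.
pool = 2
usable = (n_steps // pool) * pool
pooled = maps[:usable].reshape(-1, pool, n_filters).max(axis=1)

# Sequential layer: an LSTM traverses the pooled feature vectors;
# its last hidden state summarizes the whole sentence.
hid = 6
W = rng.standard_normal((4 * hid, n_filters)) * 0.1
U = rng.standard_normal((4 * hid, hid)) * 0.1
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
for x in pooled:
    z = W @ x + U @ h + b                        # stacked gate pre-activations
    i, f, o = (sigmoid(z[k * hid:(k + 1) * hid]) for k in range(3))
    g = np.tanh(z[3 * hid:])
    c = f * c + i * g
    h = o * np.tanh(c)

# Final softmax layer classifies the sentence (class 1 = erroneous).
W_out = rng.standard_normal((2, hid))
probs = softmax(W_out @ h)
```

In a trained model the filters, LSTM weights, and output weights are learned from the labeled sentences; here they merely exercise the data flow.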
During the training phase, if a sentence contains at least one grammatical error as judged
by a human annotator, its class is labeled 1, and 0 otherwise. All sentences with their labeled classes
are used to train our CNN-LSTM model, which automatically learns all the corresponding
parameters of the model.
To classify a sentence during the testing phase, the sentence goes through the
CNN-LSTM architecture to yield a value corresponding to the error probability. If the
probability of a sentence belonging to class 1 (i.e., containing errors) exceeds a predefined
threshold, the sentence is classified as erroneous; otherwise it is classified as correct.
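The decision rule amounts to a simple threshold test on the class-1 probability; a minimal sketch, using the threshold value of 0.3 reported in Section 3:

```python
def is_erroneous(p_error: float, threshold: float) -> bool:
    """Return True when the class-1 (error) probability exceeds
    the predefined threshold."""
    return p_error > threshold

# With the threshold of 0.3 used in the experiments:
flagged = is_erroneous(0.45, 0.3)   # True: flagged as an erroneous sentence
cleared = is_erroneous(0.10, 0.3)   # False: treated as grammatically correct
```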
Figure 1. The illustration of our CNN-LSTM model for Chinese grammatical error detection.
3. Experiments and Evaluation Results
The experimental data came from the TOCFL learner corpus (Lee et al., 2016a), comprising grammatical
error annotations of 2,837 essays written by Chinese language learners from 46 different
mother-tongue languages. Each sentence in each essay is manually labeled. In total,
25,277 sentences contain at least one grammatical error, while the remaining 68,982 sentences are
grammatically correct (an unbalanced distribution in which 26.82% of the sentences contain grammatical errors).
Five-fold cross validation evaluation was used to measure the performance.
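A generic sketch of the five-fold split over the 94,259 sentences (25,277 erroneous plus 68,982 correct); the shuffling seed and fold assignment here are illustrative, not necessarily the authors' exact protocol:

```python
import random

def five_fold_indices(n, seed=0):
    """Yield (train, test) index lists for five-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]       # five disjoint folds
    for k in range(5):
        test = folds[k]                          # one fold held out
        train = [i for j in range(5) if j != k for i in folds[j]]
        yield train, test

# 25,277 erroneous + 68,982 correct sentences = 94,259 in total.
sizes = [(len(tr), len(te)) for tr, te in five_fold_indices(94259)]
```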
To implement the system, the Python library Theano was used. For the Word2Vec representation,
a model was trained on the 2016 Chinese Wikipedia to generate 300-dimensional vectors for 655,247 words and
phrases. The number of filters was 300 and their length was 3. The number of iterations (i.e., epochs) was
set to 5 to learn the CNN-LSTM network parameters. If the error probability of an input sentence
exceeded 0.3, the sentence was considered erroneous.
The following three methods were compared. (1) CNN only:
this method only considers the CNN part of our proposed model. (2) LSTM only: this approach only
uses the LSTM part of our proposed model. (3) CNN-LSTM: this is our proposed model for
Chinese grammatical error detection.
Table 1 shows the results. The CNN only and CNN-LSTM models achieved the best
recall and precision, respectively. Considering the tradeoff, the LSTM only model achieved the best F1-score of
0.4859 (a 5.4% improvement over the lowest F1-score). In addition to the best precision, our
proposed CNN-LSTM model also achieved the best accuracy of 0.6905 (a 12.77% improvement over
the lowest accuracy).
Table 1: Evaluation on Chinese grammatical error detection.

Method      Accuracy   Precision   Recall    F1-score
CNN only    0.6123     0.3745      0.6488    0.4717
LSTM only   0.6599     0.4179      0.6049    0.4859
CNN-LSTM    0.6905     0.4439      0.5057    0.4610
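For reference, the metrics in Table 1 follow the usual definitions over sentence-level confusion counts, with erroneous sentences as the positive class; a sketch using hypothetical counts (the true per-fold counts are not reported):

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion counts,
    treating erroneous sentences as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion counts, for illustration only:
acc, p, r, f1 = detection_metrics(tp=300, fp=200, fn=150, tn=350)
```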
4. Conclusions
This study describes the CNN-LSTM model for Chinese grammatical error detection. We use the
TOCFL learner corpus to demonstrate system performance. Our system achieved the best accuracy of
0.6905 for predicting whether a given sentence contains grammatical errors, which roughly
means that 7 out of 10 input sentences were judged correctly under the unbalanced error distribution.
Acknowledgements
This study was partially supported by the Ministry of Science and Technology, under grants MOST
103-2221-E-003-013-MY3, MOST 105-2221-E-155-059-MY2, and MOST 106-2221-E-003-030-MY2,
and the "Aim for the Top University Project" and "Center of Language Technology for Chinese" of
National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan, ROC.
References
Chang, R.-Y., Wu, C.-H., & Prasetyo, P. K. (2012). Error diagnosis of Chinese sentences using inductive learning
algorithm and decomposition-based testing mechanism. ACM Transactions on Asian Language Information
Processing, 11(1), Article 3.
Huang, H.-H., Shao, Y.-C., & Chen, H.-H. (2016). Chinese preposition selection for grammatical error diagnosis.
Proceedings of COLING'16 (pp. 888-899). Osaka, Japan: ACL Anthology.
Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J. (2014). Automated Grammatical Error Detection for
Language Learners (2nd Edition). Morgan & Claypool Publishers.
Lee, L.-H., Chang, L.-P., Lee, K.-C., Tseng, Y.-H. & Chen, H.-H. (2013). Linguistic rules based Chinese error
detection for second language learning. Proceedings of ICCE'13 (pp. 27-29), Bali, Indonesia: Asia-Pacific
Society for Computers in Education.
Lee, L.-H., Chang, L.-P., & Tseng, Y.-H. (2016a). Developing learner corpus annotation for Chinese grammatical
errors. Proceedings of IALP’16 (pp. 254-257), Tainan, Taiwan: IEEE Digital Library.
Lee, L.-H., Rao, G., Yu, L.-C., Xun, E., Zhang, B., & Chang, L.-P. (2016b). Overview of the NLP-TEA 2016
shared task for Chinese grammatical error diagnosis. Proceedings of NLPTEA’16 (pp. 40-48), Osaka, Japan:
ACL Anthology.
Lee, L.-H., Yu, L.-C., & Chang, L.-P. (2015). Overview of the NLP-TEA 2015 shared task for Chinese
grammatical error diagnosis. Proceedings of NLPTEA’15 (pp. 1-6), Beijing, China: ACL Anthology.
Lee, L.-H., Yu, L.-C., Lee, K.-C., Tseng, Y.-H., Chang, L.-P., & Chen, H.-H. (2014). A sentence judgment system
for grammatical error detection. Proceedings of COLING'14 (pp. 67-70), Dublin, Ireland: ACL Anthology.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and
phrases and their compositionality. Proceedings of NIPS’13 (pp. 1-10), Stateline, Nevada.
Yu, C.-H., & Chen, H.-H. (2012). Detecting word ordering errors in Chinese sentences for learning Chinese as a
foreign language. Proceedings of COLING’12 (pp. 3003-3017), Bombay, India: ACL Anthology
Yu, L.-C., Lee, L.-H., & Chang, L.-P. (2014). Overview of grammatical error diagnosis for learning Chinese as a
foreign Language. Proceedings of NLPTEA’14 (pp. 42-47), Nara, Japan: Asia-Pacific Society for Computers
in Education.
Wu, C.-H., Liu, C.-H., Harris, M., & Yu, L.-C. (2010). Sentence correction incorporating relative position and
parse template language model. IEEE Transactions on Audio, Speech, and Language Processing, 18(6),
1170-1181.