About
11 Publications
1,299 Reads
128 Citations
Publications (11)
Recently, researchers have shown increasing interest in incorporating linguistic knowledge into neural machine translation (NMT). To this end, previous work has chosen either to alter the architecture of the NMT encoder to incorporate syntactic information into the translation model, or to generalize the embedding layer of the encoder to encode additio...
We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition,...
Word deletion (WD) errors can lead to poor comprehension of the meaning of translated source sentences in phrase-based statistical machine translation (SMT), and they have a critical impact on the adequacy of the translations generated by SMT systems. In this paper, we first classify word deletion into two categories, wanted and unwanted word...
In this paper, we use English as the pivot language to build statistical machine translation systems, because parallel training corpora between foreign languages and Chinese do not exist. We classify pivot-language-based methods into system-level, corpus-level, and phrase-level methods. For the proposed improved corpus-level method, we improve the tr...
Word deletion (WD) problems have a critical impact on the adequacy of translation and can lead to poor comprehension of lexical meaning in the translation result. This paper studies in detail how the word deletion problem can be handled in statistical machine translation (SMT). We classify this problem into desired and undesired word deletion based...
Due to data sparsity and the limited size of bilingual data, many high-quality phrase pairs cannot be generated. The example-based phrase pairs proposed by the authors are generated by decomposing, substituting, and generating typical phrase pairs, and the typical phrase pairs are in turn generated by the typical phrase extraction method i...
Most statistical machine translation systems rely on word alignments to extract translation rules. This approach suffers from a practical problem: even one spurious word alignment link can prevent some desirable translation rules from being extracted. To address this issue, this paper presents two approaches, referred to as sub-t...
We present a new open-source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the d...
This paper describes the NiuTrans system developed by the Natural Language Processing Lab at Northeastern University for the Patent Machine Translation Task at NTCIR-9. We present our submissions to the nine tracks of CWMT2011, and show several improvements to our core phrase-based and syntax-based engines, including: an approach to improving searc...
This paper describes the NiuTrans system developed by the Natural Language Processing Lab at Northeastern University for the NTCIR-9 Patent Machine Translation task (NTCIR-9 PatentMT). We present our submissions to the two tracks of NTCIR-9 PatentMT, and show several improvements to our phrase-based Statistical MT engine, including: a hybrid reorde...