Yujie Zhang

Nanjing University | NJU · Department of Physics

About

92
Publications
6,428
Reads
1,351
Citations

Publications (92)
Article
Background and Aims The pathological characteristics of lymphocyte infiltration in the hepatic portal tracts of patients with primary biliary cholangitis (PBC) remain unclear. Tertiary lymphoid structures (TLSs) are ectopic lymphoid tissues associated with the exacerbation of autoimmune reactions. Here, we evaluate the role of TLSs in PBC and inves...
Article
Multi-domain neural machine translation aims to construct a unified NMT model to translate sentences across various domains. Nevertheless, previous studies share one limitation: the incapacity to acquire both domain-general and domain-specific representations concurrently. To this end, we propose an ensemble strategy with gradient conflict for multi-doma...
Article
Due to the strong reliance on domain-specific knowledge, the joint learning manner of domain discrimination and translation has been widely considered in the Multi-Domain Neural Machine Translation (MDNMT) task. However, the word ambiguity problem still inevitably exists in MDNMT, especially when mixed multi-domain data is brought into the model tr...
Article
Translation of long and complex sentences has always been a challenge for machine translation. In recent years, neural machine translation (NMT) has achieved substantial progress in modeling the semantic connection between words in a sentence, but it is still insufficient in capturing discourse structure information between clauses within complex se...
Article
As the neural-based Seq2Seq model pushes the state-of-the-art in text generation, recent work has turned to controlling attributes of the text such models generate, where syntax-controlled text generation can be applied for the paraphrase generation task, i.e., given an input sentence and a syntactic control to generate a paraphrase. The main chall...
Article
Background: Gut microbiota dysbiosis is closely related to the progression of colorectal cancer. Our previous study revealed that early life colonisation with Lactobacillus rhamnosus GG (LGG) had long-term positive effects on health. We sought to investigate whether early life LGG colonisation could inhibit intestinal tumour formation in offspring...
Chapter
Syntactically controlled paraphrase generation can produce diverse paraphrases by exposing syntactic control, where both semantic preservation and syntactic variations are two important factors. Previous works mainly focus on using fine-grained syntactic structures (e.g., full parse tree) as syntactic control. While these methods can achieve excell...
Chapter
Deep neural networks have achieved state-of-the-art performances on named entity recognition (NER) with sufficient training data, while they perform poorly in low-resource scenarios due to data scarcity. To solve this problem, we propose a novel data augmentation method based on pre-trained language model (PLM) and curriculum learning strategy. Con...
Chapter
Entity Linking (EL) refers to the task of linking entity mentions in the text to the correct entities in the Knowledge Base (KB) in which entity embeddings play a vital and challenging role because of the subtle differences between entities. However, existing pre-trained entity embeddings only learn the underlying semantic information in texts, yet...
Chapter
The predominant approach of visual question answering (VQA) relies on encoding the image and question with a “black box” neural encoder and decoding a single token into answers such as “yes” or “no”. Despite this approach’s strong quantitative results, it struggles to come up with human-readable forms of justification for the prediction process. To...
Conference Paper
Stylized neural machine translation (NMT) aims to translate sentences of one style into sentences of another style, which is essential for the application of machine translation in a real-world scenario. However, a major challenge in this task is the scarcity of high-quality parallel data which is stylized paired. To address this problem, we propos...
Conference Paper
With task-oriented dialogue systems being widely applied in everyday life, slot filling, the essential component of task-oriented dialogue systems, is required to be quickly adapted to new domains that contain domain-specific slots with few or no training data. Previous methods for slot filling usually adopt sequence labeling framework, which, howe...
Conference Paper
Learning to order events at discourse-level is a crucial text understanding task. Despite many efforts for this task, the current state-of-the-art methods rely heavily on manually designed features, which are costly to produce and are often specific to tasks/domains/datasets. In this paper, we propose a new graph perspective on the task, which does...
Article
Full-text available
The goal of sentence matching is to determine the semantic relation between two sentences, which is the basis of many downstream tasks in natural language processing, such as question answering and information retrieval. Recent studies using attention mechanism to align the elements of two sentences have shown promising results in capturing semanti...
Article
Full-text available
The gut–liver axis has been increasingly recognized as a major autoimmunity modulator. However, the implications of intestinal barrier in the pathogenesis of autoimmune hepatitis (AIH) remain elusive. Here, we investigated the functional role of gut barrier and intestinal microbiota for hepatic innate immune response in AIH patients and murine mode...
Chapter
In Chinese dependency parsing, the joint model of word segmentation, POS tagging and dependency parsing has become the mainstream framework because it can eliminate error propagation and share knowledge, where the transition-based model with feature templates maintains the best performance. Recently, the graph-based joint model [19] on word segment...
Conference Paper
Paraphrase generation is of great importance to many downstream tasks in natural language processing. Recent efforts have focused on generating paraphrases in specific syntactic forms, which, generally, heavily relies on manually annotated paraphrase data that is not easily available for many languages and domains. In this paper, we propose a novel...
Chapter
Distant supervision is an effective method to generate large-scale labeled data for relation extraction without expensive manual annotation, but it inevitably suffers from the wrong labeling problem, which makes the corpus very noisy. However, the existing research work mainly focuses on sentence-level noise filtering, without considering nois...
Chapter
Natural language inference (NLI) aims to predict whether a premise sentence can infer another hypothesis sentence. Models based on tree structures have shown promising results on this task, but the performance still falls below that of sequential models. In this paper, we present a syntax-aware attention model for NLI, by which phrase-level matchin...
Chapter
This paper proposes a novel transition-based algorithm for character-level Chinese dependency parsing that straightforwardly models the dependency tree in a top-down manner. Based on the stack-pointer parser, we join Chinese word segmentation, part-of-speech tagging, and dependency parsing in a new way. We recursively build the character-based dep...
Chapter
Character-based Chinese dependency parsing jointly learns Chinese word segmentation, POS tagging and dependency parsing to avoid the error propagation problem of pipeline models. Recent works on this task only rely on a local status for prediction at each step, which is insufficient for guiding globally better decisions. In this paper, we first prese...
Article
Full-text available
Aim Patients with small serrated adenomas (SAs) (<10 mm) often undergo surveillance colonoscopy before the routine recommended time. We aimed to determine the appropriate surveillance intervals following polypectomy of small SAs for symptomatic patients. Method We retrospectively reviewed the data of 638 patients, including 122 cases and 516 contr...
Article
Full-text available
The gut microbiota and the bile acid pool play pivotal roles in maintaining intestinal homeostasis. Bile acids are produced in the liver from cholesterol and metabolized in the intestine by the gut microbiota. Gut dysbiosis has been reported to be associated with colorectal cancer. However, the interplay between bile acid metabolism and the gut mic...
Chapter
Neural machine translation (NMT) has shown promising progress in recent years. However, for reducing the computational complexity, NMT typically needs to limit its vocabulary scale to a fixed or relatively acceptable size, which leads to the problem of rare word and out-of-vocabulary (OOV). In this paper, we present that the semantic concept inform...
Article
Full-text available
Background: Accumulating evidence shows that high fat diet is closely associated with inflammatory bowel disease. However, the effects and underlying mechanisms of maternal high fat diet (MHFD) on the susceptibility of offspring to colitis in adulthood lack confirmation. Methods: C57BL/6 pregnant mice were given either a high fat (60 E% fat, MHFD...
Data
Maternal high fat diet decreased the MUC2 mRNA expression in 3-week old offspring mice. Total RNA was extracted from the colonic tissues of 3-week old offspring mice for real-time PCR analysis. The relative expression of MUC2 mRNA was shown. n = 6 in each group. MHFD, maternal high fat diet. MCD, maternal control diet. **p < 0.01.
Article
Full-text available
Supervised word segmentation heavily relies on large-scale and high quality labeled data. However, building such a corpus is difficult, especially with respect to domain specific data. In this paper, we propose a novel semi-supervised Chinese word segmentation (CWS) method. Specifically, we seek to select more useful sample sentences from the large...
Article
Full-text available
High-fat diet, which leads to an increased level of deoxycholic acid (DCA) in the intestine, is a major environmental factor in the development of colorectal cancer (CRC). However, evidence relating to bile acids and intestinal tumorigenesis remains unclear. In this study, we investigated the effects of DCA on the intestinal mucosal barrier and its...
Chapter
Almost all the state-of-the-art methods for Character-based Chinese dependency parsing ignore the complete dependency subtree information built during the parsing process, which is crucial for parsing the rest part of the sentence. In this paper, we introduce a novel neural network architecture to capture dependency subtree feature. We extend and i...
Article
Nonalcoholic fatty liver disease (NAFLD), a common chronic liver disorder, is prevalent worldwide. Recent evidence demonstrates that the “gut-liver axis” is closely related to the progression of NAFLD, with gut microbiota and the intestinal barrier regarded as two critical factors correlated with NAFLD. Diammonium glycyrrhizinate (DG), a compound of...
Article
Full-text available
High fat diet is implicated in the elevated deoxycholic acid (DCA) in the intestine and correlated with increased colon cancer risk. However, the potential mechanisms of intestinal carcinogenesis by DCA remain unclarified. Here, we investigated the carcinogenic effects and mechanisms of DCA using the intestinal tumour cells and Apcmin/+ mice model....
Article
Full-text available
It is increasingly perceived that dietary components have been linked with the prevention of intestinal cancer. Cranberry is a rich source of phenolic constituents and non-digestible fermentable dietary fiber, which shows anti-proliferation effect in colorectal cancer cells. Herein, we investigated the efficacy of long-term cranberry diet on intest...
Article
Named Entity Translation Equivalents extraction plays a critical role in machine translation (MT) and cross language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, the applicability of these studies is constrained, mainly because of the scarcity of parallel corpora of the re...
Article
Full-text available
Gastric adenomyoma (GA) is a rare type of gastric submucosal eminence lesion. As malignant transformation cannot be ruled out, surgery and laparoscopic resection are usually considered. The aim of this study is to evaluate the therapeutic effect and safety of endoscopic submucosal dissection (ESD) for GA. All of the patients with gastric submu...
Article
The gut microbiota plays an important role in maintaining intestinal homeostasis. Dysbiosis is associated with intestinal tumorigenesis. Deoxycholic acid (DCA), a secondary bile acid increased by a western diet, is associated with intestinal carcinogenesis. However, evidence relating bile acids, intestinal microbiota and tumorigenesis is limited. I...
Conference Paper
Because Chinese dependency parsing lacks large manually annotated dependency treebanks, some unsupervised methods that use large-scale unannotated data have been proposed, but they inevitably introduce too much noise from automatic annotation. In order to solve this problem, this paper proposes an approach of iteratively integrating unsupervised...
Conference Paper
The conventional “pivot-based” approach to acquiring paraphrases from a bilingual corpus is limited in that only paraphrases within two steps are considered. We propose a graph-based model for acquiring paraphrases from a phrase translation table. This paper describes how to construct the graph model from the phrase translation table, a random walk a...
Article
Recent work on joint word segmentation, POS (Part Of Speech) tagging, and dependency parsing in Chinese has two key problems: the first is that word segmentation based on character and dependency parsing based on word were not combined well in the transition-based framework, and the second is that the joint model suffers from the insufficiency of a...
Conference Paper
This paper presents a novel approach to enhance hierarchical phrase-based (HPB) machine translation systems with case frame (CF). We integrate the Japanese shallow CF into both rule extraction and decoding. All of these rules are then employed to decode new sentences in Japanese with source language case frame. The results of experiments carried out...
Conference Paper
This paper presents our system (BJTU-NLP system) for the NEWS2015 evaluation task of Chinese-to-English and English-to-Chinese named entity transliteration. Our system adopts a hybrid machine transliteration approach, which combines several features. To further improve the result, we adopt external data extracted from Wikipedia to expand the trainin...
Article
Colorectal cancer is one of the leading causes of cancer deaths. It correlates to a high fat diet, which causes an increase of the secondary bile acids including deoxycholate (DOC) in the intestine. We aimed to determine the effects of DOC on intestinal carcinogenesis in Apc min/+ mice, a model of spontaneous intestinal adenomas. Four-week old Apc...
Article
Full-text available
Berberine, an isoquinoline alkaloid, has shown inhibitory effects on growth of several tumor cell lines in vitro. The aim of this study was to investigate chemopreventive effects of berberine on intestinal tumor development in Apcmin/+ mice. Four-week old Apcmin/+ mice were treated with 0.05% or 0.1% berberine in drinking water for twelve weeks. Th...
Article
Full-text available
Name origin recognition is to identify the origin of a name. In natural language processing, information of name origin is an important feature for name entity translation and question answering. Language identification of the origins of names can help to know what language-specific transliteration approaches to use. While some early work used two...
Article
The traditional method of Named Entity Translation Equivalents extraction is often based on large-scale parallel or comparable corpora. But the practicability of the research results is constrained by the relative scarcity of bilingual corpus resources. We combined the features of Chinese and Japanese, and proposed a method to automatically ext...
Article
This paper proposes a method to improve the accuracy of bilingual text (bitext) dependency parsing by using an auto-generated bilingual treebank created with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are costly and troublesome to obtain. In the proposed...
Chapter
A key problem in Chinese Word Segmentation is that the performance of a system will decrease when applied to a different domain. We propose an approach in which n-gram features from large raw corpus are explored to realize domain adaptation for Chinese Word Segmentation. The n-gram features include n-gram frequency feature and AV feature. We used t...
Conference Paper
Full-text available
This paper presents a simple yet effective semi-supervised method to improve Chinese word segmentation and POS tagging. We introduce novel features derived from large auto-analyzed data to enhance a simple pipelined system. The auto-analyzed data are generated from unlabeled data by using a baseline system. We evaluate the usefulness of our appro...
Conference Paper
We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, b...
Article
Dependency parsing has attracted a surge of interest lately for applications such as machine translation and question answering. Currently, several supervised learning methods can be used for training high-performance dependency parsers if sufficient labeled data are available. However, currently used statistical dependency pa...
Conference Paper
Parallel corpora are critical resources for machine translation research and development since parallel corpora contain translation equivalences of various granularities. Manual annotation of word alignment is of significance to provide gold-standard for developing and evaluating both example-based machine translation model and statistical machine...
Conference Paper
The quality of a sentence translated by a machine translation (MT) system is difficult to evaluate. We propose a method for automatically evaluating the quality of each translation. In general, when translating a given sentence, one or more conditions should be satisfied to maintain a high translation quality. In English-Japanese translation, fo...
Article
This paper presents an effective dependency parsing approach of incorporating short dependency information from unlabeled data. The unlabeled data is automatically parsed by a deterministic dependency parser, which can provide relatively high performance for short dependencies between words. We then train another parser which uses the information...
Article
We launched a 5-year-project in 2006 to develop a Japanese-Chinese machine translation system for translating scientific and technical papers. As part of that project, we are currently building a Japanese-Chinese translation dictionary based on the EDR Japanese-English bilingual dictionary. This paper presents the design and construction of the Jap...
Conference Paper
This paper presents a practical tri-training method for Chinese chunking using a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreements of three classifiers. In detail, in each iterat...
Conference Paper
This paper presents our work on acquiring translational equivalence from a Japanese-Chinese parallel corpus. We follow and extend existing word alignment techniques, including statistical model and heuristic model, in order to achieve a high performance. In addition to the statistics of the parallel corpus, the lexical knowledge of the language pai...
Article
We present a Chinese Named Entity Recognition (NER) system submitted to the closed track of the SIGHAN Bakeoff 2006. We define some additional features by computing statistics over the training corpus. Our system incorporates basic features and additional features based on Conditional Random Fields (CRFs). In order to correct inconsistent results, we perfo...
Conference Paper
In this paper, we describe an empirical study of Chinese chunking on a corpus, which is extracted from UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of the state-of-the-art machine learning models. Then we propose two approaches in order to improve the performance of Chinese chunking. 1) We propose an approach to resolve the...
Article
One of the key issues in spoken-language translation is how to deal with unrestricted expressions in spontaneous utterances. We have developed a paraphraser for use as part of a translation system, and in this paper we describe the implementation of a Chinese paraphraser for a Chinese-Japanese spoken-language translation system. When an input sente...
Article
We present a method for constructing a Japanese-Chinese bilingual dictionary from a Japanese-English dictionary and an English-Chinese dictionary using English as an intermediate language. To select correct translations from among a large number of candidates, we have developed a method of ranking candidate translations by utilizing three sources o...
Article
We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Corpora. The corpus is general-domain and large-scale (about 40,000 sentence pairs), contains long sentences, is annotated with detailed information, and is of high quality. To the best of our knowledge, this will be the first annotated Japanese-Chinese parallel corpus i...
Article
This paper presents a method involving self-organizing monolingual semantic maps that are visible and continuous representations where Chinese or Japanese words with similar meanings are placed at the same or neighboring points so that the distance between them represents the semantic similarity. We used the self-organizing map, SOM, as a self-orga...
Article
This paper describes Japanese-English-Chinese aligned parallel treebank corpora of newspaper articles. They have been constructed by translating each sentence in the Penn Treebank and the Kyoto University text corpus into a corresponding natural sentence in a target language. Each sentence is translated so as to reflect its contextual informatio...
Article
This paper addresses the problem of compound word translation and proposes the approaches to acquiring translations. The proposed approaches focus on exploring web data and utilizing English translations to link words of the source language and the correspondences in the target language. The paper uses Japanese-Chinese language pairs for the sake o...
Article
Effective self-organizing techniques for constructing monolingual semantic maps of Japanese and Chinese have already been developed. By extending the monolingual map to a bilingual semantic map, we have proposed a semantics-based approach for word alignment in a Japanese/Chinese bilingual corpus.
Conference Paper
Full-text available
One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-based approach to par...
Article
Full-text available
This paper presents an approach to spoken Chinese language paraphrasing based on feature extraction and techniques of language generation. In this approach, an input utterance is first analyzed in terms of phrase structure, dependency of chunks, etc., by using multiple methods. Then, the main features of the input utterance are extracted, and the extra...
Article
Full-text available
This paper proposes a new machine translation design that is the core architecture in an on-going. Although GIST is conceptually natural, an English paraphraser was constructed to generate natural language interpretations
Article
This paper addresses the problem of compound word translation and proposes the approaches to acquiring translations. The proposed approaches focus on exploring web data and utilizing English translations to link words of the source language and the correspondences in the target language. The paper uses Japanese-Chinese language pairs for the sake o...
Article
Full-text available
In this paper, we propose a paraphrasing approach to spoken language processing and introduce our preliminary investigation on phenomena of the Chinese spoken language. In spoken language processing, many problems have still not been resolved satisfactorily, such as ungrammatical expressions due to spontaneous utterances and speech recognition erro...
Article
There is a large set of details in Collins' model which has been proved to be quite important in English parsing. When we apply the model to parse other languages such as Chinese, it is very important to know whether these details will still work. We present a careful assessment of the applicability of a number of detail features in Collins' model...
Article
Full-text available
Automatic word alignment is an important technology for extracting translation knowledge from parallel corpora. However, automatic techniques cannot resolve this problem completely because of variances in translations. We therefore need to investigate the performance potential of automatic word alignment and then decide how to suitably apply it. In...
Article
How to deal with part of speech (POS) tagging is a very important problem when we build a syntactic parsing system. We could preprocess the text with a POS tagger before performing parsing in a pipelined approach. Alternatively, we could perform POS tagging and parsing simultaneously in an integrated approach. Few, if any, comparisons have been ma...
Article
This paper presents the overview of statistical machine translation systems that BJTU-NLP developed for the NTCIR-9 Patent Machine Translation Task (NTCIR-9 PatentMT). We compared the performance between phrase-based translation model and factored translation model in our Patent SMT of Chinese to English and English to Japanese. Factored translatio...
