Jingbo Zhu

Dalian Polytechnic University (Dalian, China) · School of Food Science and Technology

About

162 Publications
11,383 Reads
2,412 Citations
Additional affiliations
January 2004 - November 2015
Dalian Polytechnic University (Dalian, China)
Position
  • Principal Investigator

Publications

Preprint
Multiscale feature hierarchies have witnessed success in the computer vision area. This further motivates researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism. For example, restricting the receptive field across heads or extracting local fine-grained features via convolu...
Article
Neural Machine Translation (NMT) systems are undesirably slow because the decoder often has to compute probability distributions over large target vocabularies. In this work, we propose a coarse-to-fine approach to reduce the complexity of the decoding process, using only the information in the weight matrix of the Softmax layer. The large target vocabu...
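As an illustration of the coarse-to-fine idea, the sketch below clusters the rows of the Softmax weight matrix and scores cluster centers before scoring only the words inside the best clusters. The k-means-style clustering, the two-stage scoring, and all names are assumptions made for the example, not the paper's exact algorithm.

```python
import numpy as np

def build_clusters(softmax_weight, n_clusters=64, iters=10):
    """K-means over the rows of the Softmax weight matrix (one row per target word)."""
    rng = np.random.default_rng(0)
    centers = softmax_weight[rng.choice(len(softmax_weight), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # squared distances via ||x||^2 - 2 x.c + ||c||^2 to avoid a huge broadcast
        dists = ((softmax_weight ** 2).sum(1, keepdims=True)
                 - 2.0 * softmax_weight @ centers.T
                 + (centers ** 2).sum(1))
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            if (assign == c).any():
                centers[c] = softmax_weight[assign == c].mean(0)
    return assign, centers

def coarse_to_fine_scores(hidden, softmax_weight, assign, centers, top_clusters=4):
    """Score cluster centers first (coarse), then only the words in the best clusters (fine)."""
    best = np.argsort(centers @ hidden)[-top_clusters:]     # coarse step over clusters
    candidates = np.flatnonzero(np.isin(assign, best))      # shortlist of target words
    return candidates, softmax_weight[candidates] @ hidden  # fine step over the shortlist

# Hypothetical usage with a 10k-word vocabulary and 512-dim decoder states.
W = np.random.randn(10_000, 512).astype(np.float32)
assign, centers = build_clusters(W, n_clusters=64, iters=5)
words, scores = coarse_to_fine_scores(np.random.randn(512).astype(np.float32), W, assign, centers)
```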
Article
Background: Recently, the value of natural products has been extensively considered because these resources can potentially be applied to prevent and treat coronavirus pneumonia 2019 (COVID-19). However, the discovery of natural drugs is problematic because of their complex composition and active mechanisms. Methods: This comprehensive study was perf...
Preprint
Full-text available
Previous work on multimodal machine translation (MMT) has focused on the way of incorporating vision features into translation, but little attention has been paid to the quality of vision models. In this work, we investigate the impact of vision models on MMT. Given the fact that Transformer is becoming popular in computer vision, we experiment with various st...
Preprint
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODE). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to ODE. Inspired by this, we design a new architecture,...
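The abstract's claim can be written out as equations: a residual block is the explicit Euler step of dy/dt = F(y, θ), and a higher-order block follows a Runge-Kutta-style update. The second-order (Heun/RK2) step below is a generic illustration; the paper's exact block design may differ.

```latex
% A residual block is the explicit Euler step (step size 1) of dy/dt = F(y, \theta):
\[
  y_{n+1} = y_n + F(y_n, \theta_n).
\]
% A second-order (Heun/RK2-style) block reuses F twice within one layer:
\[
  \tilde{y} = y_n + F(y_n, \theta_n), \qquad
  y_{n+1} = y_n + \tfrac{1}{2}\bigl(F(y_n, \theta_n) + F(\tilde{y}, \theta_n)\bigr).
\]
```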
Article
Recently, many efforts have been devoted to speeding up neural machine translation models. Among them, the non-autoregressive translation (NAT) model is promising because it removes the sequential dependence on the previously generated tokens and parallelizes the generation process of the entire sequence. On the other hand, the autoregressive trans...
Chapter
Poetry is a kind of literary art, which conveys emotion through aesthetic expressions. Automatic poetry generation is challenging because it must satisfy both the semantic representation (content) and the metrical constraints (form). Most previous work lacks effective use of metrical information, so the generated poems may break these c...
Preprint
Full-text available
This paper describes NiuTrans neural machine translation systems of the WMT 2021 news translation tasks. We made submissions to 9 language directions, including English↔{Chinese, Japanese, Russian, Icelandic} and English→Hausa tasks. Our primary systems are built on several effective variants of Transformer, e.g.,...
Preprint
Full-text available
This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task. We focus on the efficient implementation of deep Transformer models (Wang et al., 2019; Li et al., 2019) using NiuTensor (https://github.com/NiuTrans/NiuTensor), a flexible toolkit for NLP tasks. We explored the combination of deep en...
Preprint
Full-text available
This paper describes the NiuTrans system for the WMT21 translation efficiency task (http://statmt.org/wmt21/efficiency-task.html). Following last year's work, we explore various techniques to improve efficiency while maintaining translation quality. We investigate the combinations of lightweight Transformer architectures and knowledge distillation...
Preprint
Full-text available
This paper addresses the efficiency challenge of Neural Architecture Search (NAS) by formulating the task as a ranking problem. Previous methods require numerous training examples to accurately estimate the performance of architectures, although the actual goal is to distinguish between "good" and "bad" candidates. Here we do not resort to p...
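To make the ranking formulation concrete, the sketch below trains a scorer with a pairwise hinge loss so that an architecture known to be better is scored above a known-worse one; the MLP scorer, the feature encoding, and the margin are hypothetical choices for illustration, not the paper's method.

```python
import torch

def pairwise_rank_loss(scorer, feats_better, feats_worse, margin=0.1):
    """Hinge loss pushing scorer(better architecture) above scorer(worse) by a margin."""
    s_better = scorer(feats_better).squeeze(-1)
    s_worse = scorer(feats_worse).squeeze(-1)
    return torch.clamp(margin - (s_better - s_worse), min=0.0).mean()

# Hypothetical usage: a tiny MLP ranks fixed-length architecture encodings
# built from a handful of evaluated (better, worse) candidate pairs.
scorer = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
better = torch.randn(16, 32)   # encodings of architectures known to perform better
worse = torch.randn(16, 32)    # encodings of their weaker counterparts
loss = pairwise_rank_loss(scorer, better, worse)
loss.backward()
```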
Preprint
Improving Transformer efficiency has become increasingly attractive recently. A wide range of methods has been proposed, e.g., pruning, quantization, and new architectures. But these methods are either sophisticated to implement or dependent on hardware. In this paper, we show that the efficiency of Transformer can be improved by combining...
Article
In this paper, a novel molecularly imprinted polymer (MIP) for specific adsorption of steviol glycosides was designed, and the imprinting mechanism of the self-assembly system between template and monomers was clearly explored. Firstly, steviol (STE) was chosen as the dummy template, and density functional theory (DFT) at the B3LYP/6-31+G(d,p) level wa...
Article
Full-text available
Recently, Neural Architecture Search has drawn interest from researchers because of its ability to learn neural network architectures from data automatically. Differentiable methods are widely used because they can obtain better architectures with less computational resources. However, the method suffers from a mismatch between training and inference...
Preprint
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription. We use the Transformer-based model architecture and enhance it by Conformer, relative position encoding, and stacked acoustic an...
Preprint
Encoder pre-training is promising in end-to-end Speech Translation (ST), given the fact that speech-to-translation data is scarce. But ST encoders are not simple instances of Automatic Speech Recognition (ASR) or Machine Translation (MT) encoders. For example, we find ASR encoders lack the global context representation, which is necessary for trans...
Article
Ginkgolide B (GB), a diterpenoid lactone compound isolated from extracts of Ginkgo biloba leaves, significantly improves cognitive impairment, but its potential pharmacological effect on astrocytes induced by β-amyloid (Aβ) 1-42 remains to be elucidated. The present study aimed to investigate the protective effect and mechanism of GB on astroc...
Preprint
Full-text available
It has been found that residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). In this paper, we explore a deeper relationship between Transformer and numerical methods of ODEs. We show that a residual block of layers in Transformer can be described as a higher-order solution to ODEs. This leads us to d...
Article
Natural products sourcing chemical reference substances (NPSCRSs) are widely used in the analysis industry to determine the content and quality of food and drug products. Chromatographic techniques have been traditionally used to determine the purity of chemical reference substances (CRSs). However, the uncertainty values of NPSCRSs purity are rare...
Article
Full-text available
Rapid development of high-throughput technologies has permitted the identification of an increasing number of disease-associated genes (DAGs), which are important for understanding disease initiation and developing precision therapeutics. However, DAGs often contain large amounts of redundant or false positive information, leading to difficulties i...
Article
Ethnopharmacological relevance: There is substantial experimental evidence to support the view that Ginkgo biloba L. (Ginkgoaceae), a traditional Chinese medicine known for treating stroke, has a protective effect on the central nervous system and significantly improves the cognitive dysfunction caused by disease, including Alzheimer's disease (AD), va...
Preprint
The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness. But the high computation complexity of its decoder raises the inefficiency issue. By examining the mathematic formulation of the decoder, we show that under some mild conditions, the architecture could be simplified by compressin...
Preprint
Full-text available
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but lightweight NMT systems. We propose a novel group-permutation based knowledge distillation approach to com...
Preprint
Full-text available
Large amounts of data have made neural machine translation (NMT) a big success in recent years. But it is still a challenge to train these models on small-scale corpora. In this case, the way data is used appears to be more important. Here, we investigate the effective use of training data for low-resource NMT. In particular, we propose a dynami...
Preprint
Unsupervised Bilingual Dictionary Induction methods based on the initialization and the self-learning have achieved great success in similar language pairs, e.g., English-Spanish. But they still fail and have an accuracy of 0% in many distant language pairs, e.g., English-Japanese. In this work, we show that this failure results from the gap betwee...
Article
Full-text available
In this work, a new online preparative high-performance liquid chromatography system was developed for the fast and efficient separation of complex chemical mixtures from natural products. This system integrates two chromatographic systems into an online automatic separation system using the technique of multiple trap columns with valve switching. The sam...
Preprint
Traditional neural machine translation is limited to the topmost encoder layer's context representation and cannot directly perceive the lower encoder layers. Existing solutions usually rely on the adjustment of network architecture, making the calculation more complicated or introducing additional structural restrictions. In this work, we propose...
Preprint
The standard neural machine translation model can only decode with the same depth configuration as training. Restricted by this feature, we have to deploy models of various sizes to maintain the same translation latency, because the hardware conditions on different terminal devices (e.g., mobile phones) may vary greatly. Such individual training le...
Preprint
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why deep models help NMT is an open question. In this paper, we investigate the behavior of a well-tuned deep Transformer system. We find that stacking layers is helpful in improvi...
Preprint
Full-text available
Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of transferring model parameters. Inspired by this, we investigate methods of model acceleration and compression...
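For context, a standard temperature-scaled knowledge distillation loss is sketched below (the textbook formulation due to Hinton et al.); it is not necessarily the exact objective used in this work.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with KL divergence to the teacher's softened distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft and hard gradients have comparable magnitude
    return alpha * hard + (1 - alpha) * soft

# Hypothetical usage: distil a large teacher's logits into a small student.
student_logits = torch.randn(8, 32000, requires_grad=True)
teacher_logits = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
kd_loss(student_logits, teacher_logits, labels).backward()
```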
Preprint
8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently. On the other hand, previous systems still rely on 32-bit floating point for certain functions in complex models (e.g., Softmax in Transformer), and make heavy use of quantization and de-quantization....
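The quantize/de-quantize round trip mentioned here can be illustrated with generic per-tensor symmetric 8-bit quantization; the scaling scheme below is an assumption for the example, not the system described in the paper.

```python
import numpy as np

def quantize_int8(x):
    """Per-tensor symmetric quantization: map float32 values onto [-127, 127] int8."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(x)
print(float(np.abs(dequantize(q, scale) - x).max()))  # rounding error is roughly scale / 2
```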
Article
Efficient and economical separation and enrichment of high-content class compounds from complex natural plants are of great importance. This study describes a novel continuous chromatography system (CCS) with multi-zone and multi-column dynamic tandem techniques for the efficient and economical separation and enrichment of high-content class compou...
Conference Paper
8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently. On the other hand, previous systems still rely on 32-bit floating point for certain functions in complex models (e.g., Softmax in Transformer), and make heavy use of quantization and de-quantization....
Preprint
Full-text available
In encoder-decoder neural models, multiple encoders are in general used to represent contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in document-level neural machine translation (NMT). Surprisingly, we find that the context encoder does not only encode the surrounding sentence...
Preprint
Full-text available
Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a bet...
Article
Though the early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction, for efficiency. In this paper, we em...
Preprint
Full-text available
Neural machine translation systems require a number of stacked layers for deep models. But the prediction depends on the sentence representation of the top-most layer with no access to low-level representations. This makes it more difficult to train the model and poses a risk of information loss to prediction. In this paper, we propose a multi-laye...
Preprint
Though the early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction, for efficiency. In this paper, we em...
Chapter
This paper describes our system submitted for the CCMT 2019 Quality Estimation (QE) Task, including sentence-level and word-level. We propose a new method based on predictor-estimator architecture [7] in this task. For the predictor, we adopt Transformer-DLCL [17] (dynamic linear combination of previous layers) as our feature extracting models. In...
Chapter
Full-text available
Back translation refers to the method of using machine translation to automatically translate target-language monolingual data into source-language data, which is a commonly used data augmentation method in machine translation tasks. Previous work on back translation only focuses on rich-resource languages, while ignoring the low resour...
Conference Paper
Full-text available
Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model....
Preprint
Full-text available
Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model....
Preprint
Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interf...
Preprint
Full-text available
Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces...
Chapter
Full-text available
The Transformer model, based on the self-attention mechanism [17], has achieved state-of-the-art results in recent evaluations. However, it is still unclear how much room there is for improvement of translation systems based on this model. In this paper we further explore how to build a stronger neural machine translation system from four aspects, including architectural imp...
Article
Recently, researchers have shown an increasing interest in incorporating linguistic knowledge into neural machine translation (NMT). To this end, previous works choose either to alter the architecture of NMT encoder to incorporate syntactic information into the translation model, or to generalize the embedding layer of the encoder to encode additio...
Article
A full understanding of the single and joint toxicity of a variety of organophosphorus (OP) pesticides is still unavailable because of the extremely complex mechanisms of action. This study established a systems-level approach based on systems toxicology to investigate OP pesticide toxicity by incorporating ADME/T properties, protein prediction, an...
Conference Paper
Word deletion (WD) errors can lead to poor comprehension of the meaning of the source sentence in phrase-based statistical machine translation (SMT) output, and have a critical impact on the adequacy of the translation results generated by SMT systems. In this paper, we first classify word deletion errors into two categories, wanted and unwanted word...
Conference Paper
Training neural language models (NLMs) is very time consuming and we need parallelization for system speedup. However, standard training methods have poor scalability across multiple devices (e.g., GPUs) due to the huge time cost required to transmit data for gradient sharing in the back-propagation process. In this paper we present a sampling-base...
Conference Paper
Word deletion (WD) problems have a critical impact on the adequacy of translation and can lead to poor comprehension of lexical meaning in the translation result. This paper studies how the word deletion problem can be handled in statistical machine translation (SMT) in detail. We classify this problem into desired and undesired word deletion based...
Article
Shift-reduce parsing enjoys the property of efficiency because of the use of efficient parsing algorithms like greedy/deterministic search and beam search. In addition, shift-reduce parsing is much simpler and easier to implement compared with other parsing algorithms. In this article, we explore constituent boundary information to improve the perfor...
Article
In this paper we propose an approach to modeling syntactically-motivated skeletal structure of source sentence for machine translation. This model allows for application of high-level syntactic transfer rules and low-level non-syntactic rules. It thus involves fully syntactic, non-syntactic, and partially syntactic derivations via a single grammar...