Boliang Zhang’s research while affiliated with Rensselaer Polytechnic Institute and other places


Publications (25)


Figure 2: MeetDot room creation. Landing page (top, left panel): any user can set up a MeetDot room and share its URL with potential participants (bottom, left panel). Admin users can select parameters that control captioning, speech recognition, and translation (right panel, §3).
Figure 3: MeetDot videoconference interface. Translated captions are incrementally updated (word-by-word, phrase-by-phrase) on top of participant videos. Translations also appear in the transcript panel (on right, not shown), updated utterance-by-utterance. Choosing a caption language (4th button from left at the bottom, in green) displays all captions in that particular language. This depicts the view of the English caption user.
MeetDot: Videoconferencing with Live Translation Captions
  • Preprint
  • File available

September 2021 · 397 Reads

Arkady Arkhangorodsky · Christopher Chu · Scot Fang · [...] · Kevin Knight

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, which causes captions to flicker as earlier output is revised. Our system must also satisfy strict latency requirements to maintain acceptable call quality. We implement several features, such as smoothly scrolling captions and flicker reduction, to enhance the user experience and reduce cognitive load. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic metrics such as accuracy, latency, and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic metric of end-to-end system performance. We plan to make our system open-source for research purposes.
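The erasure metric mentioned in the abstract can be made concrete: when a re-translation replaces an on-screen caption, erasure counts how many already-displayed tokens must be deleted. A minimal sketch, assuming a longest-common-prefix definition of erasure; function names are illustrative, not MeetDot's actual implementation:

```python
def erasure(prev_tokens, new_tokens):
    """Count tokens of the previously displayed caption that must be
    deleted when the new caption replaces it: everything after the
    longest common prefix of the two captions."""
    common = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        common += 1
    return len(prev_tokens) - common

def total_erasure(caption_updates):
    """Sum erasure over successive re-translations of one utterance."""
    return sum(erasure(prev.split(), new.split())
               for prev, new in zip(caption_updates, caption_updates[1:]))

updates = [
    "the cat",
    "the cat sat",           # pure extension: no erasure
    "the cat sat on mats",   # still an extension: no erasure
    "the cat sat on a mat",  # "mats" is revised away: erasure of 1
]
```

Pure re-translation tends to rewrite the whole caption suffix on every update, so flicker-reduction features aim to keep this count low.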


A Hybrid Task-Oriented Dialog System with Domain and Task Adaptive Pretraining

February 2021 · 121 Reads

This paper describes our submission to the End-to-end Multi-domain Task Completion Dialog shared task at the 9th Dialog System Technology Challenge (DSTC-9). Participants in the shared task build an end-to-end task-completion dialog system that is evaluated by human evaluation and a user-simulator-based automatic evaluation. Unlike traditional pipelined approaches, where modules are optimized individually and suffer from cascading failures, we propose an end-to-end dialog system that 1) uses Generative Pretraining 2 (GPT-2) as the backbone to jointly solve natural language understanding, dialog state tracking, and natural language generation, 2) adopts Domain and Task Adaptive Pretraining to tailor GPT-2 to the dialog domain before finetuning, 3) utilizes heuristic pre/post-processing rules that greatly simplify the prediction tasks and improve generalizability, and 4) includes a fault-tolerance module to correct errors and inappropriate responses. Our proposed method significantly outperforms baselines and ties for first place in the official evaluation. We make our source code publicly available.
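One common way to let a single GPT-2 backbone jointly handle understanding, state tracking, and generation is to serialize each dialog turn into one token sequence with special delimiters. A hedged sketch of such a serialization; the delimiter tokens, slot names, and helper function are illustrative assumptions, not the submission's actual vocabulary:

```python
def serialize_turn(user_utterance, belief_state, system_act, response):
    """Flatten one dialog turn into a single training string so that one
    language model learns NLU, state tracking, and generation jointly.
    Delimiter tokens here are illustrative placeholders."""
    belief = " ; ".join(f"{domain} {slot} = {value}"
                        for domain, slot, value in belief_state)
    return (f"<|user|> {user_utterance} "
            f"<|belief|> {belief} "
            f"<|action|> {system_act} "
            f"<|response|> {response}")

example = serialize_turn(
    "I need a cheap restaurant in the centre",
    [("restaurant", "pricerange", "cheap"), ("restaurant", "area", "centre")],
    "restaurant-inform name",
    "How about the Dojo Noodle Bar?",
)
```

At inference time, the model is prompted with the prefix up to `<|belief|>` and decodes the rest, so the same network produces the state, the act, and the response.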



Global Attention for Name Tagging

October 2020 · 13 Reads

Many name tagging approaches use local contextual information with much success, but fail when the local context is ambiguous or limited. We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information. We retrieve document-level context from other sentences within the same document and corpus-level context from sentences in other topically related documents. We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions, which dynamically weight their respective contextual information, and gating mechanisms, which determine the influence of this information. Extensive experiments on benchmark datasets show the effectiveness of our approach, which achieves state-of-the-art results for Dutch, German, and Spanish on the CoNLL-2002 and CoNLL-2003 datasets.
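The interplay of global attention and gating described above can be sketched with plain NumPy: the local hidden state queries a set of document- or corpus-level context vectors, and a learned gate decides how much of the attended summary to mix in. All names and the scalar-gate simplification are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(local_h, context_vecs, w_gate):
    """Attend over document/corpus-level context vectors with the local
    hidden state as the query, then gate how much of the attended summary
    is mixed into the local representation (scalar gate for simplicity)."""
    weights = softmax(context_vecs @ local_h)   # dot-product attention
    global_h = weights @ context_vecs           # weighted context summary
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([local_h, global_h]))))
    return gate * global_h + (1.0 - gate) * local_h

rng = np.random.default_rng(0)
local_h = rng.normal(size=4)          # local (sentence-level) hidden state
context_vecs = rng.normal(size=(3, 4))  # e.g., other mentions of the entity
w_gate = rng.normal(size=8)
fused = global_attention(local_h, context_vecs, w_gate)
```

The gate lets the tagger fall back on purely local evidence when the retrieved global context is unreliable.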


MEEP: An Open-Source Platform for Human-Human Dialog Collection and End-to-End Agent Training

October 2020 · 62 Reads

We create a new task-oriented dialog platform (MEEP) where agents are given considerable freedom in terms of utterances and API calls, but are constrained to work within a push-button environment. We include facilities for collecting human-human dialog corpora, and for training automatic agents in an end-to-end fashion. We demonstrate MEEP with a dialog assistant that lets users specify trip destinations.


Parallel Corpus Filtering via Pre-trained Language Models

May 2020 · 31 Reads

Web-crawled data provides a good source of parallel corpora for training machine translation models. It is automatically obtained but extremely noisy, and recent work shows that neural machine translation systems are more sensitive to noise than traditional statistical machine translation methods. In this paper, we propose a novel approach to filter out noisy sentence pairs from web-crawled corpora via pre-trained language models. We measure sentence parallelism by leveraging the multilingual capability of BERT and use the Generative Pre-training (GPT) language model as a domain filter to balance data domains. We evaluate the proposed method on the WMT 2018 Parallel Corpus Filtering shared task and on our own web-crawled Japanese-Chinese parallel corpus, which we make publicly available. Our method significantly outperforms baselines and achieves a new state of the art. In an unsupervised setting, our method achieves performance comparable to the top-1 supervised method.
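The filtering step can be sketched as scoring each sentence pair by the similarity of its two sentence embeddings and keeping pairs above a threshold. The `toy_embed` function below is a stand-in for a real multilingual encoder such as mBERT; the threshold and all names are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def filter_parallel(pairs, embed, threshold=0.8):
    """Keep sentence pairs whose source/target embeddings are similar
    enough; `embed` stands in for a multilingual encoder such as mBERT."""
    return [(src, tgt) for src, tgt in pairs
            if cosine(embed(src), embed(tgt)) >= threshold]

def toy_embed(sentence):
    """Character bag-of-codepoints -- purely illustrative, not a real
    cross-lingual sentence embedding."""
    v = np.zeros(256)
    for ch in sentence.lower():
        v[ord(ch) % 256] += 1.0
    return v

pairs = [("hello world", "hello world"),   # parallel-looking pair
         ("hello world", "xyzzy qqq")]     # mismatched pair
kept = filter_parallel(pairs, toy_embed)
```

A second, GPT-style language-model score can then be applied to the surviving pairs as a domain filter, as the abstract describes.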



Describing a Knowledge Base

September 2018 · 295 Reads

We aim to automatically generate natural language narratives about an input structured knowledge base (KB). We build our generation framework based on a pointer network which can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new table position self-attention to capture the inter-dependencies among related slots. For evaluation, besides standard metrics including BLEU, METEOR, and ROUGE, we also propose a KB reconstruction based metric by extracting a KB from the generation output and comparing it with the input KB. We also create a new data set which includes 106,216 pairs of structured KBs and their corresponding natural language descriptions for two distinct entity types. Experiments show that our approach significantly outperforms state-of-the-art methods. The reconstructed KB achieves 68.8%-72.6% F-score.
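The KB-reconstruction metric can be illustrated as a set-level F-score between the (slot, value) pairs of the input KB and those extracted back from the generated description. This simplified sketch ignores partial matches and extraction errors; the slot names are hypothetical examples:

```python
def kb_f_score(gold_kb, reconstructed_kb):
    """Set-level F-score between the input KB and the KB extracted from
    the generated description, both given as (slot, value) pairs."""
    gold, pred = set(gold_kb), set(reconstructed_kb)
    if not gold or not pred:
        return 0.0
    tp = len(gold & pred)          # pairs correctly preserved in the text
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("occupation", "singer"), ("birth_year", "1986")}
pred = {("occupation", "singer"), ("birth_year", "1990")}  # one value wrong
```

A generation that hallucinates or drops facts is penalized in precision or recall respectively, which is exactly what BLEU-style surface metrics miss.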


Incident-Driven Machine Translation and Name Tagging for Low-resource Languages

June 2018 · 117 Reads · 10 Citations

Machine Translation

We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition, rapid language projection, and joint methods for MT and name tagging.


Figure 1: Writing-editing Network architecture overview. The figure shows an example input title ("An effective method of using Web based information for Relation Extraction", Keong and Su, 2008) alongside its human-written abstract.
Paper Abstract Writing through Editing Mechanism

May 2018 · 256 Reads

We present a paper abstract writing system based on an attentive neural sequence-to-sequence model that can take a title as input and automatically generate an abstract. We design a novel Writing-editing Network that can attend to both the title and the previously generated abstract drafts and then iteratively revise and polish the abstract. With two series of Turing tests, where the human judges are asked to distinguish the system-generated abstracts from human-written ones, our system passes the tests by junior domain experts at a rate of up to 30% and by non-experts at a rate of up to 80%.
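The iterative revise-and-polish procedure reduces to a simple control loop: draft from the title, then repeatedly revise while conditioning on both the title and the previous draft. The `generate`/`revise` callables below are toy placeholders for the paper's attentive seq2seq decoders, so the loop is runnable as a sketch:

```python
def write_and_edit(title, generate, revise, passes=2):
    """Writing-editing control loop: draft from the title, then revise
    the draft repeatedly while conditioning on both the title and the
    previous draft. `generate` and `revise` stand in for the paper's
    attentive seq2seq decoders."""
    draft = generate(title)
    for _ in range(passes):
        draft = revise(title, draft)
    return draft

# toy stand-ins so the loop runs end to end
generate = lambda title: f"draft about {title}"
revise = lambda title, draft: draft + " (revised)"
result = write_and_edit("relation extraction", generate, revise)
```

Because each pass sees the previous draft as well as the title, later passes can correct omissions that a single-shot decoder would keep.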


Citations (17)


... Automatic sentence alignment can extract sets of parallel sentences that are orders of magnitude larger than those obtained by manual translation. If the crawled data in the source and target languages is sentence-level, then the neural-MT-based method [7] can achieve good performance, a pretrained model [46] can filter noisy data, and the LASER tool for bitext mining [6] with a greedy algorithm also works well. If the crawled data is document-level, MT-based [21], [32], [33] or similarity-based methods [4], [44], [47] can give more accurate sentence alignments. ...

Reference:

Bilingual Corpus Mining and Multistage Fine-tuning for Improving Machine Translation of Lecture Transcripts
Parallel Corpus Filtering via Pre-trained Language Models
  • Citing Conference Paper
  • January 2020

... Transliteration refers to the process of converting language represented in one writing system to another (Wellisch et al., 1978). Latin script-centered transliteration or romanization is the most common form of transliteration (Lin et al., 2018; Amrhein and Sennrich, 2020; Demirsahin et al., 2022) as the Latin/Roman script is by far the most widely adopted writing script in the world (Daniels and Bright, 1996; van Esch et al., 2022). Adapting mPLMs via transliteration can address the two aforementioned critical issues. 1) Since the Latin script covers a dominant portion of the mPLM's vocabulary (e.g., 77% in case of mBERT, see Ács), 'romanizing' the remaining part of the vocabulary might mitigate the vocabulary size issue and boost vocabulary sharing. ...

Platforms for Non-speakers Annotating Names in Any Language
  • Citing Conference Paper
  • January 2018

... Huang et al. [35] went beyond word alignment and proposed alignment on a cluster level so that clusters of words have similar distribution across multiple languages. First, they augmented the monolingual word embeddings with their respective cluster of neighborhood words using an extension of the correlational neural network, CorrNet [36], which were then aligned to form the common semantic space. ...

Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
  • Citing Conference Paper
  • January 2018

... Automated text generation from structured data has attracted considerable attention in recent years (Lin et al., 2023), and a multitude of data-to-text generation techniques have been utilized in a wide range of applications (Axelsson & Skantze, 2023;Harkous et al., 2020;Kasner et al., 2023;Wang et al., 2018). The biggest advances in this area come from the recent development of large language models (LLMs) such as GPT-3 (Brown et al., 2020) or Bloom (Scao et al., 2022, preprint), which have caused a major paradigm shift. ...

Describing a Knowledge Base

... The above methods generate poems without any polishing or deliberating mechanism. In fact, several methods utilize the polishing procedure based methods to improve the quality of image generation [42] and text generation [43][44][45][46][47][48] . For example, Xia et al. [47] designed a deliberation method to generate the final sequence by the second decoder with an additional input of the generated sequence by the first decoder. ...

Paper Abstract Writing through Editing Mechanism

... Ref. [17] assumed that each bag contained at least one sentence that revealed the true relation, and they ignored other noisy samples when finding the true instance. Ref. [9,18] proposed an adversarial-training-based method to enable the model to recognize noisy data and reduce the introduction of noisy annotations. Ref. [10,12,19] proposed reinforced-learning-based methods to train a high-quality sentence selector for noise filtering, and then, modeled the RE on selected clean data. ...

Genre Separation Network with Adversarial Training for Cross-genre Relation Extraction
  • Citing Conference Paper
  • January 2018

... An earlier CRF-based work by Durrett and Klein (2014) shows benefits from joint modeling of coreference resolution across a document, named entity recognition and entity linking, and notes that propagating information between different mentions of an entity in a document can help resolve ambiguous cases of semantic types or entity links. In previous neural models similar ideas of using document-level contextual information in order to improve typing of entities have been considered (Zhang et al., 2020a). The authors of this work apply an attention mechanism in order to aggregate information between different mentions of the same underlying entity. ...

Global Attention for Name Tagging
  • Citing Conference Paper
  • January 2018

... In this paper we explore idiom comprehension (Wray, 2002; Jackendoff and Jackendoff, 2002; Cacciari and Tabossi, 2014; Jiang et al., 2018) in the cloze test. Idiom, which is called "成语" (chengyu) in Chinese, is an interesting linguistic phenomenon in the Chinese language, and this work ...

Chengyu Cloze Test

... Their findings were later verified by Srivastava and Salakhutdinov (2012), who use a Deep Boltzmann Machine (Salakhutdinov and Hinton, 2009) to generate/map data from the image and text modality. Huang et al. (2018) construct a multilingual common semantic space to achieve better machine translation performance by extending correlation networks (Chandar et al., 2016). They use multiple non-linear transformations to repeatedly reconstruct sentences from one language to another and finally build a common semantic space for all the different languages. ...

Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding