Kevin Knight’s research while affiliated with University of Southern California and other places


Publications (232)


Figure 2: MeetDot room creation. Landing page (top of left panel): any user can set up a MeetDot room and share its URL with potential participants (bottom of left panel). Admin users can select parameters that control captioning, speech recognition, and translation (right panel, §3).
Figure 3: MeetDot videoconference interface. Translated captions are incrementally updated (word-by-word, phrase-by-phrase) on top of participant videos. Translations also appear in the transcript panel (on right, not shown), updated utterance-by-utterance. Choosing a caption language (4th button from left at the bottom, in green) displays all captions in that particular language. This depicts the view of the English caption user.
MeetDot: Videoconferencing with Live Translation Captions
  • Preprint
  • File available

September 2021 · 415 Reads

Arkady Arkhangorodsky · Christopher Chu · Scot Fang · [...] · Kevin Knight

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, which results in caption flicker. Additionally, our system must meet strict latency requirements to maintain acceptable call quality. We implement several features to enhance the user experience and reduce cognitive load, such as smoothly scrolling captions and reduced caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency, and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
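The erasure metric mentioned in the abstract can be sketched as the number of already-displayed caption tokens that a re-translation retracts. A minimal version (MeetDot's exact definition may normalize or aggregate differently):

```python
def erasure(prev: list[str], new: list[str]) -> int:
    """Trailing tokens of the previous caption that must be deleted
    to update it to the new caption (a flicker proxy)."""
    # Length of the longest common token prefix of the two captions.
    common = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        common += 1
    return len(prev) - common

# Successive re-translations of a growing utterance (toy data):
captions = [
    "the cat".split(),
    "the cat sat".split(),
    "the cat sits on".split(),       # revision: "sat" -> "sits", erasure = 1
    "the cat sits on the mat".split(),
]
total = sum(erasure(p, n) for p, n in zip(captions, captions[1:]))
print(total)  # → 1
```

Under re-translation, each new decoder output can rewrite the whole displayed prefix, so summing erasure over updates quantifies how much text the viewer sees retracted.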


Figure 3: Extensive game tree. There is one chance node (black), 2 user decision nodes (red), and 4 agent decision nodes grouped into 2 information sets (blue). The tree has 8 leaf nodes which store rewards.
Figure 5: Three visualized equilibria. The x-axis gives mixed strategies for the user (1.0 = always faithfully relate the desired destination, 0.0 = always lie about the destination). The y-axis gives mixed strategies for the agent (1.0 = always obey the user, 0.0 = always disobey the user).
Two Approaches to Building Collaborative, Task-Oriented Dialog Agents through Self-Play

September 2021 · 33 Reads

Task-oriented dialog systems are often trained on human/human dialogs, such as those collected via Wizard-of-Oz interfaces. However, human/human corpora are frequently too small for supervised training to be effective. This paper investigates two approaches to training agent-bots and user-bots through self-play, in which they autonomously explore an API environment, discovering communication strategies that enable them to solve the task. We give empirical results for both reinforcement learning and game-theoretic equilibrium finding.
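The game-theoretic side can be illustrated with a toy 2x2 version of the user/agent game from the figures above. The payoffs here are illustrative assumptions, chosen so that "truthful + obey" and "lie + disobey" both reach the true destination:

```python
import numpy as np

# Rows: user strategy (truthful, lie). Columns: agent strategy (obey, disobey).
# Both players share the reward, so this is a collaborative coordination game.
R = np.array([[1.0, 0.0],   # user truthful: obeying succeeds
              [0.0, 1.0]])  # user lies:     disobeying succeeds

def expected_reward(p_truth: float, q_obey: float) -> float:
    """Expected shared reward under mixed strategies (p_truth, q_obey)."""
    u = np.array([p_truth, 1.0 - p_truth])
    a = np.array([q_obey, 1.0 - q_obey])
    return float(u @ R @ a)

# The two pure equilibria and the mixed one implied by the 2x2 structure:
for p, q in [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5)]:
    print(p, q, expected_reward(p, q))
```

The three (p, q) points mirror the three visualized equilibria in Figure 5: two efficient pure equilibria and an inefficient mixed one where coordination succeeds only half the time.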


Learning Mathematical Properties of Integers

September 2021 · 15 Reads

Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the representations from mathematical sequence data, we can substantially improve over number embeddings learned from English text corpora.
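A minimal sketch of the train-then-probe pipeline described above, using multiplication tables as a stand-in for the paper's mathematical sequence data (the embedding dimension, SVD factorization, and nearest-centroid probe are illustrative choices, not the paper's setup):

```python
import numpy as np

# "Corpus" of mathematical sequences: multiples of 2..9 up to 100.
N = 100
seqs = [list(range(k, N + 1, k)) for k in range(2, 10)]

# Symmetric co-occurrence counts within a window of 2 inside each sequence.
C = np.zeros((N + 1, N + 1))
for seq in seqs:
    for i, a in enumerate(seq):
        for b in seq[i + 1 : i + 3]:
            C[a, b] += 1
            C[b, a] += 1

# 16-dimensional integer embeddings from a truncated SVD of log(1 + counts).
U, S, _ = np.linalg.svd(np.log1p(C))
emb = U[:, :16] * S[:16]

# Probe: nearest-centroid classifier for "divisible by 2",
# trained on 2..80 and evaluated on the held-out integers 81..100.
train, test = np.arange(2, 81), np.arange(81, 101)
c_even = emb[train[train % 2 == 0]].mean(axis=0)
c_odd = emb[train[train % 2 == 1]].mean(axis=0)
pred = np.array([np.linalg.norm(emb[n] - c_even) < np.linalg.norm(emb[n] - c_odd)
                 for n in test])
acc = float(np.mean(pred == (test % 2 == 0)))
print(f"probe accuracy for divisibility by 2: {acc:.2f}")
```

The point of the probe is diagnostic: if a simple classifier on top of frozen embeddings recovers a property such as divisibility, the sequence data has encoded that property in the vectors.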



A Hybrid Task-Oriented Dialog System with Domain and Task Adaptive Pretraining

February 2021 · 121 Reads

This paper describes our submission for the End-to-end Multi-domain Task Completion Dialog shared task at the 9th Dialog System Technology Challenge (DSTC-9). Participants in the shared task build an end-to-end task completion dialog system which is evaluated by human evaluation and a user-simulator-based automatic evaluation. Different from traditional pipelined approaches, where modules are optimized individually and suffer from cascading failure, we propose an end-to-end dialog system that 1) uses Generative Pretraining 2 (GPT-2) as the backbone to jointly solve Natural Language Understanding, Dialog State Tracking, and Natural Language Generation tasks, 2) adopts Domain and Task Adaptive Pretraining to tailor GPT-2 to the dialog domain before fine-tuning, 3) utilizes heuristic pre/post-processing rules that greatly simplify the prediction tasks and improve generalizability, and 4) includes a fault-tolerance module to correct errors and inappropriate responses. Our proposed method significantly outperforms baselines and ties for first place in the official evaluation. We make our source code publicly available.
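The joint-sequence idea behind point 1) can be sketched as follows: NLU, DST, and NLG outputs are serialized into one token stream so that a single GPT-2-style language model predicts them all left to right. The field delimiters and slot names below are illustrative assumptions, not the DSTC-9 format:

```python
# Hedged sketch: serialize one dialog turn into a single sequence that a
# causal LM can be fine-tuned on. Special tokens like <belief> would be
# added to the tokenizer's vocabulary in a real setup.
def serialize_turn(user_utt: str, belief_state: dict,
                   db_result: str, system_utt: str) -> str:
    belief = ", ".join(f"{slot}={val}" for slot, val in belief_state.items())
    return (f"<user> {user_utt} <belief> {belief} "
            f"<db> {db_result} <system> {system_utt}")

print(serialize_turn(
    "book a cheap hotel in the north",
    {"hotel-price": "cheap", "hotel-area": "north"},
    "3 matches",
    "I found 3 cheap hotels in the north.",
))
```

At inference time the model generates the belief state and system response conditioned on the dialog history, so state tracking and generation share one set of parameters instead of separate pipeline modules.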




Why Neural Machine Translation Prefers Empty Outputs

December 2020 · 87 Reads

We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to eventually outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases the probability of zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
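The label-smoothing effect can be illustrated with toy numbers. Assume the reference translation has a fixed per-token probability and the empty translation is a single EoS token; the probabilities below are illustrative assumptions, not measurements from the paper:

```python
import math

# Per-token probability of the reference translation without and with
# label smoothing, and the model's probability of emitting EoS first.
p_plain, p_smooth, p_empty = 0.9, 0.7, 0.05

def beats_empty(p_tok: float, length: int) -> bool:
    """Does a length-`length` translation outscore the empty translation?"""
    return length * math.log(p_tok) > math.log(p_empty)

def first_loss(p_tok: float) -> int:
    """Smallest reference length at which the empty translation wins."""
    return next(L for L in range(1, 1000) if not beats_empty(p_tok, L))

print("no smoothing:   empty wins from length", first_loss(p_plain))   # → 29
print("with smoothing: empty wins from length", first_loss(p_smooth))  # → 9
```

Because sentence scores are products of per-token probabilities, shaving confidence off every token of a long translation compounds, while the single-token empty hypothesis pays the penalty only once, so smoothing moves the crossover point to much shorter sentences.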


MUSE: Illustrating Textual Attributes by Portrait Generation

November 2020 · 18 Reads

We propose a novel approach, MUSE, to illustrate textual attributes visually via portrait generation. MUSE takes as input a set of attributes written in text, in addition to facial features extracted from a photo of the subject. We propose 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. We propose a novel stacked neural network architecture by extending an image-to-image generative model to accept textual attributes. Experiments show that our approach significantly outperforms several state-of-the-art methods that do not use textual attributes, with the Inception Score (IS) increased by 6% and the Fréchet Inception Distance (FID) decreased by 11%. We also propose a new attribute reconstruction metric to evaluate whether the generated portraits preserve the subject's attributes. Experiments show that our approach can accurately illustrate 78% of textual attributes, which helps MUSE capture the subject in a more creative and expressive way.


DiDi's Machine Translation System for WMT2020

October 2020 · 35 Reads

This paper describes DiDi AI Labs' submission to the WMT2020 news translation shared task. We participate in the Chinese→English translation direction, using the Transformer as our baseline model and integrating several techniques for model enhancement, including data filtering, data selection, back-translation, fine-tuning, model ensembling, and re-ranking. As a result, our submission achieves a BLEU score of 36.6 on Chinese→English.
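The re-ranking step mentioned above can be sketched as a weighted combination of per-hypothesis feature scores over an n-best list. The features and weights here are illustrative assumptions; a real submission would tune such weights on a development set:

```python
# Minimal n-best re-ranking sketch: pick the hypothesis maximizing a
# linear combination of feature scores (e.g. forward MT score, language
# model score, length bonus).
def rerank(nbest, weights):
    """nbest: list of (hypothesis, {feature_name: score}) pairs."""
    def combined(feats: dict) -> float:
        return sum(weights[name] * score for name, score in feats.items())
    return max(nbest, key=lambda h: combined(h[1]))

nbest = [
    ("hyp A", {"mt": -4.1, "lm": -9.0, "len": 5}),
    ("hyp B", {"mt": -4.3, "lm": -7.5, "len": 6}),
]
weights = {"mt": 1.0, "lm": 0.3, "len": 0.1}
best, _ = rerank(nbest, weights)
print(best)  # → hyp B
```

Here "hyp B" wins despite a slightly worse MT score because its language model score dominates under these weights, which is exactly the kind of trade-off re-ranking is meant to exploit.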


Citations (67)


... The authors used the two languages' alphabets to extend the word embeddings, modifying the similarity score functions of previous word-embedding methods to include an orthographic similarity measure. Bilingual lexicons are shown to improve machine translation in both RBMT (Turcato, 1998) and CBMT (Chu et al., 2014; Dou and Knight, 2013; Dou et al., 2014). ...

Reference:

A Survey of Orthographic Information in Machine Translation
Dependency-Based Decipherment for Resource-Limited Machine Translation
  • Citing Conference Paper
  • January 2013

... However, Liang et al. [20] suggest that LLM-generated and human feedback can complement each other by using ChatGPT-4 to generate comments on full PDFs of scientific papers. Other research has also been done on evaluating the potential for AI to assist in automatically generating acceptance or rejection scores for submitted papers [18] [38] [35], evaluating the quality of human reviews [17], and evaluating and generating meta-reviews [19]. ...

ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis
  • Citing Conference Paper
  • January 2020

... Another method forced the cosine similarity between word embeddings of two numbers to reflect their actual distance on the number line [94]. Further refinements combine the prototype-based approach with the scientific notation encoding [95]. Another interesting technique exploits the Online Encyclopedia of Integer Sequences, which includes notable series of numbers that are of particular mathematical interest, to train more meaningful number embeddings [96]. ...

Learning Mathematical Properties of Integers
  • Citing Conference Paper
  • January 2021

... Closed-domain work such as Dong et al. [9], Nam, Kim, and Kim [32], Li et al. [23], and Patashnik et al. [36] achieves much higher quality images, but at the cost of being non-functional outside the domain the model was trained on. Other authors simplify the problem by using non-textual auxiliary inputs [43,34] or filtering the inputs to a small, pre-defined vocabulary [17]. ...

MUSE: Textual Attributes Guided Portrait Painting Generation
  • Citing Conference Paper
  • September 2021

... Automatic sentence alignment can extract parallel sentences that are orders of magnitude larger than those obtained by manual translation. If the crawled data in the source and target languages is sentence-level, then the neural-MT-based method [7] can achieve good performance, a pretrained model [46] can filter noisy data, and the LASER tool for bitext mining [6] with a greedy algorithm also works well. If the crawled data is document-level, MT-based [21], [32], [33] or similarity-based methods [4], [44], [47] can give more accurate sentence alignments. ...

Parallel Corpus Filtering via Pre-trained Language Models
  • Citing Conference Paper
  • January 2020

... This paper compiles some data resources that can be used for low-resource language translation. The International Workshop on Spoken Language Translation (IWSLT) [19] dataset is one of the standard datasets used to evaluate and research machine translation systems, particularly spoken language translation systems. IWSLT organizes translation competitions annually and provides multilingual datasets for training, validation, and testing. ...

FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
  • Citing Conference Paper
  • January 2020

... Due to the high societal value of these tasks and the scarcity of expert knowledge and time, machine learning promises to provide an invaluable aid for understanding the ancient world. In the context of a number of works on ML applied to diverse ancient inscriptions (Hassner et al., 2013a; Assael et al., 2019; Yin et al., 2019; Huang et al., 2019; Luo et al., 2021; Hayon et al., 2024), the cuneiform script poses particular challenges. These include the nature of the physical writing media (indentations in textured and often damaged clay under various lighting conditions), and the diverse nature of cuneiform signs, which changed over thousands of years of use in vast geographical regions. ...

Decipherment of Historical Manuscript Images
  • Citing Conference Paper
  • September 2019

... One of the most important things in deciphering unknown ancient languages is to try to determine their closest relative(s), as this could have a positive impact on the resolution of challenges outlined in Section 4. The importance of using related languages to enhance statistical language models is not new, and it was already emphasized in previous work (Currey and Karakanta 2016; Pourdamghani and Knight 2017; Karakanta, Dehdari, and van Genabith 2018; Pourdamghani and Knight 2019; Mavridaki, Galiotou, and Papakitsos 2021b). Tóth (2007) also states that the distribution of many typological features of languages is not random but restricted to a relative geographical area. ...

Neighbors helping the poor: improving low-resource machine translation using related languages

Machine Translation

... Prior work has developed automatic classifiers of translationese, detecting whether the text exhibits translationese (is translated) or not (Rabinovich and Wintner, 2015; Rabinovich et al., 2017; Pylypenko et al., 2021). Related work has also considered the impact and causes of translationese by investigating the algorithmic biases which lead to translationese in MT (Vanmassenhove et al., 2021), avoiding the influence of translationese in training and testing by means of translationese classifiers and zero-shot multilingual MT (Riley et al., 2020), and developing an MT system which first generated rough glosses of the original text and then translated the resulting gloss into a fluent, native-like translation (Pourdamghani et al., 2019). ...

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation
  • Citing Conference Paper
  • January 2019