Kuzman Ganchev’s research while affiliated with Google Inc. and other places




Publications (55)


Dolomites: Domain-Specific Long-Form Methodical Tasks
  • Article

January 2024 · 3 Reads · Transactions of the Association for Computational Linguistics

Chaitanya Malaviya · Priyanka Agrawal · Kuzman Ganchev · [...] · Chris Alberti

Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring the methodical generation of structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences, while drawing upon the given context as well as domain knowledge. Our dataset is available at https://dolomites-benchmark.github.io/.
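The task typology described above (objective, procedure, input, output) can be sketched as a simple data structure. The field names follow the abstract; the example task below is hypothetical, loosely modeled on the clinician scenario, and is not drawn from the actual DoLoMiTes data:

```python
from dataclasses import dataclass

@dataclass
class MethodicalTask:
    """A task specification in the style of the DoLoMiTes typology:
    an objective, a stepwise procedure, and input/output descriptions."""
    objective: str
    procedure: list   # ordered steps an expert follows
    input_desc: str
    output_desc: str

# Hypothetical instantiation for illustration only.
task = MethodicalTask(
    objective="Write a differential diagnosis for a patient",
    procedure=[
        "Review presenting symptoms and history",
        "List candidate conditions consistent with the findings",
        "Rank candidates by likelihood and note ruling-out tests",
    ],
    input_desc="Patient symptoms, history, and exam findings",
    output_desc="Ranked list of candidate diagnoses with rationale",
)
```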


Figure 1: Synthetic data generation for multilingual question-answering (QA). Left: Examples of the multilingual QA task. Translations are added for readability. Middle: Strategies for localizing QA models to new languages: 1. Using English QA data as a zero-shot approach, 2. with Machine Translation (MT) to approximate training data for supervised learning, and 3. few-shot approaches with a handful of multilingual examples. Right: Model performance on the multilingual QA task. We report average Exact Match (EM) across all languages on the TYDIQA-GOLDP dataset (Clark et al., 2020).
Figure 2: Effect of synthetic data size on downstream QA performance (Average EM on TYDIQA-GOLDP evaluation set); results shown for mT5-XL QA model fine-tuned via Machine Translation (MT), Prompt Engineering (PE), Prompt Tuning (QAMELEON (PT)), and combinations thereof (PE + MT and QAMELEON (PT) + MT).
Figure 3: Distribution of question category for QAMELEON (PT) generated questions (a,b,c) and TYDIQA-GOLDP training questions (d,e,f). Categories are obtained by translating the questions to English with Google Translate and grouping by the first two word tokens.
Examples of QA pairs from human-annotated TYDI QA and generated by QAMELEON (PT) on corresponding passages. English translations from Google Translate are added for readability.
QAmeleon: Multilingual QA with Only 5 Examples
  • Article
  • Full-text available

December 2023 · 71 Reads · 7 Citations · Transactions of the Association for Computational Linguistics

The availability of large, high-quality datasets has been a major driver of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are fine-tuned, thus avoiding costly annotation. Prompt tuning the PLM with only five examples per language delivers accuracy superior to translation-based baselines; it bridges nearly 60% of the gap between an English-only baseline and a fully-supervised upper bound fine-tuned on almost 50,000 hand-labeled examples; and consistently leads to improvements compared to directly fine-tuning a QA model on labeled examples in low resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.
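The few-shot data-synthesis setup described above starts from a handful of labeled examples per language. A minimal sketch of how such a prompt might be assembled is shown below; the template and field labels are illustrative assumptions, not the paper's actual prompt format:

```python
# Sketch of few-shot prompt construction for synthetic multilingual QA
# data, in the spirit of QAmeleon. Template wording is an assumption.

def build_prompt(examples, passage):
    """Build a few-shot prompt from (passage, question, answer) triples
    in the target language, ending with a new passage to annotate."""
    parts = []
    for p, q, a in examples:
        parts.append(f"Passage: {p}\nQuestion: {q}\nAnswer: {a}\n")
    parts.append(f"Passage: {passage}\nQuestion:")
    return "\n".join(parts)

# Five placeholder examples stand in for the per-language seed data.
five_shot = [(f"passage {i}", f"question {i}", f"answer {i}") for i in range(5)]
prompt = build_prompt(five_shot, "a new unlabeled passage")
```

A PLM completing this prompt would then emit a question (and, symmetrically, an answer) for the unlabeled passage, yielding synthetic training pairs for the downstream QA model.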


Conditional Generation with a Question-Answering Blueprint

August 2023 · 70 Reads · 15 Citations · Transactions of the Association for Computational Linguistics

The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. We propose a new conceptualization of text plans as a sequence of question-answer (QA) pairs and enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.
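As a rough illustration of the input-blueprint-output conversion described above, the sketch below prefixes a target text with its QA plan so a seq-to-seq model learns to emit the plan before the output. The serialization format (the `Q:`/`A:` markers and the `|` separator) is an assumption for illustration, not the paper's exact encoding:

```python
def to_blueprint_target(qa_pairs, output_text):
    """Serialize a QA blueprint followed by the output text, turning an
    input-output pair into an input-blueprint-output training target."""
    plan = " ".join(f"Q: {q} A: {a}" for q, a in qa_pairs)
    return f"{plan} | {output_text}"

# Hypothetical blueprint acting as a proxy for content selection and order.
blueprint = [("Who discovered penicillin?", "Alexander Fleming"),
             ("When was it discovered?", "1928")]
target = to_blueprint_target(blueprint,
                             "Fleming discovered penicillin in 1928.")
```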


Figure 1: User interface of the web browser-based Text-Blueprint demonstration showcasing the iterative model.
Figure 3: Schematic representation of the different components of the web browser-based demonstration.
Figure 4: Example snapshot of the results obtained with the end-to-end Blueprint model for the user query "Why is the sky blue?". Depending on which question-answer pairs the user selects, different summaries can be generated.
Figure 5: Snapshot of the results obtained with the interactive Blueprint model for the query "What is the Titanic known for?". Questions highlighted in red were manually added by the user, leading to a different output.
Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation

April 2023 · 48 Reads

While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output. A short video demonstrating our system is available at https://goo.gle/text-blueprint-demo.



Towards Computationally Verifiable Semantic Grounding for Language Models

November 2022 · 40 Reads

The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples. It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: The first technique samples multiple candidate text sequences from which the semantic parser chooses. The second trains the language model while keeping the semantic parser frozen to improve the semantic accuracy of the auto-encoder. We carry out experiments on the English WebNLG 3.0 data set, using BLEU to measure the fluency of generated text and standard parsing metrics to measure semantic accuracy. We show that our proposed approaches significantly improve on the greedy search baseline. Human evaluation corroborates the results of the automatic evaluation experiments.
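The first technique above (sampling multiple candidates and letting the semantic parser choose) can be sketched with stub components. Here `parse_triples` is a placeholder for the semantic parser and the candidate texts stand in for LM samples; both are assumptions of this sketch, not the paper's components:

```python
def triple_f1(predicted, reference):
    """F1 overlap between two sets of (subject, relation, object) triples,
    a simple stand-in for standard parsing metrics."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(reference)
    return 2 * precision * recall / (precision + recall)

def select_candidate(candidates, parse_triples, input_triples):
    """Pick the sampled text whose parse best reconstructs the input message."""
    return max(candidates,
               key=lambda text: triple_f1(parse_triples(text), input_triples))

# Stub parser output for two hypothetical LM samples.
fake_parses = {
    "text A": [("Alan_Turing", "birthPlace", "London")],
    "text B": [("Alan_Turing", "birthPlace", "London"),
               ("Alan_Turing", "field", "Logic")],
}
message = [("Alan_Turing", "birthPlace", "London"),
           ("Alan_Turing", "field", "Logic")]
best = select_candidate(["text A", "text B"], lambda t: fake_parses[t], message)
```

The auto-encoder view follows the same pattern: text whose parse round-trips to the input message is judged semantically accurate.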


Figure 2: Effect of the size of synthetic data used for mT5-XL QA model fine-tuned via Machine Translation (MT), Prompt Engineering (PE), Prompt Tuning, i.e., QAMELEON (PT) and combinations of PE + MT and QAMELEON (PT) + MT. Average EM is reported for the TYDIQA-GOLDP eval set.
QA performance (Average EM) for individual languages on the TYDIQA-GOLDP evaluation set; the backbone of the QA model is an mT5-XL model fine-tuned on gold (Supervised) or synthetically generated data. The final row displays the percent of tokens for each language in the PaLM training data.
QAmeleon: Multilingual QA with Only 5 Examples

November 2022 · 88 Reads · 2 Citations

The availability of large, high-quality datasets has been one of the main drivers of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained, thus avoiding costly annotation. Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines, bridges nearly 60% of the gap between an English-only baseline and a fully supervised upper bound trained on almost 50,000 hand-labeled examples, and always leads to substantial improvements compared to fine-tuning a QA model directly on labeled examples in low resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.


Figure 1: E2E, MULTITASK and ITERATIVE Blueprint variants.
E2E model trained on AQuaMuSe with QA or AQ strategies. Performance on the validation set.
Conditional Generation with a Question-Answering Blueprint

July 2022 · 133 Reads · 1 Citation

The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.


Precision, recall and F1-measure (in %) for different feature sets on the test dataset.
Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

September 2021 · 14 Reads

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to the state-of-the-art results for English.
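Feature-rich CRF taggers of this kind typically consume a feature map per token. The sketch below shows a few illustrative features (lowercased form, affixes, capitalization, a gazetteer flag, and neighboring words); the actual feature set, the language-specific morphological features, and the BulTreeBank tagsets are not reproduced here:

```python
def token_features(tokens, i, gazetteer=frozenset()):
    """Illustrative feature map for token i, in the style of
    feature-rich CRF sequence taggers for NER."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalization cue for names
        "word.isdigit": word.isdigit(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "in_gazetteer": word in gazetteer,  # domain-specific gazetteer lookup
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

feats = token_features(["Sofia", "is", "a", "city"], 0, gazetteer={"Sofia"})
```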


Performance of recent neural network based models without using pretrained embeddings. Our model's improvements over prior work are statistically significant (p < 0.05, bootstrap resampling), except on PKU.
Hyperparameter settings.
State-of-the-art Chinese Word Segmentation with Bi-LSTMs

August 2018 · 164 Reads · 1 Citation

A wide variety of neural-network architectures have been proposed for the task of Chinese word segmentation. Surprisingly, we find that a bidirectional LSTM model, when combined with standard deep learning techniques and best practices, can achieve better accuracy on many of the popular datasets as compared to models based on more complex neural-network architectures. Furthermore, our error analysis shows that out-of-vocabulary words remain challenging for neural-network models, and many of the remaining errors are unlikely to be fixed through architecture changes. Instead, more effort should be devoted to exploring resources for further improvement.
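Character-level segmenters such as the Bi-LSTM above typically predict one tag per character; the common BMES scheme (Begin, Middle, End, Single) is assumed here for illustration. Decoding predicted tags back into words can be sketched as:

```python
def decode_bmes(chars, tags):
    """Recover words from per-character BMES tags: a word ends at an
    'E' (end) or 'S' (single-character word) tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:          # tolerate a truncated final word
        words.append(current)
    return words

words = decode_bmes(list("中文分词"), ["B", "E", "B", "E"])
```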



Citations (43)


... There are two more recent works that touch on these ideas, but both have significant downsides compared to our approach. Meng et al. (2019) use a PR approach inspired by Ganchev and Das (2013) for cross-lingual parsing, but must use very simple constraints and require a slow inference procedure that can only be used at test time. Noach and Goldberg (2019) utilize GEC with minibatch training, but focus on using related tasks for computing simpler constraints and do not adapt their targets to small batch sizes. ...

Reference:

Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
  • Citing Conference Paper
  • January 2013

... Many studies have utilized LLMs for data generation, with a predominant focus on training small models for various NLP downstream tasks [37,38,16,39,40,17,18]. Despite their effectiveness, these approaches encounter difficulties when directly applied to tool learning. ...

QAmeleon: Multilingual QA with Only 5 Examples

Transactions of the Association for Computational Linguistics

... Cross-lingual summarization faces the compounded challenge of having to tackle difficulties relating to both monolingual summarization (e.g., long inputs and outputs, faithfulness to the input documents (Maynez et al., 2020)) and challenges seen in machine translation (e.g., data imbalance, alignment across languages (Koehn and Knowles, 2017)). Recent work has shown that introducing an intermediate content planning step is helpful for summarization in English, resulting in higher quality summaries, especially in terms of faithfulness (Puduppully et al., 2019a;Moryossef et al., 2019;Narayan et al., 2021Narayan et al., , 2022Huot et al., 2023). In this work, we present µPLAN, a crosslingual summarization method that uses such a content plan (i.e., what to say and in what order) as a cross-lingual bridge (Figure 1). ...

Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
  • Citing Conference Paper
  • January 2023

... The fine-tuned approach requires agent outputs, which are not readily available; planning agent outputs such as plot and setting are usually not provided in datasets, while writing agent outputs require the stories to be split into its constituent parts. Similar to previous work (Schick et al., 2022;Narayan et al., 2023;Josifoski et al., 2023), we propose to generate synthetic outputs for these agents through distilled backtranslation. ...

Conditional Generation with a Question-Answering Blueprint

Transactions of the Association for Computational Linguistics

... For the geoscience domain, CWS functions as a crucial step for NLP of geological texts, using NLP technology to discover new knowledge and advance geological work. However, domains such as general domain [11,12], electronic medical records [13,14], and Chinese novels [15,16] are the focus of the most current research. ...

State-of-the-art Chinese Word Segmentation with Bi-LSTMs
  • Citing Conference Paper
  • January 2018

... It encompasses three subtasks: TI, FC and FSRL. Most of the previous studies [10,11,12,13,14,15] focus on the FI or FSRL subtask individually, whereas others [16,17,18,19] simultaneously handle FC and FSRL subtasks, assuming the availability of all targets in advance. Limited work focuses on TI and end-to-end FSP. ...

Efficient Inference and Structured Learning for Semantic Role Labeling
  • Citing Article
  • December 2015

Transactions of the Association for Computational Linguistics

... In Chinese, however, sentences are represented as strings of Chinese characters or hanzi without similar natural delimiters, as opposed to English where sentences are sequences of words separated by white spaces. Consequently, the first step in a Chinese language processing task is to identify the word order in a sentence and mark appropriate boundary locations [46,47]. ...

State-of-the-art Chinese Word Segmentation with Bi-LSTMs

... Early applications for annotation-projection include: dependency parsing [19]; part-of-speech taggers [38]; machine translation [39,33]; divergence-inspired alignment [10]; and creation of syntactic-dependency datasets for multiple languages [27]. We borrow the notion of annotation projection to produce explainable, cross-language SRL that advances the state of the art. ...

Universal dependency annotation for multilingual parsing
  • Citing Article
  • January 2013

... With the rapid advancement of Artificial Intelligence (AI) and computing technologies, deep learning (DL) has demonstrated exceptional performance in various fields, including image classification [1], speech recognition [2], natural language processing [3], and fault detection [4]. However, many studies have shown that DL models are vulnerable to adversarial attacks that can severely compromise their reliability and performance [5][6][7][8][9]. ...

Globally Normalized Transition-Based Neural Networks