Illia Polosukhin’s research while affiliated with Google Inc. and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (8)


Natural Questions: A Benchmark for Question Answering Research
  • Article
  • Full-text available

March 2019 · 1,748 Reads · 2,252 Citations

Transactions of the Association for Computational Linguistics

Tom Kwiatkowski · Jennimaria Palomaki · Olivia Redfield · [...] · Slav Petrov

We present the Natural Questions corpus, a question answering dataset. The questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples, 5-way annotated and sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
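The 5-way development and test annotations are what the evaluation metrics are computed over. As a rough sketch of how such annotations can be aggregated (the record fields and the 2-of-5 threshold below are illustrative assumptions, not the paper's exact definitions):

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class NQAnnotation:
        """One annotator's judgement for a (question, Wikipedia page) pair.
        A field is None when the annotator marked null, i.e. no answer of
        that type is present on the page."""
        long_answer: Optional[str]   # typically a paragraph from the page
        short_answer: Optional[str]  # one or more entities, joined here for simplicity

    def has_gold_long_answer(annotations: List[NQAnnotation], threshold: int = 2) -> bool:
        """Treat a 5-way-annotated example as answerable when at least
        `threshold` annotators selected a non-null long answer (an
        illustrative rule, not the official one)."""
        return sum(a.long_answer is not None for a in annotations) >= threshold

    # Example: 3 of 5 annotators found a long answer, so the example counts as answerable.
    anns = [NQAnnotation("para", "entity"), NQAnnotation("para", None),
            NQAnnotation("para", None), NQAnnotation(None, None), NQAnnotation(None, None)]
    assert has_gold_long_answer(anns)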


NAPS: Natural Program Synthesis Dataset

July 2018 · 70 Reads · 1 Citation

We present a program synthesis-oriented dataset consisting of human-written problem statements and solutions for these problems. The problem statements were collected via crowdsourcing, and the program solutions were extracted from human-written solutions in programming competitions, accompanied by input/output examples. We propose using this dataset for program synthesis tasks aimed at working with real user-generated data. As baselines we present a few models, with the best model achieving 8.8% accuracy, showcasing both the complexity of the dataset and the large room left for future research.
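Since each task comes with input/output examples, a natural way to score a synthesized program is to execute it on those examples. The sketch below shows that kind of example-based grading, assuming candidate programs can be run as Python callables; the field names and grading protocol are illustrative, not the dataset's official evaluation script:

    from typing import Callable, List, Tuple

    # Hypothetical types: a "program" here is just a Python callable, and each task
    # carries the input/output examples that accompany NAPS problem statements.
    Program = Callable[[list], object]
    Example = Tuple[list, object]

    def passes_examples(program: Program, examples: List[Example]) -> bool:
        """A candidate counts as correct only if it reproduces every expected
        output and does not crash on any input."""
        for inputs, expected in examples:
            try:
                if program(inputs) != expected:
                    return False
            except Exception:
                return False
        return True

    def dataset_accuracy(predictions: List[Program], tasks: List[dict]) -> float:
        """Fraction of tasks whose predicted program passes all of its I/O
        examples -- the kind of accuracy a figure like 8.8% refers to."""
        solved = sum(passes_examples(p, t["examples"]) for p, t in zip(predictions, tasks))
        return solved / len(tasks)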


Neural Program Search: Solving Programming Tasks from Description and Examples

February 2018 · 100 Reads · 40 Citations

We present Neural Program Search, an algorithm that generates programs from a natural language description and a small number of input/output examples. The algorithm combines methods from the deep learning and program synthesis fields by designing a rich domain-specific language (DSL) and defining an efficient search algorithm over it, guided by a Seq2Tree model. To evaluate the quality of the approach we also present a semi-synthetic dataset of descriptions with test examples and corresponding programs. We show that our algorithm significantly outperforms a sequence-to-sequence baseline with attention.
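To make the combination of a DSL with model-guided search concrete, here is a minimal beam-search sketch in which a scoring function stands in for the Seq2Tree model. The flat token representation and all function names are simplifications for illustration, not the paper's tree-structured search:

    import heapq
    from typing import Callable, List, Tuple

    def guided_beam_search(description: str,
                           expansions: Callable[[List[str]], List[str]],
                           score_fn: Callable[[str, List[str], str], float],
                           is_complete: Callable[[List[str]], bool],
                           beam_size: int = 10,
                           max_steps: int = 50) -> List[List[str]]:
        """Search over DSL programs, keeping the `beam_size` partial programs
        the scoring model likes best at each step.  `expansions` returns the
        DSL-legal next tokens for a partial program, and `score_fn` plays the
        role of the Seq2Tree model, returning a log-probability for appending
        one token given the description and the partial program."""
        beam: List[Tuple[float, List[str]]] = [(0.0, [])]  # (cumulative log-prob, partial program)
        finished: List[Tuple[float, List[str]]] = []
        for _ in range(max_steps):
            candidates = []
            for logp, prog in beam:
                for tok in expansions(prog):
                    cand = (logp + score_fn(description, prog, tok), prog + [tok])
                    (finished if is_complete(cand[1]) else candidates).append(cand)
            if not candidates:
                break
            beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        return [prog for _, prog in sorted(finished, key=lambda c: c[0], reverse=True)]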


Attention Is All You Need

June 2017 · 29,854 Reads · 123,917 Citations

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
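The core operation the Transformer is built from is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch of that formula, without the learned projections or multi-head machinery of the full model:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, mask=None):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
        Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v); returns (n_q, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
        if mask is not None:                       # e.g. to block attention to future positions
            scores = np.where(mask, scores, -1e9)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V

    # Usage: 4 queries attending over 6 key/value pairs of dimension 8.
    rng = np.random.default_rng(0)
    out = scaled_dot_product_attention(rng.normal(size=(4, 8)),
                                       rng.normal(size=(6, 8)),
                                       rng.normal(size=(6, 8)))
    assert out.shape == (4, 8)

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.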


Figure previews: hierarchical question answering, where the model first selects relevant sentences to produce a document summary (d̂) for the given query (x) and then generates an answer (y) from the summary and the query; a WIKISUGGEST training example (document d, question x, answer y) in which sentence s5 is needed to answer the question; data statistics; and a visualization of the learned attention over sentences (p(s_l | d, x)) for a random subset of development documents.
Coarse-to-Fine Question Answering for Long Documents

January 2017 · 391 Reads · 179 Citations


Figure previews: hierarchical question answering, where the model first selects relevant sentences to produce a document summary (d̂) for the given query (x) and then generates an answer (y) from the summary; and a visualization of the learned attention distribution over sentences (p(s_l | d, x)) for a random subset of development documents.
Hierarchical Question Answering for Long Documents

November 2016 · 458 Reads · 17 Citations

Reading an article and answering questions about its content is a fundamental task for natural language understanding. While most successful neural approaches to this problem rely on recurrent neural networks (RNNs), training RNNs over long documents can be prohibitively slow. We present a novel framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance. Our approach combines a coarse, inexpensive model for selecting one or more relevant sentences and a more expensive RNN that produces the answer from those sentences. A central challenge is the lack of intermediate supervision for the coarse model, which we address using reinforcement learning. Experiments demonstrate state-of-the-art performance on a challenging subset of the WIKIREADING dataset (Hewlett et al., 2016) and on a newly gathered dataset, while reducing the number of sequential RNN steps by 88% compared to a standard sequence-to-sequence model.
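A minimal sketch of the coarse-to-fine idea described above: a cheap scorer ranks sentences, the top-k form a short summary, and only that summary reaches the expensive answer model. The function names are placeholders, and the reinforcement-learning training of the sentence selector is omitted:

    from typing import Callable, List

    def coarse_to_fine_answer(document_sentences: List[str],
                              question: str,
                              cheap_score: Callable[[str, str], float],
                              expensive_answer: Callable[[str, str], str],
                              k: int = 3) -> str:
        """Two-stage pipeline: a cheap model scores every sentence against the
        question, the top-k sentences form a short summary, and only that
        summary is fed to the expensive answer model, so the number of
        sequential steps scales with k rather than with document length."""
        ranked = sorted(document_sentences,
                        key=lambda s: cheap_score(question, s), reverse=True)
        summary = " ".join(ranked[:k])
        return expensive_answer(question, summary)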


WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

August 2016

We present WikiReading, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.


Figure previews: illustrations of the RNN models, where blocks with the same color share parameters and out-of-vocabulary words (shown in red) share a common embedding; and the character-level seq2seq model, fed the same example character by character.
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

August 2016 · 259 Reads · 143 Citations

We present WikiReading, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.
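The copy mechanism mentioned in the abstract lets the decoder emit out-of-vocabulary words by pointing into the source document. One common way to realize this, sketched here in the style of later pointer-generator formulations (the paper's exact mechanism may differ), is to mix a vocabulary distribution with the copy attention over source positions:

    import numpy as np

    def mix_generate_and_copy(p_vocab: np.ndarray,
                              copy_attention: np.ndarray,
                              source_token_ids: np.ndarray,
                              p_gen: float) -> np.ndarray:
        """Mix a vocabulary distribution with a copy distribution over the
        source document.  Source tokens carry ids in an extended vocabulary
        (out-of-vocabulary source words get ids >= len(p_vocab)), so mass
        placed on them lets the decoder emit words it cannot generate.
        p_vocab: (V,) softmax over the fixed vocabulary;
        copy_attention: (src_len,) attention over source positions;
        p_gen: probability of generating rather than copying."""
        vocab_size = p_vocab.shape[0]
        extended = np.zeros(vocab_size + len(source_token_ids))
        extended[:vocab_size] = p_gen * p_vocab
        # np.add.at accumulates correctly when a source token occurs more than once.
        np.add.at(extended, source_token_ids, (1.0 - p_gen) * copy_attention)
        return extended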

Citations (6)


... We comprehensively evaluate DynamicRAG on knowledge-intensive tasks using seven datasets: NQ Kwiatkowski et al. [2019], TriviaQA Joshi et al. [2017], HotpotQA Yang et al. [2018], 2WikimQA Ho et al. [2020], ASQA Stelmakh et al. [2022], FEVER Thorne et al. [2018], and ELI5 Fan et al. [2019]. Additionally, we assess the recall results of DynamicRAG's reranker on NQ and HotpotQA. ...

Reference:

DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Natural Questions: A Benchmark for Question Answering Research

Transactions of the Association for Computational Linguistics

... One very prominent technique of learning the heuristic is deep learning. [53] used a Convolutional Neural Network (CNN) trained with supervised learning to predict programs from hand-drawn images, and in [54] an RNN was used to encode the tree structure of the current program. In [55], RL was used to learn a neural network model to guide the program search without the need for ground-truth data. ...

Neural Program Search: Solving Programming Tasks from Description and Examples
  • Citing Article
  • February 2018

... Our literature review is summarized in Table 1. While many existing works (Choi et al. 2017) focus on targeted answers or long retrieved documents with short, homogeneous queries, our framework aims to provide an approach that allows for incorporating human domain experts in a sparse manner for generating high-quality datasets for QBD. Such datasets in turn can facilitate the fine-tuning of retrieval models and improve the re-ranking performance of the returned candidates. ...

Coarse-to-Fine Question Answering for Long Documents

...
  • Tokenization and part-of-speech tagging
  • Named Entity Recognition (NER) for identifying companies, people, and locations
  • Dependency parsing to understand relationships between words
  • Text summarization to extract key information from lengthy articles

Advanced algorithms often utilize deep learning models like BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer) for more nuanced text understanding [5]. ...

Attention Is All You Need
  • Citing Article
  • June 2017

... Simply splicing evidence passages together and feeding them into the model negatively affects answer extraction, for example by introducing noise, and the auxiliary role of candidate answers is easily ignored when answers are extracted independently from multiple evidence passages. Trischler et al. [31] proposed a two-step extraction model, which first extracted several candidates and then compared them against the passages. However, in their work, each candidate was considered in isolation. ...

Hierarchical Question Answering for Long Documents

... Hewlett et al., in 2016, introduced the WikiReading dataset [44], which offers 884 domains for large-scale language understanding tasks across Wikipedia articles. It serves as a comprehensive resource for training models to comprehend and extract information from encyclopedic text. ...

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia