Nando de Freitas’s research while affiliated with Google Inc. and other places


Publications (202)


Figures from this article: a bar chart of the number of articles published per year on machine learning for ancient languages, showing a substantial increase over the last five years; the proposed taxonomy for studying machine learning for ancient languages, inspired by the different steps involved in the study of ancient documents; the distribution of machine learning model architectures (with ≥ 2 articles per architecture), where "Engineered Features" groups works using PCA, HOG, HOOSC, word frequencies, and similar methods, and "Software" covers all methods using third-party or standalone software; the distribution of existing machine learning models utilized (with ≥ 2 articles per model); and the distribution of publications per language (with ≥ 2 articles per language), where the "cuneiform" script entry includes the Akkadian, Sumerian, and Babylonian languages.
Machine Learning for Ancient Languages: A Survey
  • Article
  • Full-text available

September 2023 · 375 Reads · 37 Citations · Computational Linguistics

[...] · Nando de Freitas

Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses at a scale and in a detail that are reshaping the humanities, much as microscopes and telescopes have contributed to the sciences. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, identifying promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.


Reinforced Self-Training (ReST) for Language Modeling

August 2023 · 711 Reads · 2 Citations

Reinforcement learning from human feedback (RLHF) can improve the quality of large language model (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences, inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks, in a compute- and sample-efficient manner.
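The grow/improve cycle the abstract describes is concrete enough to sketch. Below is a minimal, hypothetical illustration of the loop in Python, with `sample`, `reward_model`, and `finetune` as toy stand-ins for the paper's actual components; it is a sketch of the idea, not the paper's implementation.

```python
# A minimal sketch of the ReST grow/improve loop, assuming toy stand-ins for
# sampling, reward scoring, and fine-tuning; not the paper's implementation.
import random

def sample(policy, prompt):
    # stand-in for autoregressive generation from the current policy
    return prompt + " -> " + random.choice(["draft A", "draft B", "draft C"])

def reward_model(prompt, output):
    # stand-in for a learned reward, e.g. a translation quality estimator
    return random.random()

def finetune(policy, dataset):
    # stand-in for an offline RL / filtered fine-tuning update
    return policy

policy = "initial LLM policy"
prompts = ["Translate: Hallo Welt", "Translate: Guten Morgen"]

for grow_step in range(3):          # Grow: build a dataset offline from the policy
    scored = [(p, o, reward_model(p, o))
              for p in prompts
              for o in (sample(policy, p) for _ in range(8))]
    for improve_step in range(2):   # Improve: filter with a rising reward threshold
        threshold = 0.5 + 0.2 * improve_step
        kept = [(p, o) for p, o, r in scored if r >= threshold]
        policy = finetune(policy, kept)  # reuse the same offline data each pass
```

Because the Grow step is decoupled from the Improve step, the same generated batch can be reused across several fine-tuning passes, which is the efficiency advantage the abstract claims over online RLHF.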


AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

August 2023 · 230 Reads

StarCraft II is one of the most challenging simulated reinforcement learning environments: it is partially observable, stochastic, and multi-agent, and mastering it requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset to establish a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning and offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve a 90% win rate against the previously published AlphaStar behavior cloning agent.
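As a rough illustration of the simplest baseline mentioned above, here is a hedged sketch of behavior cloning on logged (observation, action) pairs. The random tensors and tiny model are assumptions standing in for the released dataset and the paper's architectures, not the AlphaStar Unplugged API.

```python
# A hedged sketch of a behavior-cloning baseline: supervised learning on
# logged human (observation, action) pairs, with no environment interaction.
import torch
import torch.nn as nn

obs_dim, n_actions = 32, 10
model = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

obs = torch.randn(256, obs_dim)                 # stand-in for logged observations
acts = torch.randint(0, n_actions, (256,))      # stand-in for logged human actions

for step in range(100):
    logits = model(obs)
    loss = loss_fn(logits, acts)                # imitate the humans' action choices
    opt.zero_grad()
    loss.backward()
    opt.step()
```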


Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

May 2023 · 59 Reads

Standard approaches to sequential decision-making exploit an agent's ability to continually interact with its environment and improve its control policy. However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in many real-world domains such as healthcare and robotics. Instead, control policies in these domains are typically trained offline from previously logged data or in a growing-batch manner. In this setting a fixed policy is deployed to the environment and used to gather an entire batch of new data before being aggregated with past batches and used to update the policy. This improvement cycle can then be repeated multiple times. While a limited number of such cycles is feasible in real-world domains, the quantity and diversity of the resulting data are much lower than in the standard continually-interacting approach. However, data collection in these domains is often performed in conjunction with human experts, who are able to label or annotate the collected data. In this paper, we first explore the trade-offs present in this growing-batch setting, and then investigate how information provided by a teacher (i.e., demonstrations, expert actions, and gradient information) can be leveraged at training time to mitigate the sample complexity and coverage requirements for actor-critic methods. We validate our contributions on tasks from the DeepMind Control Suite.
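A toy sketch of the growing-batch cycle described above may help: a frozen policy gathers a whole batch, the batch is aggregated with past data, and only then is the policy updated offline. Every component below is a hypothetical stand-in, not the paper's setup.

```python
# A toy growing-batch loop: deploy a frozen policy, gather a full batch,
# aggregate it with past batches, then update offline.
import random

def reward(action):                       # unknown environment: action 1 is better
    return 1.0 if action == 1 else 0.2

def collect_batch(policy, n=200):         # deployment phase: the policy is frozen
    actions = [policy() for _ in range(n)]
    return [(a, reward(a)) for a in actions]

def update(replay):                       # offline update from all data so far
    mean = {}
    for a in (0, 1):
        rs = [r for act, r in replay if act == a]
        mean[a] = sum(rs) / len(rs) if rs else 0.0
    best = max(mean, key=mean.get)
    return lambda: best if random.random() > 0.1 else random.randrange(2)

policy = lambda: random.randrange(2)      # initial exploratory policy
replay = []
for cycle in range(4):                    # a handful of improvement cycles
    replay += collect_batch(policy)
    # a teacher could relabel or annotate `replay` with expert actions here
    policy = update(replay)
```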


Competition-level code generation with AlphaCode

December 2022 · 218 Reads · 812 Citations · Science

Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions.
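The sample-then-filter-and-cluster pipeline the abstract describes can be sketched in a few lines. The generator below is a hypothetical stand-in for the trained transformer models; the example-test filtering and behaviour-based clustering follow the abstract's description in toy form.

```python
# A toy version of the AlphaCode-style pipeline: massively sample candidate
# programs, keep those passing the public example tests, cluster survivors
# by behaviour, and submit at most 10 representatives.
import random

def generate_programs(problem, n):
    # stand-in for large-scale sampling from the trained networks
    return ["lambda x: x + %d" % random.randrange(3) for _ in range(n)]

def passes_examples(src, examples):
    f = eval(src)  # fine for a toy; real systems sandbox execution
    return all(f(x) == y for x, y in examples)

def behaviour_signature(src, probe_inputs):
    f = eval(src)
    return tuple(f(x) for x in probe_inputs)  # cluster key: behaviour on probes

examples = [(1, 2), (5, 6)]                   # public example tests: y = x + 1
candidates = generate_programs("add one", n=1000)
filtered = [s for s in candidates if passes_examples(s, examples)]

clusters = {}
for s in filtered:
    clusters.setdefault(behaviour_signature(s, [0, 10, 100]), []).append(s)

# one representative per behaviour cluster, capped at 10 submissions
submissions = [group[0] for group in clusters.values()][:10]
print(len(candidates), len(filtered), len(submissions))
```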


Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

October 2022 · 3 Reads

As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing. Multi-step planning-based approaches to hyperparameter optimization promise improved efficiency over myopic alternatives by more effectively balancing exploration and exploitation. However, the potential of these approaches has not been fully realized due to their technical complexity and computational intensity. In this work, we leverage recent advances in Transformer-based, natural-language-interfaced hyperparameter optimization to circumvent these barriers. We build on top of the recently proposed OptFormer, which casts both hyperparameter suggestion and target function approximation as autoregressive generation, thus making planning via rollouts simple and efficient. We conduct an extensive exploration of different strategies for performing multi-step planning on top of the OptFormer model to highlight its potential for use in constructing non-myopic HPO strategies.
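Since the model casts suggestion and function prediction as autoregressive generation, multi-step planning reduces to rolling the model forward. The sketch below assumes toy `model_suggest` and `model_predict` stand-ins; it illustrates planning via rollouts in general, not the authors' code.

```python
# A toy sketch of non-myopic HPO via rollouts: candidates are scored by
# simulating a few future trials with the model itself.
import random

def model_suggest(history):
    # stand-in: propose a hyperparameter value given the trial history
    return random.uniform(0.0, 1.0)

def model_predict(history, x):
    # stand-in: the model's predicted objective value for x
    return 1.0 - (x - 0.7) ** 2 + random.gauss(0, 0.01)

def rollout_value(history, x, horizon=3):
    # simulate `horizon` future trials autoregressively, return best predicted y
    sim = history + [(x, model_predict(history, x))]
    for _ in range(horizon - 1):
        nxt = model_suggest(sim)
        sim.append((nxt, model_predict(sim, nxt)))
    return max(y for _, y in sim[len(history):])

history = []
for trial in range(5):
    candidates = [model_suggest(history) for _ in range(8)]
    x = max(candidates, key=lambda c: rollout_value(history, c))  # non-myopic pick
    y = 1.0 - (x - 0.7) ** 2            # toy "true" objective
    history.append((x, y))
```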


Figures from this article: Figure 13, fitness of the predicted CDF(y) on the HPO-B test set; Figure 17, ablation on the choice of acquisition functions (best normalized function values averaged over HPO-B test functions, with ablation curves shown with markers).
Towards Learning Universal Hyperparameter Optimizers with Transformers

May 2022 · 51 Reads · 2 Citations

Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction when trained on vast tuning data from the wild. Our extensive experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, and can be further improved via its function uncertainty estimates. Compared to a Gaussian Process, the OptFormer also learns a robust prior distribution for hyperparameter response functions, and can thereby provide more accurate and better-calibrated predictions. This work paves the way for future extensions that train a Transformer-based model as a general HPO optimizer.


A Generalist Agent

May 2022 · 730 Reads · 7 Citations

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
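One way to picture a single network handling text, images, and actions is that every modality is serialized into a shared token vocabulary, so context alone determines what kind of token comes next. The sketch below illustrates this idea with made-up token offsets and encodings; Gato's actual tokenization scheme differs in detail.

```python
# A minimal illustration of serializing several modalities into one token
# stream so a single set of weights can consume all of them. The offsets and
# encodings are illustrative assumptions, not Gato's actual scheme.

TEXT_OFFSET, IMAGE_OFFSET, ACTION_OFFSET = 0, 50_000, 60_000

def tokenize_text(s):
    return [TEXT_OFFSET + b for b in s.encode("utf-8")]

def tokenize_image_patches(patch_ids):
    # e.g. quantized image-patch codes mapped into a reserved token range
    return [IMAGE_OFFSET + p for p in patch_ids]

def tokenize_actions(action_ids):
    # e.g. discretized joint torques or button presses
    return [ACTION_OFFSET + a for a in action_ids]

# one interleaved sequence: context alone tells the model whether the next
# token to emit is text, an image patch, or a motor action
episode = (
    tokenize_text("stack the red block")
    + tokenize_image_patches([3, 17, 42])
    + tokenize_actions([1, 0, 2])
)
print(episode[:8])
```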


Restoring and attributing ancient texts using deep neural networks

March 2022 · 1,370 Reads · 137 Citations · Nature

Ancient history relies on disciplines such as epigraphy—the study of inscribed texts known as inscriptions—for evidence of the thought, language, society and history of past civilizations. However, over the centuries many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. Here we present Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow. The architecture of Ithaca focuses on collaboration, decision support and interpretability. While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history. This research shows how models such as Ithaca can unlock the cooperative potential between artificial intelligence and historians, transformationally impacting the way that we study and write about one of the most important periods in human history. Ithaca—a deep neural network for textual restoration, geographical attribution and dating of ancient Greek inscriptions—collaboratively aids historians’ study of damaged texts.
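To make the restoration task concrete, here is a toy illustration of filling damaged positions in a text and returning ranked hypotheses for a historian to review. The scorer below is a placeholder for a trained network and is not Ithaca's model; the decision-support framing (ranked suggestions rather than a single answer) follows the abstract.

```python
# A toy illustration of textual restoration: fill damaged positions (marked
# "?") and return ranked hypotheses for the expert to review.
import itertools

ALPHABET = "αβγδεζηθικλμνξοπρστυφχψω"
COMMON = "αεοινστ"  # toy frequency prior over Greek letters

def score(text):
    # placeholder for the network's log-probability of a full restoration
    return sum(1 for c in text if c in COMMON)

def restore(damaged, top_k=3):
    gaps = [i for i, c in enumerate(damaged) if c == "?"]
    hypotheses = []
    for fill in itertools.product(ALPHABET, repeat=len(gaps)):
        chars = list(damaged)
        for i, c in zip(gaps, fill):
            chars[i] = c
        text = "".join(chars)
        hypotheses.append((text, score(text)))
    hypotheses.sort(key=lambda t: -t[1])
    return hypotheses[:top_k]   # ranked suggestions support the expert's judgment

print(restore("και τ?ν δ?μον"))
```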


Competition-Level Code Generation with AlphaCode

February 2022 · 1,467 Reads · 12 Citations

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems, which require an understanding of algorithms and complex natural language, remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved an average ranking in the top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.


Citations (69)


... However, the confidence loss focuses only on reinforcing positive behavior from frozen weak teachers, and ignores the benefit of iteratively improving the quality of positive behavior [46,71] and penalizing negative behavior [61,73]. In addition, self-alignment methods have recently been viewed as promising approaches to address weak-to-strong alignment; such methods iteratively use self-generated data for aligning strong students rather than noisy supervision generated by weak teachers [23,71,70]. However, LLMs are prone to collapse when continuously reinforced on self-generated familiar positive behavior [56,69]. ...

Reference:

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Reinforced Self-Training (ReST) for Language Modeling

... With the advancement of artificial intelligence and natural language processing technologies, efforts to protect low-resource languages have gained new momentum [53,54,57,94]. Large language models and computer vision technologies can be used to identify, translate, and interpret low-resource language literature, providing powerful tools for historical research [89]. The application of these technologies not only improves research efficiency but also expands the depth and scope of studies, allowing a broader range of low-resource language literature to be fully utilized. ...

Machine Learning for Ancient Languages: A Survey

Computational Linguistics

... Decoder-based models such as CodeGPT [25], Code Llama [6] and CodeGeeX [26] aim to predict the next token one by one (i.e., in an autoregressive manner) by considering all previous contextual tokens from left to right. Encoder-decoder-based models, such as CodeT5 [3], AlphaCode [27] and StarCoder [5], jointly train encoder and decoder networks for multiple tasks such as encoding a token as a vector embedding or predicting subsequent tokens from the embeddings. ...

Competition-level code generation with AlphaCode
  • Citing Article
  • December 2022

Science

... Accomplishing such a task greatly improves programming experience and efficiency. State-of-the-art LMs achieve remarkable performance on program synthesis [8], [12], [13], [30], [32], [40], [54], leading to useful and popular code completion engines, such as GitHub Copilot [12]. ...

Competition-Level Code Generation with AlphaCode

... Finally, adaptive rewiring could be applied to deep neural networks (DNNs) as a neural architecture search method (Zoph et al., 2018; Zoph and Le, 2017). Recent advances of generative DNNs, such as ChatGPT and Sora (OpenAI, 2023; OpenAI, 2024), are impressive in their performance, but despite their remarkable achievements, even state-of-the-art DNNs still lag behind humans in many cognitive tasks (Goertzel, 2023; Maus et al., 2023; Ortega et al., 2021). Moreover, the energy consumption of these large models (Luccioni et al., 2023; Xu et al., 2024) is in stark contrast to the energy efficiency of the human brain (Balasubramanian, 2021). ...

Shaking the foundations: delusions in sequence models for interaction and control
  • Citing Preprint
  • October 2021

... In any case, there is a question mark over whether mouth articulations would look physiologically natural and synchronously convincing in terms of their belonging to the character's face (i.e., visual or corporal synchronisation). It may be for these reasons that such technologies are currently limited to head-on shots (see Yang et al., 2020) in non-fiction content. Nevertheless, it seems unlikely that traditional dubbing will be done away with for fiction very soon, and this is echoed by universities (e.g., the University of Bristol and University College London), dubbing studios (e.g., VSI London), and localisation companies (e.g., ZOO Digital), providing—whether internally and/or through partnerships or invited speakers—workshops and courses on script adaptation for English dubbing, which deal with lip-sync among other core concepts within the practice. ...

Large-scale multilingual audio visual dubbing
  • Citing Preprint
  • November 2020

... However, the Sim-to-Real gap significantly impacts the manipulation accuracy of imitation learning policies. As a result, some research shifts towards directly collecting real-world data, including datasets gathered through automated scripts or expert agents [8,18,32,39,46,67], as well as those obtained via human teleoperation [6,7,22,25,37,58,75,81]. As shown in Table I, we compare RoboMIND with existing real-world datasets for robot manipulation. ...

Scaling data-driven robotics with reward sketching and batch reinforcement learning
  • Citing Conference Paper
  • July 2020

... The excellent generalizability and cost-efficiency of our framework are primarily a result of the high-throughput capacity enabled by the pioneering architecture of HDRL-FP. This framework is fundamentally different from the recent development of other parallel RL architectures [17][18][19][20][21] , but facilitates the fast running of thousands of concurrent RL simulations on a single graphics processing unit (GPU) 22 . The massive number of parallel agents operated by HDRL-FP diversifies the exploration of environment into numerous uncorrelated regions, resulting in significant improvement of the training stability and a dramatic reduction of runtime in our generalizable RL environment for chemical reactions. ...

Acme: A Research Framework for Distributed Reinforcement Learning
  • Citing Preprint
  • June 2020