Joel Lehman’s research while affiliated with Providence College and other places


Publications (54)


Evolution and The Knightian Blindspot of Machine Learning
  • Preprint

January 2025 · 12 Reads

Joel Lehman · Elliot Meyerson · Tarek El-Gaaly · [...] · Tarin Ziyaee

This paper claims that machine learning (ML) largely overlooks an important facet of general intelligence: robustness to a qualitatively unknown future in an open world. Such robustness relates to Knightian uncertainty (KU) in economics, i.e. uncertainty that cannot be quantified, which is excluded from consideration in ML's key formalisms. This paper aims to identify this blind spot, argue its importance, and catalyze research into addressing it, which we believe is necessary to create truly robust open-world AI. To help illuminate the blind spot, we contrast one area of ML, reinforcement learning (RL), with the process of biological evolution. Despite staggering ongoing progress, RL still struggles in open-world situations, often failing under unforeseen conditions. For example, the idea of zero-shot transferring a self-driving car policy trained only in the US to the UK currently seems exceedingly ambitious. In dramatic contrast, biological evolution routinely produces agents that thrive within an open world, sometimes even generalizing to situations that are remarkably out-of-distribution (e.g. invasive species; or humans, who do undertake such zero-shot international driving). Interestingly, evolution achieves such robustness without explicit theory, formalisms, or mathematical gradients. We explore the assumptions underlying RL's typical formalisms, showing how they limit RL's engagement with the unknown unknowns characteristic of an ever-changing complex world. Further, we identify mechanisms through which evolutionary processes foster robustness to novel and unpredictable challenges, and discuss potential pathways to algorithmically embody them. The conclusion is that the intriguing remaining fragility of ML may result from blind spots in its formalisms, and that significant gains may result from direct confrontation with the challenge of KU.
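For reference, the closed-world assumption the abstract alludes to is visible in the standard RL formalism itself; the rendering below is a textbook-style sketch, not an equation taken from the paper:

```latex
% Standard RL objective over a fixed, fully specified MDP (amsmath/amssymb assumed).
% All uncertainty lives in the *known* distributions P and R, so only quantifiable
% risk -- not Knightian uncertainty about the tuple itself -- enters the formalism.
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
  \pi^{*} \in \arg\max_{\pi}\;
  \mathbb{E}_{\pi,\,P}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right].
\]
```

Any change to S, A, P, or R after training falls outside this objective, which is one way to read the claim that KU is excluded from ML's key formalisms.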


Language Model Crossover: Variation through Few-Shot Prompting

September 2024 · 12 Reads · 54 Citations · ACM Transactions on Evolutionary Learning and Optimization

This paper pursues the insight that language models naturally enable an intelligent variation operator similar in spirit to evolutionary crossover. In particular, language models of sufficient scale demonstrate in-context learning, i.e. they can learn from associations between a small number of input patterns to generate outputs incorporating such associations (also called few-shot prompting). This ability can be leveraged to form a simple but powerful variation operator, i.e. to prompt a language model with a few text-based genotypes (such as code, plain-text sentences, or equations), and to parse its corresponding output as those genotypes’ offspring. The promise of such language model crossover (which is simple to implement and can leverage many different open-source language models) is that it enables a simple mechanism to evolve semantically-rich text representations (with few domain-specific tweaks), and naturally benefits from current progress in language models. Experiments in this paper highlight the versatility of language-model crossover, through evolving binary bit-strings, sentences, equations, text-to-image prompts, and Python code. The conclusion is that language model crossover is a flexible and effective method for evolving genomes representable as text.
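A minimal sketch of the operator described above, assuming the Hugging Face transformers text-generation pipeline; the model choice, sampling settings, and output parsing are illustrative placeholders rather than the paper's implementation:

```python
# Language model crossover (LMX) sketch: concatenate a few parent genotypes into a
# prompt and parse the model's continuation as offspring. Model and parameters are
# illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

def lmx_crossover(parents, num_offspring=4):
    """Prompt the LM with parent genotypes; treat lines of its output as children."""
    prompt = "\n".join(parents) + "\n"
    completions = generator(
        prompt,
        max_new_tokens=64,
        num_return_sequences=num_offspring,
        do_sample=True,
        temperature=1.0,
    )
    offspring = []
    for c in completions:
        continuation = c["generated_text"][len(prompt):]
        lines = [ln.strip() for ln in continuation.split("\n") if ln.strip()]
        if lines:  # first non-empty line of each continuation is one child genotype
            offspring.append(lines[0])
    return offspring

children = lmx_crossover([
    "a red bird sings at dawn",
    "a blue bird sings at dusk",
    "a green bird hums at noon",
])
```

An outer evolutionary loop would then score such children with a domain fitness function and select the next generation of parents.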


The OpenELM Library: Leveraging Progress in Language Models for Novel Evolutionary Algorithms

February 2024 · 12 Reads · 17 Citations

In recent years, Large Language Models (LLMs) have rapidly progressed in their capabilities in natural language processing (NLP) tasks, which have interestingly grown in scope to include generating computer programs. Indeed, recent studies have demonstrated how LLMs can enable highly proficient genetic programming (GP) algorithms and novel evolutionary algorithms more broadly. Motivated by these opportunities, this paper introduces OpenELM, an open-source Python library for designing evolutionary algorithms that leverage LLMs to intelligently generate variation, as well as to assess fitness and measures of diversity. The library includes implementations of several variation operators, and is designed to accommodate those with limited compute resources, by enabling fast inference, being runnable through hosted notebooks (such as Google Colab), and allowing for API-based LLMs to be used instead of local models run on GPUs. Additionally, OpenELM includes a variety of domain implementations for easy experimentation and adaptation, including several GP domains. The hope is to help researchers easily develop new approaches and applications within the nascent and largely unexplored paradigm of evolutionary algorithms that leverage LLMs.


Evolution Through Large Models

January 2024 · 49 Reads · 78 Citations

This chapter pursues the insight that large language models (LLMs) trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming (GP). Because such LLMs benefit from training data that includes sequential changes and modifications, they can approximate likely changes that humans would make. To highlight the breadth of implications of such evolution through large models (ELM), in the main experiment ELM combined with MAP-Elites generates hundreds of thousands of functional examples of Python programs that output working ambulating robots in the Sodarace domain, which the original LLM had never seen in pretraining. These examples then help to bootstrap training a new conditional language model that can output the right walker for a particular terrain. The ability to bootstrap new models that can output appropriate artifacts for a given context in a domain where zero training data was previously available carries implications for open-endedness, deep learning, and reinforcement learning. These implications are explored here in depth in the hope of inspiring new directions of research now opened up by ELM.
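A toy sketch of the MAP-Elites loop the chapter builds on; llm_mutate(), fitness(), and descriptor() are stand-ins so the snippet runs end to end, whereas in ELM mutation comes from a code-generating LLM and candidates are Python programs producing Sodarace walkers:

```python
# MAP-Elites with a pluggable mutation operator. The three helper functions are
# toy stand-ins; swapping llm_mutate() for an LLM-backed edit gives the ELM setup.
import random

def llm_mutate(genome: str) -> str:
    # Toy stand-in: change one character. Replace with a prompt to a code LLM.
    i = random.randrange(len(genome))
    return genome[:i] + random.choice("0123456789") + genome[i + 1:]

def fitness(genome: str) -> float:
    # Toy stand-in (in ELM: e.g. distance walked by the generated Sodaracer).
    return sum(int(c) for c in genome)

def descriptor(genome: str) -> tuple:
    # Toy stand-in for behavior descriptors (in ELM: e.g. walker height, width, mass).
    return (int(genome[0]) // 2, int(genome[-1]) // 2)

def map_elites(seeds, iterations=5_000):
    archive = {}  # descriptor cell -> (fitness, genome)
    for s in seeds:
        archive[descriptor(s)] = (fitness(s), s)
    for _ in range(iterations):
        _, parent = random.choice(list(archive.values()))
        child = llm_mutate(parent)
        f, cell = fitness(child), descriptor(child)
        # Keep the child only if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive

elites = map_elites(["5050505050"])
```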


OMNI: Open-endedness via Models of human Notions of Interestingness
  • Preprint
  • File available

June 2023 · 92 Reads

Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles' heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize large (language) models (LMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that LM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.
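A rough sketch of the MoI pattern, assuming the openai Python client; the prompt wording and model name are placeholders and do not reproduce OMNI's actual prompts:

```python
# Using an LM as a "model of interestingness" to gate an auto-curriculum: among
# tasks judged learnable (e.g. by learning progress), keep only those the LM deems
# interesting. Client, model name, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def interesting(task: str, learned_tasks: list[str]) -> bool:
    prompt = (
        "An agent has already learned these tasks:\n- " + "\n- ".join(learned_tasks)
        + f"\n\nWould learning this next task be interesting and worthwhile, rather "
        f"than a trivial variation of what it already knows? Task: {task}\n"
        "Answer with exactly 'yes' or 'no'."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")

learnable_candidates = ["pick up the key", "pick up the key from one tile further left"]
next_tasks = [t for t in learnable_candidates
              if interesting(t, learned_tasks=["open the door"])]
```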


Language Model Crossover: Variation through Few-Shot Prompting

February 2023 · 67 Reads · 2 Citations

This paper pursues the insight that language models naturally enable an intelligent variation operator similar in spirit to evolutionary crossover. In particular, language models of sufficient scale demonstrate in-context learning, i.e. they can learn from associations between a small number of input patterns to generate outputs incorporating such associations (also called few-shot prompting). This ability can be leveraged to form a simple but powerful variation operator, i.e. to prompt a language model with a few text-based genotypes (such as code, plain-text sentences, or equations), and to parse its corresponding output as those genotypes' offspring. The promise of such language model crossover (which is simple to implement and can leverage many different open-source language models) is that it enables a simple mechanism to evolve semantically-rich text representations (with few domain-specific tweaks), and naturally benefits from current progress in language models. Experiments in this paper highlight the versatility of language-model crossover, through evolving binary bit-strings, sentences, equations, text-to-image prompts, and Python code. The conclusion is that language model crossover is a promising method for evolving genomes representable as text.


Machine Love

February 2023 · 341 Reads · 1 Citation

While ML generates much economic value, many of us have problematic relationships with social media and other ML-powered applications. One reason is that ML often optimizes for what we want in the moment, which is easy to quantify but at odds with what is known scientifically about human flourishing. Thus, through its impoverished models of us, ML currently falls far short of its exciting potential, which is for it to help us to reach ours. While there is no consensus on defining human flourishing, from diverse perspectives across psychology, philosophy, and spiritual traditions, love is understood to be one of its primary catalysts. Motivated by this view, this paper explores whether there is a useful conception of love fitting for machines to embody, as historically it has been generative to explore whether a nebulous concept, such as life or intelligence, can be thoughtfully abstracted and reimagined, as in the fields of machine intelligence or artificial life. This paper forwards a candidate conception of machine love, inspired in particular by work in positive psychology and psychotherapy: to provide unconditional support enabling humans to autonomously pursue their own growth and development. Through proof of concept experiments, this paper aims to highlight the need for richer models of human flourishing in ML, provide an example framework through which positive psychology can be combined with ML to realize a rough conception of machine love, and demonstrate that current language models begin to enable embodying qualitative humanistic principles. The conclusion is that though at present ML may often serve to addict, distract, or divide us, an alternative path may be opening up: We may align ML to support our growth, through it helping us to align ourselves towards our highest aspirations.


Evolution through Large Models

June 2022 · 74 Reads · 6 Citations

This paper pursues the insight that large language models (LLMs) trained to generate code can vastly improve the effectiveness of mutation operators applied to programs in genetic programming (GP). Because such LLMs benefit from training data that includes sequential changes and modifications, they can approximate likely changes that humans would make. To highlight the breadth of implications of such evolution through large models (ELM), in the main experiment ELM combined with MAP-Elites generates hundreds of thousands of functional examples of Python programs that output working ambulating robots in the Sodarace domain, which the original LLM had never seen in pre-training. These examples then help to bootstrap training a new conditional language model that can output the right walker for a particular terrain. The ability to bootstrap new models that can output appropriate artifacts for a given context in a domain where zero training data was previously available carries implications for open-endedness, deep learning, and reinforcement learning. These implications are explored here in depth in the hope of inspiring new directions of research now opened up by ELM.


Neural network architectures
a, The Atari architecture is based on the architecture provided with the backward algorithm implementation. The input consists of the RGB channels of the last four frames (rescaled to 80 by 105 pixels) concatenated, resulting in 12 input channels. The network consists of three convolutional layers (C), two fully connected layers (FC), and a layer of gated recurrent units (GRUs). The network has a policy head π(aₜ | sₜ) and a value head V(sₜ). b, For the robotics problem, the architecture consists of two separate networks, each with two fully connected layers and a GRU layer. One network specifies the policy π(aₜ | sₜ) by returning a mean μₜ and variance σₜ for the actuator torques of the arm and the desired position of each of the two fingers of the gripper (gripper fingers are implemented as MuJoCo position actuators with kp = 10⁴ and a control range of [0, 0.05]). The other network implements the value function V(sₜ). c, The architecture for policy-based Go-Explore is identical to the Atari architecture, except that the goal representation gₜ is concatenated with the input of the first fully connected layer. Activation functions (Act.) are: the rectified linear unit (ReLU), the exponential function (Exp), and the softmax function (Softmax). Layers can also include layer normalization (Layer norm), which transforms the output of the layer by subtracting the mean and dividing by the standard deviation of the layer. (A rough code sketch of the Atari network appears after these figure captions.)
Maximum end-of-episode score found by the exploration phase on Atari
a, Exploration phase without domain knowledge. b, Exploration phase with domain knowledge, compared to downscaled. Because only scores achieved at the episode end are reported, the plots for some games (for example, Solaris) begin after the start of the run, when the episode end is first reached. In a, averaging is over 50 runs for the 11 focus games and five runs for other games. In b, averaging is over 100 runs. Shaded areas show 95% bootstrap CIs of the mean with 1,000 samples. Avg. Human, average human performance; SOTA, state-of-the-art performance; M, ×10⁶; K, ×10³.
Number of cells in archive during the exploration phase on Atari
a, Exploration phase without domain knowledge. b, Exploration phase with domain knowledge. In a, archive size can decrease when the representation is recomputed. Previous archives are converted to the new format when the representation is recomputed, possibly leading to an archive with a size larger than 50,000. In this case, one iteration of the exploration phase runs and the representation is recomputed again. In a, averaging is over 50 runs for the 11 focus games and five runs for other games. In b, averaging is over 100 runs. Shaded areas show 95% bootstrap CIs of the mean with 1,000 samples.
Progress of robustification phase on Atari
a, Exploration phase without domain knowledge. b, Exploration phase with domain knowledge. Shown are the scores achieved by robustifying agents across training time for the exploration phase without domain-knowledge representations (a) and with representations informed by domain knowledge (b). In particular, the rolling mean is shown for performance across the past 100 episodes when starting from the virtual demonstration (which corresponds to the domain’s traditional starting state). Note that in a, averaging is over five independent runs, whereas in b, averaging is over 10 runs. Because the final performance is obtained by testing the highest-performing network checkpoint for each run over 1,000 additional episodes, rather than directly extracted from the curves above, the performance reported in Fig. 2b does not necessarily match any particular point along these curves (Methods). Shaded areas show 95% bootstrap CIs of the mean with 1,000 samples.
Progress of the exploration phase in the robotics environment
a, Runs with successful trajectories. b, Length of the shortest successful trajectory. In a, the exploration phase quickly achieves 100% success rate for all shelves in the robotics environment. However, b shows that although success is achieved quickly it is useful to keep the exploration phase running longer to reduce the length of the successful trajectories, thus making robustification easier. Lines show the mean over 50 runs. Shaded areas show 95% bootstrap CIs of the mean with 1,000 samples.
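The sketch below is a rough PyTorch rendering of the Atari network from the first caption above; the caption does not give convolutional channel counts, kernel sizes, strides, or hidden widths, so those (along with the height/width ordering and the omission of layer normalization) are assumptions in the spirit of common Atari architectures:

```python
# Sketch of the described Atari net: 12 input channels (last four RGB frames,
# rescaled to 80x105), three conv layers, two fully connected layers, a GRU,
# and separate policy and value heads. Sizes marked below are assumptions.
import torch
import torch.nn as nn

class AtariPolicyValueNet(nn.Module):
    def __init__(self, num_actions: int, hidden: int = 256):
        super().__init__()
        self.convs = nn.Sequential(              # channel/kernel/stride choices assumed
            nn.Conv2d(12, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                    # infer flattened size for 105x80 inputs
            n_flat = self.convs(torch.zeros(1, 12, 105, 80)).shape[1]
        self.fc = nn.Sequential(
            nn.Linear(n_flat, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, num_actions)  # logits for pi(a_t | s_t)
        self.value_head = nn.Linear(hidden, 1)              # V(s_t)

    def forward(self, frames, h=None):
        # frames: (batch, 12, 105, 80); one timestep per call for simplicity.
        x = self.fc(self.convs(frames)).unsqueeze(1)
        x, h = self.gru(x, h)
        x = x.squeeze(1)
        return self.policy_head(x), self.value_head(x), h
```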


First return, then explore

February 2021 · 868 Reads · 338 Citations · Nature

Reinforcement learning promises to solve complex sequential-decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse and deceptive feedback. Avoiding these pitfalls requires a thorough exploration of the environment, but creating algorithms that can do so remains one of the central challenges of the field. Here we hypothesize that the main impediment to effective exploration originates from algorithms forgetting how to reach previously visited states (detachment) and failing to first return to a state before exploring from it (derailment). We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly ‘remembering’ promising states and returning to such states before intentionally exploring. Go-Explore solves all previously unsolved Atari games and surpasses the state of the art on all hard-exploration games, with orders-of-magnitude improvements on the grand challenges of Montezuma’s Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task. Additionally, we show that adding a goal-conditioned policy can further improve Go-Explore’s exploration efficiency and enable it to handle stochasticity throughout training. The substantial performance gains from Go-Explore suggest that the simple principles of remembering states, returning to them, and exploring from them are a powerful and general approach to exploration—an insight that may prove critical to the creation of truly intelligent learning agents. A reinforcement learning algorithm that explicitly remembers promising states and returns to them as a basis for further exploration solves all as-yet-unsolved Atari games and outperforms previous algorithms on Montezuma’s Revenge and Pitfall.
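A highly simplified sketch of the exploration phase ("remember promising states, return to them, explore from them"); the restorable-emulator interface (get_state/set_state), the gym-style step signature, and the cell mapping are illustrative stand-ins for the mechanisms described in the paper:

```python
# Go-Explore exploration-phase sketch: an archive maps coarse "cells" to saved
# emulator states and best scores; each iteration returns to an archived state
# and explores from it. Environment interface and cell function are assumptions.
import random

def to_cell(obs) -> bytes:
    # Stand-in cell representation; the paper uses e.g. downscaled, quantized pixels.
    return obs.tobytes()  # assumes a numpy observation

def exploration_phase(env, iterations=1_000, steps_per_rollout=100):
    obs = env.reset()
    archive = {to_cell(obs): {"state": env.get_state(), "score": 0.0}}
    for _ in range(iterations):
        # 1) Select a cell (uniformly here; the paper weights cells heuristically).
        cell = random.choice(list(archive.values()))
        # 2) Return: restore the saved emulator state instead of re-learning a path.
        env.set_state(cell["state"])
        score = cell["score"]
        # 3) Explore from there with random actions, adding or improving cells found.
        for _ in range(steps_per_rollout):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            key = to_cell(obs)
            if key not in archive or score > archive[key]["score"]:
                archive[key] = {"state": env.get_state(), "score": score}
            if done:
                break
    return archive
```

A separate robustification phase (imitation learning from the best trajectories found) then turns archived solutions into a policy, as the figure captions above indicate.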


Open Questions in Creating Safe Open-ended AI: Tensions Between Control and Creativity

June 2020 · 52 Reads

Artificial life originated and has long studied the topic of open-ended evolution, which seeks the principles underlying artificial systems that innovate continually, inspired by biological evolution. Recently, interest has grown within the broader field of AI in a generalization of open-ended evolution, here called open-ended search, wherein such questions of open-endedness are explored for advancing AI, whatever the nature of the underlying search algorithm (e.g. evolutionary or gradient-based). For example, open-ended search might design new architectures for neural networks, new reinforcement learning algorithms, or most ambitiously, aim at designing artificial general intelligence. This paper proposes that open-ended evolution and artificial life have much to contribute towards the understanding of open-ended AI, focusing here in particular on the safety of open-ended search. The idea is that AI systems are increasingly applied in the real world, often producing unintended harms in the process, which motivates the growing field of AI safety. This paper argues that open-ended AI has its own safety challenges, in particular, whether the creativity of open-ended systems can be productively and predictably controlled. This paper explains how unique safety problems manifest in open-ended search, and suggests concrete contributions and research questions to explore them. The hope is to inspire progress towards creative, useful, and safe open-ended search algorithms.


Citations (26)


... Our study can be roughly classified into the first category. In [42], LLMs are employed as crossover operators to derive new solutions from parental inputs. Brownlee [43] also presented LLMs effectively functioning as mutation operators that enhance the search process. ...

Reference:

Visual Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models: A Case Study of Influence Maximization
Language Model Crossover: Variation through Few-Shot Prompting
  • Citing Article
  • September 2024

ACM Transactions on Evolutionary Learning and Optimization

... The integration of LLMs with evolutionary algorithms has been extensively explored to enhance code generation. For example, Bradley et al. [35] introduced OpenELM, an open-source Python library that designs specialized evolutionary operators for code generation. Chen et al. [36] utilized LLMs as adaptive mutation and crossover operators in evolutionary neural architecture search (NAS), optimizing neural network design by merging LLM capabilities with evolutionary processes. ...

The OpenELM Library: Leveraging Progress in Language Models for Novel Evolutionary Algorithms
  • Citing Chapter
  • February 2024

... Various optimization approaches leveraging Large Language Models (LLMs) have emerged to address diverse BBO problem domains (Romera-Paredes et al. 2023; Meyerson et al. 2023; Liu et al. 2023; Yang et al. 2023; Lehman et al. 2023; Ma et al. 2023; Chen, Dohan, and So 2023; Nasir et al. 2023; Mo et al. 2024). These methods lack the universal applicability of pretrained BBO models due to a deficiency in generating capabilities across tasks. ...

Evolution Through Large Models
  • Citing Chapter
  • January 2024

... Such methods could improve performance in tasks like long-text generation and question answering by ensuring that the model respects the temporal and contextual order of information. Conversely, in existing Transformer-enhanced EAs [31], fitness values are directly utilized for position encoding [31,33,41,69,70] within the Transformer model. According to our conceptual analogy, the introduction of fitness shaping has the potential to aid Transformer-enhanced EAs in managing selective pressures. ...

Language Model Crossover: Variation through Few-Shot Prompting
  • Citing Preprint
  • February 2023

... Evolution through Large Models (ELM) [Lehman et al. 2022] proposes to use a language model in place of a mutation operator within a traditional genetic programming framework [Koza 1994]. They use a type of instruction fine-tuned model trained on git commit messages known as a diff model. ...

Evolution through Large Models
  • Citing Preprint
  • June 2022

... Intrinsic-reward-based exploration faces a fundamental limitation of vanishing intrinsic rewards [12,30]: as an agent explores the environment and becomes familiar with some local areas after a number of steps, the agent loses the exploration bonus and is unable to return to novel areas. As a result, the policy it learns is driven by extrinsic rewards only. ...

First return, then explore

Nature

... While I have attempted to introduce rewards logically, these sub-task rewards may have implicitly biased the results. Certainly, reinforcement learning agents can be quite creative in finding policies that maximize rewards, but distort their intended goal (Lehman et al., 2020). Because of this, there is room for exploration in shaping the reward function in future studies. ...

The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
  • Citing Article
  • January 2020

... Broader boundary in AI Openness: In prior sections, we include within AI openness previous AI articles that use the terms "open source," "open," or "openness." However, the term "open-ended" is also found in the literature, describing a machine learning model capable of generating sequences that are both novel and learnable by an observer [40,61]. We have chosen to separate this concept from the rest of AI openness because its definition is distinct from other openness-related terms in AI. ...

Open Questions in Creating Safe Open-ended AI: Tensions Between Control and Creativity
  • Citing Conference Paper
  • January 2020

... To further understand how biologically plausible mechanisms may shed light on DNNs and optimization methods, implementations at the dendritic, single-neuron, or microcircuitry levels have been increasingly realized. [1][2][3][4][5] Thus far, deep learning and neurorobotics studies [6][7][8][9][10][11][12][13] have examined whether biological neuromodulation may lead to behavioral benefits. In these studies, neuromodulation was commonly defined as a mechanism that self-reconfigures network hyperparameters and connectivity based on environmental or/and behavioral states of the neural network (Table 1). ...

Learning to Continually Learn
  • Citing Preprint
  • February 2020

... Unfortunately, so far only a few model zoos with specific properties have been published [44,144,155,158]. While many machine learning domains have standardized datasets, there is no model zoo nor a benchmark to evaluate and compare against. ...

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
  • Citing Conference Paper
  • August 2019