Una-May O’Reilly
Massachusetts Institute of Technology | MIT · MIT Computer Science and Artificial Intelligence Laboratory

About

233 Publications
35,311 Reads
3,836 Citations
Publications (233)
Article
Cross-linked threat, vulnerability, and defensive mitigation knowledge is critical in defending against diverse and dynamic cyber threats. Cyber analysts consult it by deductively or inductively creating a chain of reasoning to identify a threat starting from indicators they observe, or vice versa. Cyber hunters use it abductively to reason when h...
Preprint
Full-text available
Communication strongly influences attitudes on climate change. Within sponsored communication, high spend and high reach advertising dominates. In the advertising ecosystem we can distinguish actors with adversarial stances: organizations with contrarian or advocacy communication goals, who direct the advertisement delivery algorithm to launch ads...
Preprint
Full-text available
Our goal is to identify brain regions involved in comprehending computer programs. We use functional magnetic resonance imaging (fMRI) to investigate two candidate systems of brain regions which may support this -- the Multiple Demand (MD) system, known to respond to a range of cognitively demanding tasks, and the Language system (LS), known to pri...
Chapter
We present a GUI-driven and efficient Genetic Programming (GP) and AI Planning framework designed for agent-based learning research. Our framework, ABL-Unity3D, is built in Unity3D, a game development environment. ABL-Unity3D addresses challenges entailed in co-opting Unity3D: making the simulator serve agent learning rather than humans playing a g...
Article
We investigate artificial intelligence and machine learning methods for optimizing the adversarial behavior of agents in cybersecurity simulations. Our cybersecurity simulations integrate the modeling of agents launching Advanced Persistent Threats (APTs) with the modeling of agents using detection and mitigation mechanisms against APTs. This simul...
Preprint
Full-text available
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary 'views' of a code that enable us to achieve both robust and accurate code m...
Article
Full-text available
Contagious respiratory diseases, such as COVID-19, depend on sufficiently prolonged exposures for the successful transmission of the underlying pathogen. It is important that organizations evaluate the efficacy of non-pharmaceutical interventions aimed at mitigating viral transmission among their personnel. We have developed an operational risk asse...
Chapter
Program synthesis automates the process of writing code, which can be a very useful tool in allowing people to better leverage computational resources. However, a limiting factor in the scalability of current program synthesis techniques is the large size of the search space, especially for complex programs. We present a new model for synthesizing...
Chapter
Human code is different from code generated by program search. We investigate if properties from human-generated code can guide program search to improve the qualities of the generated programs, e.g., readability and performance. Here we focus on program search with grammatical evolution, which produces code that has different structure compared to...
Preprint
Full-text available
Contagious respiratory diseases, such as COVID-19, depend on sufficiently prolonged exposures for the successful transmission of the underlying pathogen. It is important for organizations to evaluate the efficacy of interventions aimed at mitigating viral transmission among their personnel. We have developed an operational risk assessment simulatio...
Preprint
Full-text available
Artificial Intelligence (AI) and Machine Learning (ML) algorithms can support cyber security modeling and inference across the span from the indicator level, e.g. anomaly detection, to the behavioral level. This contribution is based on a dataset named BRON, which is amalgamated from public threat and vulnerability behavioral sources. We demonstrate how BRON can sup...
Article
Generative Adversarial Networks (GANs) are difficult to train because of pathologies such as mode and discriminator collapse. Similar pathologies have been studied and addressed in competitive evolutionary computation by increased diversity. We study a system, Lipizzaner, that combines spatial coevolution with gradient-based learning to improve the...
Preprint
Full-text available
Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse, which mainly arise from a lack of diversity in their adversarial interactions. Co-evolutionary GAN (CoE-GAN) training algorithms have been shown to be resilient to these pathologies. This article introduces Mustangs, a spatially distributed CoE-G...
Preprint
Full-text available
Scaling the cyber hunt problem poses several key technical challenges. Detecting and characterizing cyber threats at scale in large enterprise networks is hard because of the vast quantity and complexity of the data that must be analyzed as adversaries deploy varied and evolving tactics to accomplish their goals. There is a great need to automate a...
Chapter
We explore how to give Genetic Programming (GP) a head start in synthesizing a solution to a programming problem. Our method uses a related problem and introduces a schedule that directs GP to solve the related problem first, either fully or to some extent, or at the same time. In addition, if the related problem’s solutions are written by students or evolved...
Preprint
Full-text available
Machine learning (ML) models that learn and predict properties of computer programs are increasingly being adopted and deployed. These models have demonstrated success in applications such as auto-completing code, summarizing large programs, and detecting bugs and malware in programs. In this work, we investigate principled ways to adversarially pe...
Preprint
Full-text available
Generative adversarial networks (GANs) exhibit training pathologies that can lead to convergence-related degenerative behaviors, whereas spatially-distributed, coevolutionary algorithms (CEAs) for GAN training, e.g. Lipizzaner, are empirically robust to them. The robustness arises from diversity that occurs by training populations of generators and...
Article
Full-text available
Computer programming is a novel cognitive tool that has transformed modern society. What cognitive and neural mechanisms support this skill? Here, we used functional magnetic resonance imaging to investigate two candidate brain systems: the multiple demand (MD) system, typically recruited during math, logic, problem solving, and executive tasks, an...
Preprint
Many public sources of cyber threat and vulnerability information exist to serve the defense of cyber systems. This paper proposes BRON, which is a composite of MITRE's ATT&CK MATRIX, NIST's Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification (CAPEC). BRON preserves...
Preprint
Adversarial examples are imperceptible perturbations in the input to a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must adhere to strict semantic guidelines so the resulting progr...
Preprint
Distributed coevolutionary Generative Adversarial Network (GAN) training has empirically shown success in overcoming GAN training pathologies. This is mainly due to diversity maintenance in the populations of generators and discriminators during the training process. The method studied here coevolves sub-populations on each cell of a spatial grid o...
Chapter
Distributed coevolutionary Generative Adversarial Network (GAN) training has empirically shown success in overcoming GAN training pathologies. This is mainly due to diversity maintenance in the populations of generators and discriminators during the training process. The method studied here coevolves sub-populations on each cell of a spatial grid o...
Chapter
The design of computational arms races can draw upon the compelling inspiration of biological arms races. To study cyber security attack-defense dynamics, we have abstracted a description of biological adversarial ecosystems to design an adversarial computational system. The system has elements and processes with abstracted biological analogs. It c...
Article
Full-text available
Cyber security adversaries and engagements are ubiquitous and ceaseless. We delineate Adversarial Genetic Programming for Cyber Security, a research topic that, by means of genetic programming (GP), replicates and studies the behavior of cyber adversaries and the dynamics of their engagements. Adversarial Genetic Programming for Cyber Security enco...
Chapter
We investigate training Generative Adversarial Networks, GANs, with less data. Subsets of the training dataset can express empirical sample diversity while reducing training resource requirements, e.g., time and memory. We ask how much data reduction impacts generator performance and gauge the additive value of generator ensembles. In addition to c...
Preprint
Full-text available
Computer programming is a novel cognitive tool that has transformed modern society. An integral part of programming is code comprehension: the ability to process individual program tokens, combine them into statements, which, in turn, combine to form a program. What cognitive and neural mechanisms support this ability to process computer code? Here...
Preprint
We investigate the problem of using machine learning to classify a line of a program as containing a vulnerability or not. Such a line-level classification task calls for a program representation which goes beyond reasoning from the tokens present in the line. We seek a distributed representation in a latent feature space which can capture the contro...
Preprint
Full-text available
Cyber security adversaries and engagements are ubiquitous and ceaseless. We delineate Adversarial Genetic Programming for Cyber Security, a research topic that, by means of genetic programming (GP), replicates and studies the behavior of cyber adversaries and the dynamics of their engagements. Adversarial Genetic Programming for Cyber Security enco...
Preprint
We investigate training Generative Adversarial Networks, GANs, with less data. Subsets of the training dataset can express empirical sample diversity while reducing training resource requirements, e.g. time and memory. We ask how much data reduction impacts generator performance and gauge the additive value of generator ensembles. In addition to co...
Chapter
Cyber adversaries are immersed in a ceaseless arms race. Each adversary incessantly maneuvers to adapt to the opposing posture. An avenue to pro-active, adversarially-hardened cyber defenses can be investigated by studying the dynamics of these cyber engagements. An adversarial engagement can computationally act as an elementary component of a comp...
Conference Paper
Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse. These pathologies mainly arise from a lack of diversity in their adversarial interactions. Evolutionary generative adversarial networks apply the principles of evolutionary computation to mitigate these problems. We hybridize two of these ap...
Conference Paper
Programmers solve coding problems with the support of both programming and problem-specific knowledge. They integrate this domain knowledge to reason by computational abstraction. Correct and readable code arises from sound abstractions and problem solving. We attempt to transfer insights from such human expertise to genetic programming (GP) for so...
Conference Paper
MOOCs (Massive Open Online Courses) frequently use grades to determine whether a student passes the course. To better understand how student behavior is influenced by grade feedback, we conduct a study on changes in certified students' behavior before and after they have received their grade. We use observational student data from two MITx MOOC...
Conference Paper
In classrooms, instructors teaching students how to code have the ability to monitor progress and provide feedback through regular interaction. There is generally no analogous tracing of learning progression in programming MOOCs, hindering the ability of MOOC platforms to provide automated feedback at scale. We explore features for every certified...
Preprint
Full-text available
Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse. These pathologies mainly arise from a lack of diversity in their adversarial interactions. Evolutionary generative adversarial networks apply the principles of evolutionary computation to mitigate these problems. We hybridize two of these ap...
Chapter
Low population diversity is recognized as a factor in premature convergence of evolutionary algorithms. We investigate program synthesis performance via grammatical evolution. We focus on novelty search, which substitutes a novelty objective for the conventional search objective based on synthesis quality. This prompts us to introduce a new selecti...
Conference Paper
In a Massive Open Online Course (MOOC), predictive models of student behavior can support multiple aspects of learning, including instructor feedback and timely intervention. Ongoing courses, when the student outcomes are yet unknown, must rely on models trained from the historical data of previously offered courses. It is possible to transfer mode...
Conference Paper
Student learning activity in MOOCs can be viewed from multiple perspectives. We present a new organization of MOOC learner activity data at a resolution that is in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students or use long duration time units. A detailed access trajectory (DAT) con...
Preprint
Full-text available
Machine learning models are vulnerable to adversarial examples. In this paper, we are concerned with black-box adversarial attacks, where only loss-oracle access to a model is available. At the heart of a black-box adversarial attack is the gradient estimation problem with query complexity O(n), where n is the number of data features. Recent work has...
Conference Paper
Motivated by Danskin’s theorem, gradient-based methods have been applied with empirical success to solve minimax problems that involve non-convex outer minimization and non-concave inner maximization. On the other hand, recent work has demonstrated that Evolution Strategies (ES) algorithms are stochastic gradient approximators that seek robust solu...
Preprint
Student learning activity in MOOCs can be viewed from multiple perspectives. We present a new organization of MOOC learner activity data at a resolution that is in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students or use long duration time units. A detailed access trajectory (DAT) con...
Preprint
In MOOCs, predictive models of student behavior support many aspects of learning design, including instructor feedback and timely intervention. Ongoing (target) courses, whose outcomes are not yet known, must rely upon models trained from the historical data of previously offered (source) courses. It is possible to transfer such models...
Preprint
Full-text available
GANs are difficult to train due to convergence pathologies such as mode and discriminator collapse. We introduce Lipizzaner, an open source software system that allows machine learning engineers to train GANs in a distributed and robust way. Lipizzaner distributes a competitive coevolutionary algorithm which, by virtue of dual, adapting, generator...
Preprint
Full-text available
Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rates. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from vital signals. In this work, we propose to eliminate the high cost of enginee...
Chapter
We investigate the application of a version of Genetic Programming with grammars, called Grammatical Evolution, and a multi-population competitive coevolutionary algorithm for anticipating tax evasion in the domain of U.S. Partnership tax regulations. A problem in tax auditing is that as soon as one evasion scheme is detected, a new, slightly mutate...
Article
Accurate and timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rates. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from the vital signals of ICU patients such as the mean, the stand...
Preprint
Full-text available
With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional neural nets to detect obfuscated malicious commands at a character level. While these representations may express...