About
233
Publications
35,311
Reads
3,836
Citations
Publications (233)
Cross-linked threat, vulnerability, and defensive mitigation knowledge is critical in defending against diverse and dynamic cyber threats. Cyber analysts consult it by deductively or inductively creating a chain of reasoning to identify a threat starting from indicators they observe, or vice versa. Cyber hunters use it abductively to reason when h...
Communication strongly influences attitudes on climate change. Within sponsored communication, high spend and high reach advertising dominates. In the advertising ecosystem we can distinguish actors with adversarial stances: organizations with contrarian or advocacy communication goals, who direct the advertisement delivery algorithm to launch ads...
Our goal is to identify brain regions involved in comprehending computer programs. We use functional magnetic resonance imaging (fMRI) to investigate two candidate systems of brain regions which may support this: the Multiple Demand (MD) system, known to respond to a range of cognitively demanding tasks, and the Language system (LS), known to pri...
We present a GUI-driven and efficient Genetic Programming (GP) and AI Planning framework designed for agent-based learning research. Our framework, ABL-Unity3D, is built in Unity3D, a game development environment. ABL-Unity3D addresses challenges entailed in co-opting Unity3D: making the simulator serve agent learning rather than humans playing a g...
We investigate artificial intelligence and machine learning methods for optimizing the adversarial behavior of agents in cybersecurity simulations. Our cybersecurity simulations integrate the modeling of agents launching Advanced Persistent Threats (APTs) with the modeling of agents using detection and mitigation mechanisms against APTs. This simul...
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary 'views' of a code that enable us to achieve both robust and accurate code m...
Contagious respiratory diseases, such as COVID-19, depend on sufficiently prolonged exposures for the successful transmission of the underlying pathogen. It is important that organizations evaluate the efficacy of non-pharmaceutical interventions aimed at mitigating viral transmission among their personnel. We have developed an operational risk asse...
Program synthesis automates the process of writing code, which can be a very useful tool in allowing people to better leverage computational resources. However, a limiting factor in the scalability of current program synthesis techniques is the large size of the search space, especially for complex programs. We present a new model for synthesizing...
Human code is different from code generated by program search. We investigate if properties from human-generated code can guide program search to improve the qualities of the generated programs, e.g., readability and performance. Here we focus on program search with grammatical evolution, which produces code that has different structure compared to...
Contagious respiratory diseases, such as COVID-19, depend on sufficiently prolonged exposures for the successful transmission of the underlying pathogen. It is important for organizations to evaluate the efficacy of interventions aimed at mitigating viral transmission among their personnel. We have developed an operational risk assessment simulatio...
Artificial Intelligence (AI) and Machine Learning (ML) algorithms can support cyber security modeling and inference across the span from the indicator level, e.g., anomaly detection, to the behavioral level. This contribution is based on a dataset named BRON, which is amalgamated from public threat and vulnerability behavioral sources. We demonstrate how BRON can sup...
Generative Adversarial Networks (GANs) are difficult to train because of pathologies such as mode and discriminator collapse. Similar pathologies have been studied and addressed in competitive evolutionary computation by increased diversity. We study a system, Lipizzaner, that combines spatial coevolution with gradient-based learning to improve the...
Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse, which mainly arise from a lack of diversity in their adversarial interactions. Co-evolutionary GAN (CoE-GAN) training algorithms have been shown to be resilient to these pathologies. This article introduces Mustangs, a spatially distributed CoE-G...
Scaling the cyber hunt problem poses several key technical challenges. Detecting and characterizing cyber threats at scale in large enterprise networks is hard because of the vast quantity and complexity of the data that must be analyzed as adversaries deploy varied and evolving tactics to accomplish their goals. There is a great need to automate a...
We explore how to give Genetic Programming (GP) a head start in synthesizing a solution to a programming problem. Our method uses a related problem and introduces a schedule that directs GP to solve the related problem first, either fully or to some extent, or at the same time. In addition, if the related problem’s solutions are written by students or evolved...
Machine learning (ML) models that learn and predict properties of computer programs are increasingly being adopted and deployed. These models have demonstrated success in applications such as auto-completing code, summarizing large programs, and detecting bugs and malware in programs. In this work, we investigate principled ways to adversarially pe...
Generative adversarial networks (GANs) exhibit training pathologies that can lead to convergence-related degenerative behaviors, whereas spatially-distributed, coevolutionary algorithms (CEAs) for GAN training, e.g. Lipizzaner, are empirically robust to them. The robustness arises from diversity that occurs by training populations of generators and...
Computer programming is a novel cognitive tool that has transformed modern society. What cognitive and neural mechanisms support this skill? Here, we used functional magnetic resonance imaging to investigate two candidate brain systems: the multiple demand (MD) system, typically recruited during math, logic, problem solving, and executive tasks, an...
Many public sources of cyber threat and vulnerability information exist to serve the defense of cyber systems. This paper proposes BRON, which is a composite of MITRE's ATT&CK MATRIX, NIST's Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification (CAPEC). BRON preserves...
Adversarial examples are imperceptible perturbations in the input to a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must adhere to strict semantic guidelines so the resulting progr...
Distributed coevolutionary Generative Adversarial Network (GAN) training has empirically shown success in overcoming GAN training pathologies. This is mainly due to diversity maintenance in the populations of generators and discriminators during the training process. The method studied here coevolves sub-populations on each cell of a spatial grid o...
The design of computational arms races can draw upon the compelling inspiration of biological arms races. To study cyber security attack-defense dynamics, we have abstracted a description of biological adversarial ecosystems to design an adversarial computational system. The system has elements and processes with abstracted biological analogs. It c...
Cyber security adversaries and engagements are ubiquitous and ceaseless. We delineate Adversarial Genetic Programming for Cyber Security, a research topic that, by means of genetic programming (GP), replicates and studies the behavior of cyber adversaries and the dynamics of their engagements. Adversarial Genetic Programming for Cyber Security enco...
We investigate training Generative Adversarial Networks, GANs, with less data. Subsets of the training dataset can express empirical sample diversity while reducing training resource requirements, e.g., time and memory. We ask how much data reduction impacts generator performance and gauge the additive value of generator ensembles. In addition to c...
Computer programming is a novel cognitive tool that has transformed modern society. An integral part of programming is code comprehension: the ability to process individual program tokens, combine them into statements, which, in turn, combine to form a program. What cognitive and neural mechanisms support this ability to process computer code? Here...
We investigate the problem of classifying a line of a program as containing a vulnerability or not using machine learning. Such a line-level classification task calls for a program representation which goes beyond reasoning from the tokens present in the line. We seek a distributed representation in a latent feature space which can capture the contro...
We investigate training Generative Adversarial Networks, GANs, with less data. Subsets of the training dataset can express empirical sample diversity while reducing training resource requirements, e.g. time and memory. We ask how much data reduction impacts generator performance and gauge the additive value of generator ensembles. In addition to co...
Cyber adversaries are immersed in a ceaseless arms race. Each adversary incessantly maneuvers to adapt to the opposing posture. An avenue to pro-active, adversarially-hardened cyber defenses can be investigated by studying the dynamics of these cyber engagements. An adversarial engagement can computationally act as an elementary component of a comp...
Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse. These pathologies mainly arise from a lack of diversity in their adversarial interactions. Evolutionary generative adversarial networks apply the principles of evolutionary computation to mitigate these problems. We hybridize two of these ap...
Programmers solve coding problems with the support of both programming and problem specific knowledge. They integrate this domain knowledge to reason by computational abstraction. Correct and readable code arises from sound abstractions and problem solving. We attempt to transfer insights from such human expertise to genetic programming (GP) for so...
MOOCs (Massive Open Online Courses) frequently use grades to calculate whether a student passes the course. To better understand how student behavior is influenced by grade feedback, we conduct a study on the changes of certified students' behavior before and after they have received their grade. We use observational student data from two MITx MOOC...
In classrooms, instructors teaching students how to code have the ability to monitor progress and provide feedback through regular interaction. There is generally no analogous tracing of learning progression in programming MOOCs, hindering the ability of MOOC platforms to provide automated feedback at scale. We explore features for every certified...
Low population diversity is recognized as a factor in premature convergence of evolutionary algorithms. We investigate program synthesis performance via grammatical evolution. We focus on novelty search, which substitutes a novelty objective for the conventional search objective based on synthesis quality. This prompts us to introduce a new selecti...
In a Massive Open Online Course (MOOC), predictive models of student behavior can support multiple aspects of learning, including instructor feedback and timely intervention. Ongoing courses, when the student outcomes are yet unknown, must rely on models trained from the historical data of previously offered courses. It is possible to transfer mode...
Student learning activity in MOOCs can be viewed from multiple perspectives. We present a new organization of MOOC learner activity data at a resolution that is in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students or use long duration time units. A detailed access trajectory (DAT) con...
Machine learning models are vulnerable to adversarial examples. In this paper, we are concerned with black-box adversarial attacks, where only loss-oracle access to a model is available. At the heart of black-box adversarial attack is the gradient estimation problem with query complexity O(n), where n is the number of data features. Recent work has...
Motivated by Danskin’s theorem, gradient-based methods have been applied with empirical success to solve minimax problems that involve non-convex outer minimization and non-concave inner maximization. On the other hand, recent work has demonstrated that Evolution Strategies (ES) algorithms are stochastic gradient approximators that seek robust solu...
In MOOCs, predictive models of student behavior support many aspects of learning design, including instructor feedback and timely intervention. Ongoing (target) courses, when their predictive outcomes are yet unknown, must rely upon models trained from the historical data of previously offered (source) courses. It is possible to transfer such models...
GANs are difficult to train due to convergence pathologies such as mode and discriminator collapse. We introduce Lipizzaner, an open source software system that allows machine learning engineers to train GANs in a distributed and robust way. Lipizzaner distributes a competitive coevolutionary algorithm which, by virtue of dual, adapting, generator...
Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rate. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from vital signals. In this work, we propose to eliminate the high cost of enginee...
We investigate the application of a version of Genetic Programming with grammars, called Grammatical Evolution, and a multi-population competitive coevolutionary algorithm for anticipating tax evasion in the domain of U.S. Partnership tax regulations. A problem in tax auditing is that as soon as one evasion scheme is detected a new, slightly mutate...
An accurate and timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rate. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from the vital signals of ICU patients such as the mean, the stand...
With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional neural nets to detect obfuscated malicious commands at a character level. While these representations may express...