# Brendan Juba's research while affiliated with Washington University in St. Louis and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (70)

We consider the problem of learning action models for planning in unknown stochastic environments that can be defined using the Probabilistic Planning Domain Description Language (PPDDL). As input, we are given a set of previously executed trajectories, and the main challenge is to learn an action model that has a similar goal achievement probabili...

Determinantal Point Processes (DPPs) are a widely used probabilistic model for negatively correlated sets. DPPs have been successfully employed in Machine Learning applications to select a diverse, yet representative subset of data. In seminal work on DPPs in Machine Learning, Kulesza conjectured in his PhD Thesis (2011) that the problem is NP-comp...
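
For readers new to DPPs, here is a minimal sketch of the standard L-ensemble form of the model. A subset S is drawn with probability det(L_S) / det(L + I), where L_S is the submatrix of a positive semidefinite kernel L indexed by S. The kernel used in the example is made up. The determinant is computed in pure Python with exact fractions so the sketch has no dependencies. It illustrates only the model and its negative correlation, not the learning problem or the conjecture discussed in the paper.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)  # also the determinant of the empty (0x0) matrix
    for col in range(n):
        # Find a nonzero pivot in this column; if none, the matrix is singular.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [[L[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return det(sub) / det(l_plus_i)


def inclusion_probability(L, items):
    """P(items is a subset of Y), by brute force over all subsets (small n only)."""
    n = len(L)
    total = Fraction(0)
    for r in range(n + 1):
        for s in combinations(range(n), r):
            if set(items) <= set(s):
                total += dpp_probability(L, s)
    return total
```

With `L = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]`, items 0 and 1 are similar (off-diagonal entry 1), and the probability that both appear is smaller than the product of their individual inclusion probabilities. That gap is the negative correlation the abstract refers to.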

In this technical report, we provide a complete example of running the SAM+ algorithm, an algorithm for learning stochastic planning action models, on a simplified PPDDL version of the Coffee problem. We provide a very brief description of the SAM+ algorithm and a detailed description of our simplified version of the Coffee domain, and then describe...

We propose a modular architecture for the lifelong learning of hierarchically structured tasks. Specifically, we prove that our architecture is theoretically able to learn tasks that can be solved by functions that are learnable given access to functions for other, previously learned tasks as subroutines. We empirically show that some tasks that we...

Often machine learning and statistical models will attempt to describe the majority of the data. However, there may be situations where only a fraction of the data can be fit well by a linear regression model. Here, we are interested in a case where such inliers can be identified by a Disjunctive Normal Form (DNF) formula. We give a polynomial time...

Creating a domain model, even for classical, domain-independent planning, is a notoriously hard knowledge-engineering task. A natural approach to solve this problem is to learn a domain model from observations. However, model learning approaches frequently do not provide safety guarantees: the learned model may assume actions are applicable when th...

Robust learning in expressive languages with real-world data continues to be a challenging task. Numerous conventional methods appeal to heuristics without any assurances of robustness. While probably approximately correct (PAC) Semantics offers strong guarantees, learning explicit representations is not tractable, even in propositional logic. Howe...

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure and modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner, and instead of requiring a lin...

Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning m...

In many real-world scenarios, the time it takes for a mobile agent, e.g., a robot, to move from one location to another may vary due to exogenous events and be difficult to predict accurately. Planning in such scenarios is challenging, especially in the context of Multi-Agent Pathfinding (MAPF), where the goal is to find paths to multiple agents an...

Generating functions, which are widely used in combinatorics and probability theory, encode function values into the coefficients of a polynomial. In this paper, we explore their use as a tractable probabilistic model, and propose probabilistic generating circuits (PGCs) for their efficient representation. PGCs strictly subsume many existing tracta...
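
To make the idea of "encoding function values into coefficients" concrete, here is a minimal sketch for the simplest case: independent binary variables X_i with Pr[X_i = 1] = p_i. The generating polynomial is the product of the factors ((1 - p_i) + p_i z_i). The coefficient of the monomial made up of the variables in S equals Pr[X_i = 1 exactly for i in S]. Polynomials are stored here as dictionaries from sets of variable names to coefficients. This representation is our own assumption for illustration. It is not a circuit and not the PGC construction from the paper.

```python
from fractions import Fraction


def multiply(p, q):
    """Multiply polynomials stored as {frozenset_of_variables: coefficient}.

    Assumes the two factors mention disjoint variables, so every monomial
    stays multilinear (each variable has exponent 0 or 1).
    """
    out = {}
    for mono_p, coef_p in p.items():
        for mono_q, coef_q in q.items():
            mono = mono_p | mono_q
            out[mono] = out.get(mono, 0) + coef_p * coef_q
    return out


def bernoulli_factor(var, p):
    """The factor (1 - p) + p * z_var for one independent binary variable."""
    return {frozenset(): 1 - p, frozenset([var]): p}


def probability(poly, ones):
    """Pr[exactly the variables in `ones` are 1] is that monomial's coefficient."""
    return poly.get(frozenset(ones), 0)


def marginal(poly, assignment):
    """Pr[X_v = value for every (v, value) in assignment], others summed out.

    Summing the compatible coefficients is equivalent to substituting
    z_j = 1 for every unassigned variable j.
    """
    total = 0
    for mono, coef in poly.items():
        if all((var in mono) == bool(val) for var, val in assignment.items()):
            total += coef
    return total
```

For example, `multiply(bernoulli_factor("a", Fraction(1, 2)), bernoulli_factor("b", Fraction(1, 3)))` has coefficient 1/6 on the monomial z_a z_b, and summing out `b` recovers Pr[X_a = 1] = 1/2.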

Robustly learning in expressive languages with real-world data continues to be a challenging task. Numerous conventional methods appeal to heuristics without any assurances of robustness. While PAC-Semantics offers strong guarantees, learning explicit representations is not tractable even in a propositional setting. However, recent work on so-calle...

Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there may not exist a good, simple model for the distribution, so we seek to find a small subset where there exists such a model. We give a computationally efficient...

We introduce and study the model of list learning with attribute noise. Learning with attribute noise was introduced by Shackelford and Volper (COLT 1988) as a variant of PAC learning, in which the algorithm has access to noisy examples and uncorrupted labels, and the goal is to recover an accurate hypothesis. Sloan (COLT 1988) and Goldman and Sloa...
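
The following is a minimal sketch of the example oracle in a learning-with-attribute-noise setting. It assumes the simplest variant, in which each Boolean attribute is flipped independently at a single rate `noise_rate`, while the label is computed from the clean example and left uncorrupted. The uniform distribution over {0,1}^n and the function names are illustrative assumptions. The list-learning aspect of the paper is not modeled.

```python
import random


def noisy_example(x, label, noise_rate, rng):
    """Flip each Boolean attribute independently with probability noise_rate.

    The label is passed through unchanged (attribute noise only).
    """
    noisy_x = [bit ^ (rng.random() < noise_rate) for bit in x]
    return noisy_x, label


def noisy_oracle(target, n, noise_rate, rng):
    """Draw x uniformly from {0,1}^n (an assumption), label it with the clean
    target concept, then corrupt the attributes but not the label."""
    x = [rng.randint(0, 1) for _ in range(n)]
    return noisy_example(x, target(x), noise_rate, rng)
```

The learner sees only the corrupted attributes, so a label that depends on the clean `x` can look inconsistent. That inconsistency is what makes recovering an accurate hypothesis hard in this model.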

In machine learning, predictors trained on a given data distribution are usually guaranteed to perform well for further examples from the same distribution on average. This often may involve disregarding or diminishing the predictive power on atypical examples; or, in more extreme cases, a data distribution may be composed of a mixture of individua...

Practitioners of data mining and machine learning have long observed that the imbalance of classes in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of...

Standard approaches to probabilistic reasoning require that one possesses an explicit model of the distribution in question. But, the empirical learning of models of probability distributions from partial observations is a problem for which efficient algorithms are generally not known. In this work we consider the use of bounded-degree fragments of...

Model-based diagnosis (MBD) is difficult to use in practice because it requires a model of the diagnosed system, which is often very hard to obtain. We explore theoretically how observing the system when it is in a normal state can provide information about the system that is sufficient to learn a partial system model that allows automated diagnosi...

We consider the problem of answering queries about formulas of first-order logic based on background knowledge partially represented explicitly as other formulas, and partially represented as examples independently drawn from a fixed probability distribution. PAC semantics, introduced by Valiant, is one rigorous, general proposal for learning to re...
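
Under PAC semantics, a formula φ is (1 - ε)-valid under a distribution D if Pr_{x ~ D}[φ(x) = 1] ≥ 1 - ε. The following is a minimal sketch of the most basic way to test this from examples. It assumes fully specified propositional examples given as dictionaries and a query given as a Python predicate, and it uses the two-sided Hoeffding bound to pick the sample size. This deliberately omits what makes the paper's setting interesting (first-order formulas and explicitly given background knowledge), and the function names are hypothetical.

```python
import math


def hoeffding_sample_size(gamma, delta):
    """Samples needed so the empirical frequency is within gamma of the true
    probability with probability at least 1 - delta (two-sided Hoeffding)."""
    return math.ceil(math.log(2 / delta) / (2 * gamma ** 2))


def empirical_validity(query, examples):
    """Fraction of the examples on which the query evaluates to True."""
    return sum(1 for x in examples if query(x)) / len(examples)


def accept_query(query, examples, epsilon, gamma):
    """Accept if the query looks (1 - epsilon)-valid, allowing gamma slack
    for sampling error."""
    return empirical_validity(query, examples) >= 1 - epsilon - gamma
```

For instance, with γ = 0.1 and δ = 0.05 the bound asks for 185 examples. The implication `a → b` holds on 3 of the 4 toy examples used in the test, so it is accepted at ε = 0.2 but not at ε = 0.1.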

We consider the problem of learning rules from a data set that support a proof of a given query, under Valiant's PAC-Semantics. We show how any backward proof search algorithm that is sufficiently oblivious to the contents of its knowledge base can be modified to learn such rules while it searches for a proof using those rules. We note that this gi...

We consider the problem of one-way communication when the recipient does not know exactly the distribution that the messages are drawn from, but has a “prior” distribution that is known to be close to the source distribution, a problem first considered by Juba et al. We consider the question of how much longer the messages need to be in order to co...

We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes the $\ell_p$ loss of predicting the target $z$ in the distribution of examples conditioned on $c$....
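
The following is a minimal sketch of how a single candidate pair (c, f) is scored under this objective. It assumes each example is a triple (x, y, z): Boolean attributes x on which the k-DNF condition c is evaluated, real features y for the linear rule f(y) = ⟨w, y⟩, and a real target z. This representation is our own assumption. The sketch computes the empirical Pr[c] and the conditional ℓ_p loss only. It does not search for c, which is the algorithmic content of the paper.

```python
def satisfies_dnf(x, dnf):
    """A DNF is a list of terms; a term is a list of (index, value) literals.

    x satisfies the DNF if every literal of some term holds.
    """
    return any(all(x[i] == v for i, v in term) for term in dnf)


def conditional_lp_loss(data, dnf, weights, p=2):
    """Return (empirical Pr[c], mean |<w, y> - z|^p over examples where c holds)."""
    covered = [(y, z) for x, y, z in data if satisfies_dnf(x, dnf)]
    if not covered:
        return 0.0, float("inf")
    coverage = len(covered) / len(data)
    loss = sum(
        abs(sum(w * yi for w, yi in zip(weights, y)) - z) ** p for y, z in covered
    ) / len(covered)
    return coverage, loss


def admissible(data, dnf, weights, mu, p=2):
    """A candidate is admissible when its condition covers at least a mu fraction."""
    coverage, _ = conditional_lp_loss(data, dnf, weights, p)
    return coverage >= mu
```

Among admissible candidates, the goal is the one with the smallest conditional loss. In the test data, the condition x_0 = 1 selects a segment that the rule z = 2y fits exactly, while x_1 = 1 selects a segment where the same rule errs.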

Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, often there is no good model for the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a line...

AC⁰∘MOD2 circuits are AC⁰ circuits augmented with a layer of parity gates just above the input layer. We study AC⁰∘MOD2 circuit lower bounds for computing the Boolean Inner Product functions. Recent works by Servedio and Viola (ECCC TR12-144) and Akavia et al. (ITCS 2014) have highlighted this problem as a frontier problem in circuit complexity tha...

Previous work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. By contrast, we may be interested in finding a segment of the population for which we can find a linear rule capable of achieving more accurate predictions. We give a...

Juba recently proposed a formulation of learning abductive reasoning from examples, in which both the relative plausibility of various explanations, as well as which explanations are valid, are learned directly from data. The main shortcoming of this formulation of the task is that it assumes access to full-information (i.e., fully specified) examp...

In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm, or to broadcast the reward it obtained in the previous...

In this paper we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans are given and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservativ...
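
The following is a minimal sketch of one way a conservative action model can be learned from successfully executed plans, simplified in the spirit of SAM-style learning. It assumes grounded STRIPS actions, fully observed states represented as sets of true facts, and preconditions consisting of positive facts only. These are our simplifications, not necessarily the paper's exact construction. Preconditions keep every fact that held before every observed execution, so the learned model never claims an action is applicable in a state that has not been justified by observations.

```python
def learn_conservative_model(transitions):
    """transitions: iterable of (state, action, next_state), with states given
    as frozensets of true grounded facts. Returns {action: (pre, add, delete)}."""
    model = {}
    for state, action, nxt in transitions:
        add, delete = nxt - state, state - nxt
        if action not in model:
            model[action] = (set(state), set(add), set(delete))
        else:
            pre, adds, dels = model[action]
            pre &= state  # keep only facts true before *every* observed execution
            adds |= add   # any observed change must be a real effect
            dels |= delete
    return model


def apply_action(model, state, action):
    """Successor state under the learned model, or None if the model cannot
    vouch for the action (never observed, or a learned precondition is missing)."""
    if action not in model:
        return None
    pre, add, delete = model[action]
    if not pre <= state:
        return None
    return frozenset((state - delete) | add)
```

The price of this caution is incompleteness. With few observations, the learned preconditions are overly strict, so some solvable problems have no plan under the learned model. The conservative preconditions are what make any plan that is found safe.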

Inference from an observed or hypothesized condition to a plausible cause or explanation for this condition is known as abduction. For many tasks, the acquisition of the necessary knowledge by machine learning has been widely found to be highly effective. However, the semantics of learned knowledge are weaker than the usual classical semantics, and...

Machine learning and statistics typically focus on building models that capture the vast majority of the data, possibly ignoring a small subset of data as "noise" or "outliers." By contrast, here we consider the problem of jointly identifying a significant (but perhaps small) segment of a population in which there is a highly sparse linear regressi...

We consider the problem of one-way communication when the recipient does not know exactly the distribution that the messages are drawn from, but has a “prior” distribution that is known to be close to the source distribution, a problem first considered by Juba et al. [5]. This problem generalizes the classical source coding problem in information t...

We consider the proof search ("automatizability") problem for propositional proof systems in the context of knowledge discovery (or data mining and analytics). Discovered knowledge necessarily features a weaker semantics than usually employed in mathematical logic, and in this work we find that these weaker semantics may result in a proof search...

We consider the problem of how enormous databases of "common sense" knowledge can be both learned and utilized in reasoning in a computationally efficient manner. We propose that this is possible if the learning only occurs implicitly, i.e., without generating an explicit representation. We show that it is feasible to invoke such implicitly learned...

We consider principled alternatives to unsupervised learning in data mining by situating the learning task in the context of the subsequent analysis task. Specifically, we consider a query-answering (hypothesis-testing) task: In the combined task, we decide whether an input query formula is satisfied over a background distribution by using input ex...

We give an overview of a theory of semantic communication proposed by Goldreich, Juba, and Sudan. The theory is intended to capture the obstacles that arise when a diverse population of independently designed devices must communicate with one another. The aim of the theory is to provide conceptual foundations for the design and evaluation of device...

We consider a model of teaching in which the learners are consistent and have bounded state, but are otherwise arbitrary. The teacher is non-interactive and "massively open": the teacher broadcasts a sequence of examples of an arbitrary target concept, intended for every possible on-line learning algorithm to learn from. We focus on the problem of...

We consider the problem of answering queries about formulas of propositional logic based on background knowledge partially represented explicitly as other formulas, and partially represented as partially obscured examples independently drawn from a fixed probability distribution, where the queries are answered with respect to a weaker semantics tha...

Previous works [11, 6] introduced a model of semantic communication between a "user" and a "server," in which the user attempts to achieve a given goal for communication. They show that whenever the user can sense progress, there exist universal user strategies that can achieve the goal whenever it is possible for any other user to reliably do so....

Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, wherea...

We present the first end-user protocol, guaranteeing the delivery of messages, that automatically adapts to any new packet format that is obtained by applying a short, efficient function to packets from an earlier protocol.

Our model of goals in finite executions, as first introduced in Chapter 2, captured many natural goals for communication, as illustrated in Chapter 3. This framework only captured goals that could be modeled as the user trying to reach a certain state of the environment by communicating with the server, though; another kind of goal that is not capt...

This chapter describes a formal theory of semantic communication in terms of “goals” for communication. In this chapter, we focus exclusively on finite goals, where an agent wishes to reach some desired state of being, in contrast to infinite goals where an agent wishes to maintain some desirable state over time, which we will introduce in Chapter...

We saw in Chapter 2 that our sensing functions are sufficient to capture all finite goals for which we can design protocols, but we only gave two examples of such sensing functions: Example 2.23, a sensing function for the goal of printing described in Example 2.8; and Example 2.24, a sensing function for the goal of computation, first described in...

In the basic universal setting developed in Chapter 2, we restricted our attention to the class of probabilistic polynomial time bounded agents. This was a natural choice, given our usual association of polynomial time algorithms with the notion of “efficient computation” in accordance with the strong version of the Church-Turing thesis proposed by...

In previous works, Juba and Sudan [6] and Goldreich, Juba and Sudan [4] considered the idea of “semantic communication”, wherein two players, a user and a server, attempt to communicate with each other without any prior common language (or communication protocol). They showed that if communication was goal-oriented and the user could sense progress...

We put forward a general theory of goal-oriented communication, where communication is not an end in itself, but rather a means to achieving some goals of the communicating parties. Focusing on goals provides a framework for addressing the problem of potential “misunderstanding” during communication, where the misunderstanding arises from lack of i...

The theory described in the previous chapters engages in some abuse of our notions of “efficiency.” Although it is easily seen to be necessary and reasonable for our protocol to use different running time bounds for different servers in a given class, the Levin-style enumerations incur an overhead that is exponential in the length of the program us...
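
To make the source of this overhead concrete, here is a minimal sketch of Levin-style enumeration. It assumes each candidate strategy is a hypothetical Python callable that takes a step budget and returns a transcript, paired with its description length. A hypothetical predicate `sensing` stands in for the user's sensing function. In phase t, every candidate of length ℓ ≤ t is restarted and run for 2^(t - ℓ) steps, which is the standard Levin weighting. All names are illustrative.

```python
def levin_universal_user(candidates, sensing, max_phase=30):
    """candidates: list of (description_length, strategy) pairs, where
    strategy(budget) returns the transcript produced within `budget` steps.

    Returns (index, phase) for the first candidate whose transcript passes
    `sensing`, or None if nothing succeeds within max_phase phases.
    """
    for t in range(1, max_phase + 1):
        for index, (length, strategy) in enumerate(candidates):
            if length <= t:
                # Restart this candidate with a budget that doubles each phase;
                # longer descriptions get exponentially smaller shares.
                transcript = strategy(2 ** (t - length))
                if sensing(transcript):
                    return index, t
    return None
```

A candidate of length ℓ that needs T steps first succeeds in phase ℓ + ⌈log₂ T⌉. By then the total work across all candidates is on the order of 2^ℓ · T. That factor 2^ℓ is the overhead "exponential in the length of the program" that the passage above refers to.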

We will reflect on what has been achieved and outline what we believe to be the major avenues for future research that have been opened by the work presented here. Roughly, the ultimate goal is to construct flexible protocols under the refined models of Chapters 4 and 8. The starting point, however, is to develop our understanding of how to constru...

The basic theory of semantic communication in infinite executions introduced in Chapter 6 suffers from some of the same defects as the theory for finite executions from Chapter 2: namely, as discussed in Chapter 4, in that setting, the universal strategies we constructed suffered from an exponential overhead (in the length of the target strategy) i...

Recall that our main theorems for finite executions in Chapter 2 and Chapter 5 show that in order to construct an efficient, reliable protocol for achieving a goal, it must be possible to efficiently verify that the goal has been achieved. We know that this places limits on what kinds of goals we could hope to achieve—for example, as we saw in Sect...

We finally return to our original technical motivations for studying semantic communication, outlined in Section 1.1. Specifically, we will present a first attempt at designing end-user network protocols that can adapt to “simple” modifications of the protocol used on the network without third-party intervention. In practice, the network protocols...

In previous works, Juba and Sudan [1] and Goldreich, Juba and Sudan [2] considered the idea of “semantic communication”, wherein two players, a user and a server, attempt to communicate with each other without any prior common language (or communication protocol). They showed that if communication was goal-oriented and the user could sense progress...

We examine the complexity of learning the distributions produced by finite-state quantum sources. We show how prior techniques for learning hidden Markov models can be adapted to the quantum generator model to find that the analogous state of affairs holds: information-theoretically, a polynomial number of samples suffice to approximately identify...

Is it possible for two intelligent beings to communicate meaningfully, without any common language or background? This question has interest on its own, but is especially relevant in the context of modern computational infrastructures where setting up a common protocol between computers is getting to be increasingly burdensome, and where "univers...

We put forward a general theory of goal-oriented communication, where communication is not an end in itself, but rather a means to achieving some goals of the communicating parties. The goals can vary from setting to setting, and we provide a general framework for describing any such goal. In this context, "reliable communication" means overcoming...

We continue the investigation of the task of meaningful communication among intelligent entities (players, agents) without any prior common language. Our generic thesis is that such communication is feasible provided the goals of the communicating players are verifiable and compatible. In a previous work we gave supporting evidence for one specific...

Is it possible for two intelligent beings to communicate meaningfully, without any common language or background? This question has interest on its own, but is especially relevant in the context of modern computational infrastructures where an increase in the diversity of computers is making the task of inter-computer interaction increasingly burde...

We show that it is possible to use data compression on independently obtained hypotheses from various tasks to algorithmically provide guarantees that the tasks are sufficiently related to benefit from multitask learning. We give uniform bounds in terms of the empirical average error for the true average error of the n hypotheses provided by d...

We examine the problem of finding optimal strategies in simple stochastic games, and the equivalent problem of finding stable configurations of MIN/MAX/AVG circuits. The problem can be seen as a nontrivial extension to linear programming, but no polynomial-time algorithm is known, despite significant efforts to find such an algorithm. Our investi...

We informally survey current work in the study of brain function with an eye for complexity-theoretic aspects. We then discuss in more detail why we expect computational complexity theory to play a larger, more explicit role in the future, examine the validity of such an approach, and attempt to outline how such a complexity-theoretic study might p...

In this paper, we examine the problem of finding optimal strategies in simple stochastic games, and the equivalent problem of finding stable configurations of MIN/MAX/AVG circuits. The problem can be seen as a nontrivial extension to linear programming, but no polynomial-time algorithm is known, despite significant efforts to find such an algorithm...
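
To make "stable configuration" concrete, here is a minimal sketch. A MIN/MAX/AVG circuit is stored as a dictionary from gate names to either `("CONST", value)` for a sink or `(op, inputs)`, which is our own illustrative encoding. A configuration is stable when every gate's value equals its operator applied to its inputs' values. Iterating the gate equations from zero is a simple heuristic. It converges on the small stopping example used in the test, but in general it can need a very large number of rounds and is not the approach of the paper.

```python
def evaluate_gate(op, inputs, values):
    """Apply a MIN, MAX, or AVG gate to the current values of its inputs."""
    vals = [values[g] for g in inputs]
    if op == "MIN":
        return min(vals)
    if op == "MAX":
        return max(vals)
    if op == "AVG":
        return sum(vals) / len(vals)
    raise ValueError(f"unknown gate type: {op}")


def is_stable(circuit, values, tol=1e-9):
    """True if every gate's value matches its operator applied to its inputs."""
    for gate, (op, arg) in circuit.items():
        expected = arg if op == "CONST" else evaluate_gate(op, arg, values)
        if abs(values[gate] - expected) > tol:
            return False
    return True


def value_iteration(circuit, iterations=1000):
    """Synchronously re-evaluate every gate, starting from all zeros."""
    values = {g: 0.0 for g in circuit}
    for _ in range(iterations):
        values = {
            g: (arg if op == "CONST" else evaluate_gate(op, arg, values))
            for g, (op, arg) in circuit.items()
        }
    return values
```

Checking a proposed configuration is easy. Finding one is the hard part. The cycle between the AVG gate `a` and the MAX gate `b` in the test example already shows why: their values depend on each other and are pinned down only as a fixed point (here a = b = 1).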

## Citations

... Our contribution. We present algorithms for finding k-DNF reference classes that achieve better approximations to the optimal loss than the $O(n^k)$-approximations presented by Juba (2016) and Hainline et al. (2019) based on the "tolerant elimination" algorithm (originally proposed for abduction), respectively for diagnosis and linear regression. Our algorithms are analogues of the improved algorithms for exception-tolerant abduction obtained by Juba et al. (2018) and Zhang et al. (2017). Our algorithms, like these, respectively obtain an $\tilde{O}(r \log\log n)$-approximation when there is an r-term k-DNF, and an $\tilde{O}(n^{k/2})$-approximation in general. ...

... Related works on multi-agent systems (Carlino et al., 2013; Sharon and Stone, 2017; Cui et al., 2021) deal with a similar problem, but they consider autonomous cars navigating over lanes and roads, and coordination is needed at the junctions. In (Choudhury et al., 2021) and (Shahar et al., 2021), they also deal with multi-agents and pathfinding, but not in a situation in which the target of every agent is the same area. Shahar et al. ...

... Tensor decomposition is often used to extract certain elementary features from image data. Dehghanpoor G. et al. used a tensor decomposition method for feature learning on satellite imagery [49]. Non-negative matrix factorization (NMF) imposes non-negativity constraints, which allow it to learn parts of objects. ...

... This was extended to the usual squared-error loss (as well as other $\ell_p$ losses) by Hainline et al. (2019). Juba (2017) also gave an algorithm for the general (non-sparse) case that could only find a small fraction of the largest such condition. All of these algorithms find conditions describing subpopulations that are a union of some basic subsets of data, selected by "terms." ...

... FTC uses estimations based on simulations or state space representation [13]. However, these representations need to model the system including the faulty behavior in detail, which in general is not possible since the required amount of expert knowledge is not available [14]. Thus, the estimation of the dynamic behavior needs to be done using available resources such as topology descriptions or historic system data. ...

... For traditional machine learning algorithms, when the number of samples reaches a certain number, the accuracy no longer increases. The performance of DNN continues to increase with the increase in sample size until the maximum value [56,57]. Multi-temporal images with different phenological periods can achieve higher accuracy than a single phase [13,14,26]. ...

... In addition, in almost all practical applications, there is some mismatch between the true and an assumed probabilistic system/data model, which results in performance degradation. This performance loss due to the presence of mismatch has been studied extensively in various setups (see e.g., [31]-[38] and references therein). Moreover, the subjectivity may appear when there are prospect theoretic agents in the system where the decision makers may have different views on the probabilistic models due to their subjective biases [23], [39], [40]. ...

... A Boolean function $f : \{0,1\}^n \to \{0,1\}$ is called a bent function if all Fourier coefficients of its $\pm 1$ representation $f_{\pm}(x) := (-1)^{f(x)}$ have the same absolute value. Lemma 12 (Folklore; for a proof see, e.g., [7,6]). Let $f$ be a bent function on $n$ variables and $c \ge 1$ be an integer. ...
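
The following is a minimal brute-force sketch of this definition. It computes the unnormalized Fourier (Walsh) spectrum W(s) = Σ_x (−1)^(f(x) + ⟨s, x⟩), so the Fourier coefficient of f_± at s is W(s) / 2^n. It then checks whether all |W(s)| coincide. By Parseval's identity, that common value must then be 2^(n/2). The Boolean inner product function IP(x, y) = Σ_i x_i y_i mod 2 on 2m bits serves as the example; it is a classic bent function. The exhaustive loop is exponential in n and meant only for tiny cases.

```python
from itertools import product


def walsh_spectrum(f, n):
    """Map each s in {0,1}^n to W(s) = sum over x of (-1)^(f(x) + <s, x> mod 2)."""
    points = list(product((0, 1), repeat=n))
    spectrum = {}
    for s in points:
        total = 0
        for x in points:
            parity = (f(x) + sum(si * xi for si, xi in zip(s, x))) % 2
            total += -1 if parity else 1
        spectrum[s] = total
    return spectrum


def is_bent(f, n):
    """f is bent iff every Fourier coefficient has the same absolute value."""
    magnitudes = {abs(w) for w in walsh_spectrum(f, n).values()}
    return len(magnitudes) == 1


def inner_product(bits):
    """IP on 2m bits: x = first half, y = second half, sum_i x_i * y_i mod 2."""
    m = len(bits) // 2
    return sum(bits[i] * bits[m + i] for i in range(m)) % 2
```

On 4 input bits every |W(s)| for the inner product equals 4 = 2^(4/2), whereas a dictator function such as f(x) = x_0 has all of its Fourier weight on a single coefficient and is therefore not bent.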

... In [34], the authors explain anomalies in images using metadata. Anomaly detection is first performed using Principal Component Analysis (PCA). ...

Reference: Anomaly Explanation: A Review

... $\ell_p$ losses) yield algorithms for agnostic learning of the family of rules used for conditions with respect to a particular model that treats false-positives and false-negatives differently. The current state of the art for learning this model achieves roughly an $O(n^{k/2})$ blow-up for general k-DNFs (Zhang et al., 2017), or an $O(t \log\log n)$ blow-up for t-term k-DNFs (Juba et al., 2018). Indeed, we adapt the approach of Juba et al. (2018) to obtain the analogous guarantee here. ...

Reference: Conditional Linear Regression