
Pedro Domingos
University of Washington, Seattle
227 publications · 35,454 reads · 36,258 citations
Publications (227)
The study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as the basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist mod...
An overview of recent advances in the area of lifted inference, which exploits the structure inherent in relational probabilistic models.
Statistical relational AI (StaRAI) studies the integration of reasoning under uncertainty with reasoning about individuals and relations. The representations used are often called relational probabilistic models. Lifted inferen...
Markov logic can be used as a general framework for joining logical and statistical AI.
As neural networks grow deeper and wider, learning networks with hard-threshold activations is becoming increasingly important, both for network quantization, which can drastically reduce time and energy requirements, and for creating large integrated systems of deep networks, which may have non-differentiable components and must avoid vanishing an...
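The snippet is cut off before the method; as background, the core difficulty is that hard-threshold (sign) activations have zero gradient almost everywhere. A common baseline workaround is the straight-through estimator, sketched below in plain NumPy. This is a generic illustration of the problem setting, not necessarily the algorithm this paper proposes; the function names are assumptions.

```python
import numpy as np

def sign_forward(x):
    # Hard-threshold activation: outputs +/-1, non-differentiable at 0
    # and flat (zero gradient) everywhere else.
    return np.where(x >= 0.0, 1.0, -1.0)

def sign_backward_ste(x, grad_out):
    # Straight-through estimator: pretend the activation was the identity
    # on [-1, 1], so gradients pass through there and are zeroed outside.
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-2.0, -0.5, 0.3, 1.7])
print(sign_forward(x))                        # [-1. -1.  1.  1.]
print(sign_backward_ste(x, np.ones_like(x)))  # [0. 1. 1. 0.]
```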
Inference in expressive probabilistic models is generally intractable, which makes them difficult to learn and limits their applicability. Sum-product networks are a class of deep models where, surprisingly, inference remains tractable even when an arbitrary number of hidden layers are present. In this paper, we generalize this result to a much bro...
Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observ...
Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. We present a method that can semi-automatically enhance a wide class of existing learning algorithms so the...
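This line of stream-mining work rests on the Hoeffding bound, which limits how far an empirical average over n i.i.d. observations can stray from the true mean. A minimal sketch follows, with an illustrative split-decision test; the function names and the exact decision rule are assumptions for illustration, not the paper's code.

```python
import math

def hoeffding_bound(value_range, n, delta):
    """With probability at least 1 - delta, the mean of n i.i.d.
    observations of a variable with the given range lies within
    this epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def enough_evidence(gain_best, gain_second, value_range, n, delta=1e-6):
    # Commit to the best-looking choice (e.g., a decision-tree split)
    # once its observed advantage exceeds the bound.
    return (gain_best - gain_second) > hoeffding_bound(value_range, n, delta)

print(hoeffding_bound(1.0, 10_000, 1e-6))        # ~0.0263
print(enough_evidence(0.30, 0.25, 1.0, 10_000))  # True
```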
Many representation schemes combining first-order logic and probability have been proposed in recent years. Progress in unifying logical and probabilistic inference has been slower. Existing methods are mainly variants of lifted variable elimination and belief propagation, neither of which take logical structure into account. We propose the first m...
The OECD Observer, OECD's quarterly magazine presents concise, up-to-date and authoritative analysis of crucial world economic and social issues. Through the pages of the Observer OECD’s experts offer insights on the questions facing the Organisation’s member governments and provide an excellent opportunity for readers to stay ahead of policy debat...
One of the central themes in sum-product networks (SPNs) is the interpretation of sum nodes as marginalized latent variables (LVs). This interpretation adds syntactic and semantic structure, allows the application of the EM algorithm, and enables efficient MPE inference. In the literature, the LV interpretation was justified by explic...
This chapter presents background on the SRL models on which our work is based. We start with a brief technical background on first-order logic and graphical models. In Sect. 2.2, we present an overview of SRL models, followed by details on two popular SRL models. We then present the learning challenges in these models and the approaches taken to solv...
In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models, and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs) that can be learned from data, and p...
Sum-product networks (SPNs) are a promising avenue for probabilistic modeling and have been successfully applied to various tasks. However, some theoretical properties of SPNs are not yet well understood. In this paper we fill some gaps in the theoretical foundation of SPNs. First, we show that the weights of any complete and consistent SPN can be t...
Sum-product networks (SPNs) are a recently-proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order p...
Building efficient large-scale knowledge bases (KBs) is a longstanding goal of AI. KBs need to be first-order to be sufficiently expressive, and probabilistic to handle uncertainty, but these lead to intractable inference. Recently, tractable Markov logic (TML) was proposed as a nontrivial tractable first-order probabilistic representation. This pa...
We consider the selectivity constraint on the structure of sum-product networks (SPNs), which allows each sum node to have at most one child with non-zero output for each possible input. This allows us to find globally optimal maximum likelihood parameters in closed form. Although they are a constrained class of SPNs, these models still strictly gener...
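To make the closed-form claim concrete: selectivity means each training instance activates exactly one child of every sum node it reaches, so the weights can be fit like the parameters of a fully observed mixture. A sketch of the resulting estimate, with notation that is an assumption rather than the paper's:

$$\hat{w}_{ij} \;=\; \frac{c_{ij}}{\sum_{k} c_{ik}},$$

where $c_{ij}$ counts the training instances for which child $j$ is the unique non-zero child of sum node $i$.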
Many AI applications need to explicitly represent relational structure as well as handle uncertainty. First order probabilistic models combine the power of logic and probability to deal with such domains. A naive approach to inference in these models is to propositionalize the whole theory and carry out the inference on the ground network. Lifted i...
A sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable...
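The defining property can be stated in one line: $X_1, \dots, X_n$ is exchangeable if, for every permutation $\pi$ of $\{1, \dots, n\}$,

$$P(X_1 = x_1, \dots, X_n = x_n) \;=\; P(X_1 = x_{\pi(1)}, \dots, X_n = x_{\pi(n)}).$$

For binary variables, the joint distribution then depends only on how many of the $x_i$ equal 1, so it is determined by $n+1$ numbers instead of $2^n$ — the kind of collapse that underlies the tractability results the abstract refers to.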
The chief difficulty in object recognition is that objects' classes are obscured by a large number of extraneous sources of variability, such as pose and part deformation. These sources of variation can be represented by symmetry groups, sets of composable transformations that preserve object identity. Convolutional neural networks (convnets) achie...
In this paper, we present structured message passing (SMP), a unifying framework for approximate inference algorithms that take advantage of structured representations such as algebraic decision diagrams and sparse hash tables. These representations can yield significant time and space savings over the conventional tabular representation when the m...
Sum-product networks (SPNs) are a new class of deep probabilistic models. SPNs can have unbounded treewidth but inference in them is always tractable. An SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. We propose the first algorithm for learning the structure of...
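The recursive definition quoted here translates directly into code. Below is a minimal evaluation sketch for a toy SPN over two binary variables; the class names and the example structure are illustrative assumptions (the paper itself is about learning the structure, which is not shown).

```python
class Leaf:
    """Univariate distribution over one variable (here: a table)."""
    def __init__(self, var, dist):
        self.var, self.dist = var, dist

    def value(self, x):
        return self.dist[x[self.var]]

class Product:
    """Product of child SPNs over disjoint sets of variables."""
    def __init__(self, children):
        self.children = children

    def value(self, x):
        result = 1.0
        for child in self.children:
            result *= child.value(x)
        return result

class Sum:
    """Weighted sum of child SPNs over the same variables."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# Toy SPN: a two-component mixture over binary variables A and B.
spn = Sum([
    (0.6, Product([Leaf("A", {0: 0.2, 1: 0.8}), Leaf("B", {0: 0.7, 1: 0.3})])),
    (0.4, Product([Leaf("A", {0: 0.9, 1: 0.1}), Leaf("B", {0: 0.5, 1: 0.5})])),
])
print(spn.value({"A": 1, "B": 0}))  # exact P(A=1, B=0) in one pass: 0.356
```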
Machine learning systems automatically learn programs from data. This is often a very attractive alternative to manually constructing them, and in the last decade the use of machine learning has spread rapidly throughout computer science and beyond. Machine learning is used in Web search, spam filters, recommender systems, ad placement, credit scor...
Graphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we l...
The development of knowledge base creation systems has mainly focused on information extraction without considering how to effectively reason over their databases of facts. One reason for this is that the inference required to learn a probabilistic knowledge base from text at any realistic scale is intractable. In this paper, we propose formulating...
Combining first-order logic and probability has long been a goal of AI. Markov logic (Richardson & Domingos, 2006) accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Unfortunately, it does not have the full power of first-order logic, because it is only defined for finite do...
Inference in graphical models consists of repeatedly multiplying and summing out potentials. It is generally intractable because the derived potentials obtained in this way can be exponentially large. Approximate inference techniques such as belief propagation and variational methods combat this by simplifying the derived potentials, typically by d...
Computing the probability of a formula given the probabilities or weights associated with other formulas is a natural extension of logical inference to the probabilistic setting. Surprisingly, this problem has received little attention in the literature to date, particularly considering that it includes many standard inference problems as special c...
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with va...
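In symbols, the object in question is the partition function

$$Z \;=\; \sum_{x} \prod_{k} \phi_k(x_{S_k}),$$

the sum over all joint states $x$ of the product of the model's potentials $\phi_k$, each defined over a variable subset $S_k$; computing it is exponential in general. A standard property of SPNs is that $Z$ falls out of a single bottom-up pass: evaluate the network with every leaf replaced by its sum over its variable's values, at cost linear in the number of edges (and $Z = 1$ automatically when sum-node weights are normalized).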
Tractable subsets of first-order logic are a central topic in AI research. Several of these formalisms have been used as the basis for first-order probabilistic languages. However, these are intractable, losing the original motivation. Here we propose the first non-trivially tractable first-order probabilistic language. It is a subset of Markov lo...
Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractab...
Coarse-to-fine approaches use sequences of increasingly fine approximations to control the complexity of inference and learning. These techniques are often used in NLP and vision applications. However, no coarse-to-fine inference or learning methods have been developed for general first-order probabilistic domains, where the potential gains are eve...
Abduction is a method for finding the best explanation for observations. Arguably the most advanced approach to abduction, especially for natural language processing, is weighted abduction, which uses logical formulas with costs to guide inference. But it has no clear probabilistic semantics. In this paper we propose an approach that implements wei...
The standard approach to feature construction and predictive learning in molecular datasets is to employ computationally expensive graph mining techniques and to bias the feature search exploration using frequency or correlation measures. These features ...
Deep transfer involves generalizing across different domains, and second-order Markov logic is ideally suited for performing it. Markov logic unifies first-order logic and probability. Deep transfer Markov logic (DTM) softens a logical knowledge base by associating a weight with each formula. Worlds that violate formulas become less likely, bu...
Stochastic processes that involve the creation of objects and relations over time are widespread, but relatively poorly studied. For example, accurate fault diagnosis in factory assembly processes requires inferring the probabilities of erroneous assembly operations, but doing this efficiently and accurately is difficult. Modeled as dynamic Bayesia...
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are the most general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs...
Extracting knowledge from unstructured text is a long-standing goal of NLP. Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induction, information extraction), all end-to-end solutions to date require heavy supervision and/or manual engineering, limiting their scope and scalability. We present OntoU...
Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compi...
We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learni...
Lifting can greatly reduce the cost of inference on first-order probabilistic models, but constructing the lifted network can itself be quite costly. In addition, the minimal lifted network is often very close in size to the fully propositionalized model; lifted inference yields little or no speedup in these situations. In this paper, we address b...
The structure of a Markov network is typically learned using top-down search. At each step, the search specializes a feature by conjoining it to the variable or feature that most improves the score. This is inefficient, testing many feature variations with no support in the data, and highly prone to local optima. We propose bottom-up search a...
Markov logic networks (MLNs) use first-order formulas to define features of Markov networks. Current MLN structure learners can only learn short clauses (4-5 literals) due to extreme computational costs, and thus are unable to represent complex regularities in data. To address this problem, we present LSM, the first MLN structure learner cap...
Link mining problems are characterized by high complexity (since linked objects are not statistically independent) and uncertainty (since data is noisy and incomplete). Thus they necessitate a modeling language that is both probabilistic and relational. Markov logic provides this by attaching weights to formulas in first-order logic and viewing the...
Many problems require repeated inference on probabilistic graphical models, with different values for evidence variables or other changes. Examples of such problems include utility maximization, MAP inference, online and interactive inference, parameter and structure learning, and dynamic inference. Since small changes to the evidence typically onl...
Lifting can greatly reduce the cost of inference on first-order probabilistic graphical models, but constructing the lifted network can itself be quite costly. In online applications (e.g., video segmentation) repeatedly constructing the lifted network for each new inference can be extremely wasteful, because the evidence typically changes little f...
Machine reading aims to automatically extract knowledge from text. It is a long-standing goal of AI and holds the promise of revolutionizing Web search and other fields. In this paper, we analyze the core challenges of machine reading and show that statistical relational AI is particularly well suited to address these challenges. We then propose a...
Exploiting ontologies for efficient inference is one of the most widely studied topics in knowledge representation and reasoning. The use of ontologies for probabilistic inference, however, is much less developed. A number of algorithms for lifted inference in first-order probabilistic languages have been proposed, but their scalability is limited...
Representations that combine first-order logic and probability have been the focus of much recent research. Lifted inference algorithms for them avoid grounding out the domain, bringing benefits analogous to those of resolution theorem proving in first-order logic. However, all lifted probabilistic inference algorithms to date treat potentials as...
Most subfields of computer science have an interface layer via which applications communicate with the infrastructure, and this is key to their success (e.g., the Internet in networking, the relational model in databases, etc.). So far this interface layer has been missing in AI. First-order logic and probabilistic graphical models each have some o...
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Learning MLN structure from a relational database involves learning the clauses and weights. The state-of-the-art MLN structure learners all involve some element of greedily gene...
Standard inductive learning requires that training and test instances come from the same distribution. Transfer learning seeks to remove this restriction. In shallow transfer, test instances are from the same domain, but have a different distribution. In deep transfer, test instances are from a different domain entirely (i.e., described by di...
The goal of the University of Washington effort under DIESEL is to develop a unified approach to entity, schema and concept matching. Entity resolution is the problem of determining which mentions in the data correspond to the same object (e.g., "J. Smith" and "Jane Smith" may be the same person). Schema matching is the problem of determining which...
In recent years, many representations have been proposed that combine graphical models with aspects of first-order logic, along with learning and inference algorithms for them. However, the problem of extending decision theory to these representations remains largely unaddressed. In this paper, we propose a framework for relational decision the...
We present the first unsupervised approach to the problem of learning a semantic parser, using Markov logic. Our USP system transforms dependency trees into quasi-logical forms, recursively induces lambda forms from these, and clusters them to abstract away syntactic variations of the same meaning. The MAP semantic parse of a sentence is obtained b...
We propose the use of competitive learning in deep networks for understanding sequential data. Hierarchies of competitive learning algorithms have been found in the brain [1] and their use in deep vision networks has been validated [2]. The algorithm is simple to comprehend and yet provides fast, sparse learning. To understand temporal patterns we...
In recent years, there has been a surge of interest in combining statistical and relational learning approaches (Getoor & Taskar, 2007), driven by the realization that many applications require both. Recently
In this chapter, we provide a detailed description of the Markov logic representation. We begin by providing background on first-order logic and probabilistic graphical models and then show how Markov logic unifies and builds on these concepts. Finally, we compare Markov logic to other representations that combine probability and logic.
Inference in Markov logic lets us reason probabilistically about complex relationships. Since an MLN acts as a template for a Markov network, we can always answer probabilistic queries using standard Markov network inference methods on the instantiated network. However, due to the size and complexity of the resulting network, this is often infeasib...
In this chapter, we describe how to overcome a number of limitations of the basic Markov logic representation introduced in Chapter 2. We extend Markov logic to handle continuous variables and infinite domains; uncertain disjunctions and existential quantifiers; and relational decision theory. We also describe inference algorithms for these extensi...
Effective pattern recognition requires understanding both statistical and structural aspects of the input, but in the past these have mostly been handled separately. Markov logic is a powerful new language that seamlessly combines the two. Models in Markov logic are sets of weighted formulas in first-order logic, interpreted as templates for feature...
Modern information and knowledge management is characterized by high degrees of complexity and uncertainty. Complexity is well handled by first-order logic, and uncertainty by probabilistic graphical models. What has been sorely missing is a seamless combination of the two. Markov logic provides this by attaching weights to logical formulas and tre...
The field of inductive logic programming (ILP) has made steady progress, since the first ILP workshop in 1991, based on a balance of developments in theory, implementations and applications. More recently there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Predic...
Extracting knowledge from text has long been a goal of AI. Initial approaches were purely logical and brittle. More recently, the availability of large quantities of text on the Web has led to the development of machine learning approaches. However, to date these have mainly extracted ground facts, as opposed to general knowledge. Other learning...
Machine learning approaches to coreference resolution are typically supervised, and require expensive labeled data. Some unsupervised approaches have been proposed (e.g., Haghighi and Klein (2007)), but they are less accurate. In this paper, we present the first unsupervised approach that is competitive with supervised ones. This is made poss...
In recent years, it has become increasingly clear that the vision of the Semantic Web requires uncertain reasoning over rich, first- order representations. Markov logic brings the power of probabilistic modeling to first-order logic by attaching weights to logical formulas and viewing them as templates for features of Markov networks. This gives na...
Most real-world machine learning problems have both statistical and relational aspects. Thus learners need representations that combine probability and relational logic. Markov logic accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Inference algorithms for Markov logic dra...
Unifying first-order logic and probability is a long-standing goal of AI, and in recent years many representations combining aspects of the two have been proposed. However, inference in them is generally still at the level of propositional logic, creating all ground atoms and formulas and applying standard probabilistic inference methods to the...
Many real-world problems are characterized by complex relational structure, which can be succinctly represented in first-order logic. However, many relational inference algorithms proceed by first fully instantiating the first-order theory and then working at the propositional level. The applicability of such approaches is severely limited by...
Markov logic networks (MLNs) combine first-order logic and Markov networks, allowing us to handle the complexity and uncertainty of real-world problems in a single consistent framework. However, in MLNs all variables and features are discrete, while most real-world applications also contain continuous ones. In this paper we introduce hybrid M...
As classifiers are deployed to detect malicious behavior ranging from spam to terrorism, adversaries modify their behaviors to avoid detection (e.g., [4, 3, 6]). This makes the very behavior the classifier is trying to detect a function of the classifier itself. Learners that account for concept drift (e.g., [5]) are not sufficient since they do no...
Markov logic networks (MLNs) combine Markov networks and first-order logic, and are a powerful and increasingly popular representation for statistical relational learning. The state-of-the-art method for discriminative learning of MLN weights is the voted perceptron algorithm, which is essentially gradient descent with an MPE approximation to t...
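For reference, the quantity being approximated is the conditional log-likelihood gradient of a log-linear model. Writing $n_i(x, y)$ for the number of true groundings of formula $i$ given evidence $x$ and query atoms $y$:

$$\frac{\partial}{\partial w_i} \log P_w(y \mid x) \;=\; n_i(x, y) \;-\; \mathbb{E}_w\big[n_i(x, Y)\big],$$

and the MPE approximation mentioned in the abstract replaces the intractable expectation with the counts in the most probable state, $n_i(x, y^{\mathrm{MPE}})$.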
An analytical framework for using power-law theory to estimate market size for niche products and consumer groups.
This position paper proposes knowledge-rich data mining as a focus of research, and describes initial steps in pursuing it.
We propose statistical predicate invention as a key problem for statistical relational learning. SPI is the problem of discovering new concepts, properties and relations in structured data, and generalizes hidden variable discovery in statistical models and predicate invention in ILP. We propose an initial model for SPI based on second-order Ma...
The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this approach is suboptimal, because it ignores th...
AI systems must be able to learn, reason logically, and handle uncertainty. While much research has focused on each of these goals individually, only recently have we begun to attempt to achieve all three at once. In this talk I will describe Markov logic, a representation that combines the full power of first-order logic and probabilistic graphica...
AI systems must be able to learn, reason logically, and handle uncertainty. While much research has focused on each of these goals individually, only recently have we begun to attempt to achieve all three at once. In this talk, I describe Markov logic, a representation that combines first-order logic and probabilistic graphical models, and algorithm...
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containi...
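The distribution an MLN defines over possible worlds has a simple closed form (Richardson & Domingos, 2006):

$$P(X = x) \;=\; \frac{1}{Z} \exp\!\Big(\sum_i w_i\, n_i(x)\Big),$$

where $n_i(x)$ is the number of true groundings of formula $i$ in world $x$, $w_i$ is that formula's weight, and $Z$ normalizes over all possible worlds.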
A formula in first-order logic can be viewed as a tree, with a logical connective at each node, and a knowledge base can be viewed as a tree whose root is a conjunction. Markov logic (Richardson and Domingos, 2006) makes this conjunction probabilistic, as well as the universal quantifiers directly under it, but the rest of the tree remains purely...
Propositionalization of a first-order theory followed by satisfiability testing has proved to be a remarkably efficient approach to inference in relational domains such as planning (Kautz & Selman 1996) and verification (Jackson 2000). More recently, weighted satisfiability solvers have been used successfully for MPE inference in statistical re...
Intelligent agents must be able to handle the complexity and uncertainty of the real world. Logical AI has focused mainly on the former, and statistical AI on the latter. Markov logic combines the two by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Inference algorithms for Markov logic dra...
Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are pr...
Processes involving change over time, uncertainty, and rich relational structure are common in the real world, but no general algorithms exist for learning models of them. In this paper we show how Markov logic networks (MLNs), a recently developed approach to combining logic and probability, can be applied to time-changing domains. We then show ho...
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Ma...
Social networks have interesting properties. They influence our lives enormously without us being aware of the implications they raise. The authors investigate the following areas concerning social networks: how to exploit our unprecedented wealth of data and how we can mine social networks for purposes such as marketing campaigns; social networks...
Object identification is the problem of determining whether different observations correspond to the same object. It occurs in a wide variety of fields, including vision, natural language, citation matching, and information integration. Traditionally, the problem is solved separately for each pair of observations, followed by transitive closure. We pr...
The case-based learning paradigm relies upon memorizing cases in the form of successful problem-solving experience, such as a pattern along with its classification in pattern recognition, or a problem along with a solution in case-based reasoning. ...