Chapter

Outlying Aspect Mining via Sum-Product Networks

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Outlying Aspect Mining (OAM) is the task of identifying a subset of features that distinguish an outlier from normal data, which is important for downstream (human) decision-making. Existing methods are based on beam search in the space of feature subsets. They need to compute outlier scores for all examined subsets, and thus rely on simple outlier scoring algorithms.In this paper, we propose SOAM, a novel OAM algorithm based on Sum-Product Networks (SPNs), a class of probabilistic circuits that can accurately model high-dimensional distributions. Our approach needs to fit an SPN only once, and leverages the tractability of marginal inference in SPNs to compute outlier scores in feature subsets. This way, computing outlier scores in subsets is fast, while being based on a flexible and accurate density estimator. We empirically show that SOAM clearly outperform the state-of-the-art method in search-based OAM, and even outperforms recent deep learning-based methods in the majority of the investigated cases. (Available at github.com/stefanluedtke/Sum-Product-Network-OAM). KeywordsOutlying Aspect MiningOutlier InterpretationSum-Product Network

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

Article
Full-text available
While all kinds of mixed data---from personal data, over panel and scientific data, to public and commercial data---are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Renyi Maximum Correlation Coefficient. This relieves the user from deciding a-priori the parametric form of the random variables but is still expressive enough to effectively approximate any distribution and permits efficient learning and inference.Our experiments show that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.
Conference Paper
Full-text available
Outlier detection is an important task in many domains and is intensively studied in the past decade. Further, how to explain outliers, i.e., outlier interpretation, is more significant, which can provide valuable insights for analysts to better understand, solve, and prevent these detected outliers. However, only limited studies consider this problem. Most of the existing methods are based on the score-and-search manner. They select a feature subspace as interpretation per queried outlier by estimating outlying scores of the outlier in searched subspaces. Due to the tremendous searching space, they have to utilize pruning strategies and set a maximum subspace length, often resulting in suboptimal interpretation results. Accordingly, this paper proposes a novel Attention-guided Triplet deviation network for Outlier interpretatioN (ATON). Instead of searching a subspace, ATON directly learns an embedding space and learns how to attach attention to each embedding dimension (i.e., capturing the contribution of each dimension to the outlierness of the queried outlier). Specifically, ATON consists of a feature embedding module and a customized self-attention learning module, which are optimized by a triplet deviation-based loss function. We obtain an optimal attention-guided embedding space with expanded high-level information and rich semantics, and thus outlying behaviors of the queried outlier can be better unfolded. ATON finally distills a subspace of original features from the embedding module and the attention coefficient. With the good generality, ATON can be employed as an additional step of any black-box outlier detector. A comprehensive suite of experiments is conducted to evaluate the effectiveness and efficiency of ATON. The proposed ATON significantly outperforms state-of-the-art competitors on 12 real-world datasets and obtains good scalability w.r.t. both data dimensionality and data size.
Article
Full-text available
Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection , has emerged as a critical direction. This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages, and disadvantages and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.
Conference Paper
Full-text available
Outlier detection refers to the identification of rare items that are deviant from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distribution. COPOD first constructs an empirical copula, and then uses it to predict tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD both parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions, 1) propose a novel, parameter-free outlier detection algorithm with both great performance and interpretability, 2) perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms in most cases and is also one of the fastest algorithms, and 3) release an easy-to-use Python implementation for reproducibility.
Article
Full-text available
We address the problem of outlying aspects mining: given a query object and a reference multidimensional data set, how can we discover what aspects (i.e., subsets of features or subspaces) make the query object most outlying? Outlying aspects mining can be used to explain any data point of interest, which itself might be an inlier or outlier. In this paper, we investigate several open challenges faced by existing outlying aspects mining techniques and propose novel solutions, including (a) how to design effective scoring functions that are unbiased with respect to dimensionality and yet being computationally efficient, and (b) how to efficiently search through the exponentially large search space of all possible subspaces. We formalize the concept of dimensionality unbiasedness, a desirable property of outlyingness measures. We then characterize existing scoring measures as well as our novel proposed ones in terms of efficiency, dimensionality unbiasedness and interpretability. Finally, we evaluate the effectiveness of different methods for outlying aspects discovery and demonstrate the utility of our proposed approach on both large real and synthetic data sets.
Conference Paper
Full-text available
The need for feasible inference in Probabilistic Graphical Models (PGMs) has lead to tractable models like Sum-Product Networks (SPNs). Their highly expressive power and their ability to provide exact and tractable inference make them very attractive for several real world applications, from computer vision to NLP. Recently, great attention around SPNs has focused on structure learning, leading to different algorithms being able to learn both the network and its parameters from data. Here, we enhance one of the best structure learner, LearnSPN, aiming to improve both the structural quality of the learned networks and their achieved likelihoods. Our algorithmic variations are able to learn simpler, deeper and more robust networks. These results have been obtained by exploiting some insights in the building process done by LearnSPN, by hybridizing the network adopting tree-structured models as leaves, and by blending bagging estimations into mixture creation. We prove our claims by empirically evaluating the learned SPNs on several benchmark datasets against other competitive SPN and PGM structure learners.
Article
Full-text available
When we are investigating an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object o in a multidimensional numeric data set O, in which subspace is o most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in the subspace. A minimal subspace where the query object is ranked the best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naïve method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of searching data sets with tens of dimensions efficiently. Our empirical study using both real data and synthetic data demonstrates that our method is effective and efficient.
Conference Paper
Full-text available
Sum-product networks (SPNs) are a promising avenue for probabilistic modeling and have been successfully applied to various tasks. However, some theoretic properties about SPNs are not yet well understood. In this paper we fill some gaps in the theoretic foundation of SPNs. First, we show that the weights of any complete and consistent SPN can be transformed into locally normalized weights without changing the SPN distribution. Second, we show that consistent SPNs cannot model distributions significantly (exponentially) more compactly than decompos-able SPNs. As a third contribution, we extend the inference mechanisms known for SPNs with finite states to generalized SPNs with arbitrary input distributions.
Conference Paper
Full-text available
Unsupervised anomaly detection is the process of finding outliers in data sets without prior training. In this paper, a histogram-based outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets and 10 standard algorithms show, that it can detect global outliers as reliable as state-of-the-art algorithms, but it performs poor on local outlier problems. HBOS is in our experiments up to 5 times faster than clustering based algorithms and up to 7 times faster than nearest-neighbor based methods.
Conference Paper
Full-text available
Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest performs favourably to ORCA, a near-linear time complexity distance-based method, LOF and random forests in terms of AUC and processing time, and especially in large data sets. iForest also works well in high dimensional problems which have a large number of irrelevant attributes, and in situations where training set does not contain any anomalies.
Conference Paper
Full-text available
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms we show that our approach of finding local outliers can be practical.
Article
Full-text available
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
Chapter
Anomaly localization is an important problem in computer vision which involves localizing anomalous regions within images with applications in industrial inspection, surveillance, and medical imaging. This task is challenging due to the small sample size and pixel coverage of the anomaly in real-world scenarios. Most prior works need to use anomalous training images to compute a class-specific threshold to localize anomalies. Without the need of anomalous training images, we propose Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA), which localizes the anomaly with a convolutional latent variable to preserve the spatial information. In the unsupervised setting, we propose an attention expansion loss where we encourage CAVGA to focus on all normal regions in the image. Furthermore, in the weakly-supervised setting we propose a complementary guided attention loss, where we encourage the attention map to focus on all normal regions while minimizing the attention map corresponding to anomalous regions in the image. CAVGA outperforms the state-of-the-art (SOTA) anomaly localization methods on MVTec Anomaly Detection (MVTAD), modified ShanghaiTech Campus (mSTC) and Large-scale Attention based Glaucoma (LAG) datasets in the unsupervised setting and when using only 2% anomalous images in the weakly-supervised setting. CAVGA also outperforms SOTA anomaly detection methods on the MNIST, CIFAR-10, Fashion-MNIST, MVTAD, mSTC and LAG datasets.
Chapter
Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given data set. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these distance/density-based scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable. It requires to compute outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation—making OAM algorithms infeasible to run in large and/or high-dimensional datasets. We also discover that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which is independent of the dimensionality of subspaces. This enables the scores in subspaces with different dimensionalities to be compared directly without any additional normalisation. Our experimental results revealed that SiNNE produces better or at least the same results as existing scores; and it significantly improves the runtime of an existing OAM algorithm based on beam search.
Conference Paper
While outlier detection has been intensively studied in many applications, interpretation is becoming increasingly important to help people trust and evaluate the developed detection models through providing intrinsic reasons why the given outliers are identified. It is a nontrivial task for interpreting the abnormality of outliers due to the distinct characteristics of different detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, contexts where outliers locate, as well as the relation between outliers and the contexts, are usually overlooked in existing interpretation frameworks. To tackle the issues, in this paper, we propose a Contextual Outlier INterpretation (COIN) framework to explain the abnormality of outliers spotted by detectors. The interpretability of an outlier is achieved through three aspects, i.e., outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework.
Article
This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues. It goes beyond the traditional focus on data mining problems to introduce advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. Until now, no single book has addressed all these topics in a comprehensive and integrated way. The chapters of this book fall into one of three categories: • Fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems. • Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data. Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor. Appropriate for both introductory and advanced data mining courses, Data Mining: The Textbook balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners (including those with a limited mathematical background). Numerous illustrations, examples, and exercises are included, with an emphasis on semantically interpretable examples. Praise for Data Mining: The Textbook - “As I read through this book, I have already decided to use it in my classes. This is a book written by an outstanding researcher who has made fundamental contributions to data mining, in a way that is both accessible and up to date. The book is complete with theory and practical use cases. It’s a must-have for students and professors alike!" -- Qiang Yang, Chair of Computer Science and Engineering at Hong Kong University of Science and Technology "This is the most amazing and comprehensive text book on data mining. It covers not only the fundamental problems, such as clustering, classification, outliers and frequent patterns, and different data types, including text, time series, sequences, spatial data and graphs, but also various applications, such as recommenders, Web, social network and privacy. It is a great book for graduate students and researchers as well as practitioners." -- Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology at University of Illinois at Chicago
Article
This paper introduces a simple and efficient density estimator that enables fast systematic search. To show its advantage over commonly used kernel density estimator, we apply it to outlying aspects mining. Outlying aspects mining discovers feature subsets (or subspaces) that describe how a query stand out from a given dataset. The task demands a systematic search of subspaces. We identify that existing outlying aspects miners are restricted to datasets with small data size and dimensions because they employ kernel density estimator, which is computationally expensive, for subspace assessments. We show that a recent outlying aspects miner can run orders of magnitude faster by simply replacing its density estimator with the proposed density estimator, enabling it to deal with large datasets with thousands of dimensions that would otherwise be impossible.
Article
Sum-product networks (SPNs) are a new class of deep probabilistic models. SPNs can have unbounded treewidth but inference in them is always tractable. An SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. We propose the first algorithm for learning the structure of SPNs that takes full advantage of their expressiveness. At each step, the algorithm attempts to divide the current variables into approximately independent subsets. If successful, it returns the product of recursive calls on the subsets; otherwise it returns the sum of recursive calls on subsets of similar instances from the current training set. A comprehensive empirical study shows that the learned SPNs are typically comparable to graphical models in likelihood but superior in inference speed and accuracy.
Article
Novelty detection is the task of classifying test data that differ in some respect from the data that are available during training. This may be seen as “one-class classification”, in which a model is constructed to describe “normal” training data. The novelty detection approach is typically used when the quantity of available “abnormal” data is insufficient to construct explicit models for non-normal classes. Application includes inference in datasets from critical systems, where the quantity of available normal data is very large, such that “normality” may be accurately modelled. In this review we aim to provide an updated and structured investigation of novelty detection research papers that have appeared in the machine learning literature during the last decade.
Article
Outlier mining is a major task in data analysis. Outliers are objects that highly deviate from regular objects in their local neighborhood. Density-based outlier ranking methods score each object based on its degree of deviation. In many applications, these ranking methods degenerate to random listings due to low contrast between outliers and regular objects. Outliers do not show up in the scattered full space, they are hidden in multiple high contrast subspace projections of the data. Measuring the contrast of such subspaces for outlier rankings is an open research challenge. In this work, we propose a novel subspace search method that selects high contrast subspaces for density-based outlier ranking. It is designed as pre-processing step to outlier ranking algorithms. It searches for high contrast subspaces with a significant amount of conditional dependence among the subspace dimensions. With our approach, we propose a first measure for the contrast of subspaces. Thus, we enhance the quality of traditional outlier rankings by computing outlier scores in high contrast projections only. The evaluation on real and synthetic data shows that our approach outperforms traditional dimensionality reduction techniques, naive random projections as well as state-of-the-art subspace search techniques and provides enhanced quality for outlier ranking.
Article
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. We show that if an SPN is complete and consistent it represents the partition function and all marginals of some graphical model, and give semantics to its nodes. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. We then propose learning algorithms for SPNs, based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform image completion better than state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex.
Conference Paper
We identify a new and interesting high-dimensional outlier detection problem in this paper, that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier as its Outlying Subspaces. In this paper, we will propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces), that utilizes a sample-based learning process to effectively identify the outlying subspaces of a given point.
SPFlow: an easy and extensible library for deep probabilistic learning using sum-product networks
  • A Molina
Einsum networks: fast and scalable learning of tractable probabilistic circuits
  • R Peharz