Discriminative frequent subgraph mining with optimality guarantees

Statistical Analysis and Data Mining 10/2010; 3(5):302-318. DOI: 10.1002/sam.10084
Source: DBLP

ABSTRACT The goal of frequent subgraph mining is to detect subgraphs that occur frequently in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it helps to prune the search space for discriminative frequent subgraphs during frequent subgraph mining itself. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 302-318, 2010
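
The link between submodularity and greedy selection can be illustrated with a small sketch. The criterion used here is an assumption chosen for illustration, not a definition taken from the abstract: a set of binary subgraph features is scored by the number of cross-class graph pairs that at least one selected feature tells apart. This is a coverage function, hence monotone and submodular, so the standard greedy guarantee of a (1 - 1/e) approximation under a cardinality constraint applies. The names `separated_pairs` and `greedy_select` and the toy data are hypothetical.

```python
from itertools import combinations


def separated_pairs(feature_column, labels):
    """Cross-class pairs (i, j) on which a binary feature takes different values."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] != labels[j] and feature_column[i] != feature_column[j]
    }


def greedy_select(features, labels, k):
    """Greedily pick up to k features by largest marginal gain in separated pairs.

    Illustrative criterion only (an assumption, not the paper's definition).
    """
    cover = {name: separated_pairs(col, labels) for name, col in features.items()}
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [name for name in cover if name not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len(cover[name] - covered))
        if not cover[best] - covered:
            break  # no remaining feature separates any new pair
        chosen.append(best)
        covered |= cover[best]
    return chosen, len(covered)


# Toy data: four graphs, two per class; each feature marks subgraph presence.
labels = [1, 1, 0, 0]
features = {
    "B": [1, 0, 0, 0],
    "C": [0, 1, 1, 0],
    "D": [0, 1, 0, 0],
}
print(greedy_select(features, labels, k=2))
```

On this toy input the first pick is a three-way tie that resolves to `B` by insertion order, and the second pick is `D`, which adds two new pairs where `C` would add only one.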

Available from: Philip S. Yu, Apr 23, 2015
  • ABSTRACT: Graph classification is an important data mining task, and various graph kernel methods have been proposed recently for this task. These methods have proven to be effective, but they tend to have high computational overhead. In this paper, we propose an alternative approach to graph classification that is based on feature vectors constructed from different global topological attributes, as well as global label features. The main idea here is that the graphs from the same class should have similar topological and label attributes. Our method is simple and easy to implement, and via a detailed comparison on real benchmark datasets, we show that our topological and label feature-based approach delivers better or competitive classification accuracy, and is also substantially faster than other graph kernels. It is the most effective method for large unlabeled graphs.
  • ABSTRACT: The paper presents a new projection operator, named AC-projection, which exhibits good complexity properties, in contrast to the graph isomorphism operator typically used in graph mining. We study the size and structure of the search space and some practical properties of the projection operator. These properties give us a specialization algorithm that uses simple local operations. We then show experimentally that a substantial performance gain can be achieved with little or no loss in the quality of the discovered patterns.
    Proceedings of the 21st international conference on Inductive Logic Programming; 07/2011
  • ABSTRACT: Graph classification is an important data mining task, and various graph kernel methods have been proposed recently for this task. These methods have proven to be effective, but they tend to have high computational overhead. In this paper, we propose an alternative approach to graph classification that is based on feature vectors constructed from different global topological attributes, as well as global label features. The main idea is that the graphs from the same class should have similar topological and label attributes. Our method is simple and easy to implement, and via a detailed comparison on real benchmark datasets, we show that our topological and label feature-based approach delivers competitive classification accuracy, with significantly better results on those datasets that have large unlabeled graph instances. Our method is also substantially faster than most other graph kernels. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
    Statistical Analysis and Data Mining 08/2012; 5(4):265-283. DOI:10.1002/sam.11153
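
As a rough illustration of what a feature vector built from global topological attributes could look like, here is a minimal sketch. The particular attributes (node count, edge count, average degree, maximum degree, density) are assumptions chosen for the example; the abstract above does not list the attributes the authors use. The function name `topological_features` is hypothetical, and the input is assumed to be an undirected simple graph with nodes numbered from 0.

```python
def topological_features(num_nodes, edges):
    """Return [nodes, edges, average degree, maximum degree, density].

    Assumes an undirected simple graph given as a list of (u, v) pairs
    with node ids in range(num_nodes). The attribute choice is illustrative.
    """
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    avg_degree = 2 * m / num_nodes if num_nodes else 0.0
    density = 2 * m / (num_nodes * (num_nodes - 1)) if num_nodes > 1 else 0.0
    return [num_nodes, m, avg_degree, max(degree, default=0), density]


triangle = topological_features(3, [(0, 1), (1, 2), (0, 2)])
path = topological_features(3, [(0, 1), (1, 2)])
print(triangle, path)
```

Vectors of this kind have a fixed length regardless of graph size, so they can be passed to any standard classifier without computing a pairwise kernel between graphs.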