Article

A General Framework for Mining Frequent Subgraphs from Labeled Graphs

Institute of Scientific and Industrial Research, Osaka University, Suita, Ōsaka, Japan
Fundamenta Informaticae (Impact Factor: 0.72). 06/2005; 66(1):53-82.
Source: DBLP

ABSTRACT

The derivation of frequent subgraphs from a dataset of labeled graphs has high computational complexity because the hard problems of isomorphism and subgraph isomorphism have to be solved as part of this derivation. To deal with this computational complexity, all previous approaches have focused on one particular kind of graph. In this paper, we propose an approach to conduct a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within a practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations and well-organized bias constraints to limit the search space efficiently. The performance has been evaluated using real world datasets, and the high scalability and flexibility of our approach have been confirmed with respect to the amount of data and the computation time.
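
To make concrete why subgraph isomorphism sits inside the derivation, the following is a minimal sketch of support counting over a dataset of small labeled graphs. It is not the algebraic method the paper proposes: it uses a brute-force matcher, and the names `occurs_in`, `support`, and `edge`, the graph encoding, and the toy data are all assumptions made for illustration. A pattern is taken to be frequent when it occurs in at least a chosen minimum number of graphs.

```python
from itertools import permutations


def edge(u, v):
    """Undirected edge between nodes u and v (hypothetical helper)."""
    return frozenset((u, v))


def occurs_in(pattern, graph):
    """Return True if `pattern` is a (not necessarily induced) subgraph of `graph`.

    Each graph is a pair (labels, edges): `labels` maps node -> label and
    `edges` is a set of frozenset({u, v}). Brute force: try every injective
    assignment of pattern nodes to graph nodes.
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Node labels must be preserved by the mapping.
        if any(p_labels[u] != g_labels[mapping[u]] for u in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the graph.
        if all(frozenset(mapping[u] for u in e) in g_edges for e in p_edges):
            return True
    return False


def support(pattern, dataset):
    """Number of graphs in `dataset` that contain `pattern` at least once."""
    return sum(occurs_in(pattern, g) for g in dataset)


# Toy data (assumed): three small labeled graphs and a C-O pattern.
dataset = [
    ({0: "C", 1: "C", 2: "O"}, {edge(0, 1), edge(1, 2)}),  # C-C-O
    ({0: "C", 1: "N"}, {edge(0, 1)}),                      # C-N
    ({0: "O", 1: "C", 2: "O"}, {edge(0, 1), edge(1, 2)}),  # O-C-O
]
pattern = ({"a": "C", "b": "O"}, {edge("a", "b")})

min_support = 2  # assumed threshold
is_frequent = support(pattern, dataset) >= min_support
```

The matcher enumerates every injective node assignment, so its cost grows factorially with graph size; this is the expense that a compact graph representation and search-space constraints are meant to contain.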

Available from: Hiroshi Motoda
  • Source
    • "1. In transactional graph mining, e.g., [12] [21] [22] [23] [28] [41] [42], the dataset consists of many small data graphs which we call transactions, and the task is to discover patterns that occur at least once in a sufficient number of transactions. (Approaches from machine learning or inductive logic programming usually call the small data graphs "examples" instead of transactions.)"
    ABSTRACT: New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis. Comment: Full version of two earlier conference papers presented at KDD 2005 and ICDM 2006
    Full-text · Article · Aug 2010
  • Source
    • "The framework for association rule mining was first extended to graphs in AGM [19,20]. Recent work has included application to large data sets [21]. All of these studies represent important progress in computer science with attractive algorithms, and some valuable common structural features have been recognized from large data sets."
    ABSTRACT: Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs. This paper presents a BAS knowledge base, BASiC, which currently covers 46 activities and is available on the Internet. We use the dopamine agonists D1, D2, and Dauto as examples and illustrate the process of BAS extraction. The resulting BASs were reasonably interpreted after proposing a few template structures. The knowledge base is useful for drug design. Proposed BASs and their supporting structures in the knowledge base will facilitate the development of new template structures for other activities, and will be useful in the design of new lead compounds via reasonable interpretations of active structures.
    Preview · Article · Jan 2010 · Chemistry Central Journal
  • Source
    • "1. In transactional graph mining, e.g., [12] [21] [22] [23] [28] [41] [42], the dataset consists of many small data graphs which we call transactions, and the task is to discover patterns that occur at least once in a sufficient number of transactions. (Approaches from machine learning or inductive logic programming usually call the small data graphs "examples" instead of transactions.)"
    ABSTRACT: New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We present an efficient algorithm for mining associations between tree queries in a large graph. Tree queries are powerful tree-shaped patterns featuring existential variables and data constants. Our algorithm applies the theory of conjunctive database queries to make the generation of association rules efficient. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.
    Full-text · Conference Paper · Jan 2007