A General Framework for Mining Frequent Subgraphs from Labeled Graphs.

Fundamenta Informaticae (Impact Factor: 0.48). 01/2005; 66:53-82.
Source: DBLP

ABSTRACT The derivation of frequent subgraphs from a dataset of labeled graphs has high computational complexity because the hard problems of isomorphism and subgraph isomorphism have to be solved as part of this derivation. To deal with this computational complexity, all previous approaches have focused on one particular kind of graph. In this paper, we propose an approach to conduct a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within a practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations and well-organized bias constraints to limit the search space efficiently. The performance has been evaluated using real-world datasets, and the high scalability and flexibility of our approach have been confirmed with respect to the amount of data and the computation time.


Available from: Hiroshi Motoda, Jun 08, 2015
  • Source
    ABSTRACT: The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery. In this article, we shed light on some interesting phenomena of the Web: the deep Web, which surfaces database records as Web pages; the Semantic Web, which defines meaningful data exchange formats; XML, which has established itself as a lingua franca for Web data exchange; and domain-specific markup languages, which are designed based on XML syntax with the goal of preserving semantics in targeted domains. We detail these four developments in Web technology, and explain how they can be used for data mining. Our goal is to show that all these areas can be as useful for knowledge discovery as the HTML-based part of the Web.
    ACM SIGKDD Explorations Newsletter 04/2013; 14(2):63-81. DOI:10.1145/2481244.2481255
  • Source
    ABSTRACT: Correlation mining is recognized as one of the most important data mining tasks for its capability to identify underlying dependencies between objects. At the same time, graph-based data mining techniques are increasingly applied to large datasets because they can model various non-traditional domains representing complex real-life scenarios, such as social and computer networks, map and spatial databases, chemical informatics, bioinformatics, image processing and machine learning. Correlation measures are used to extract useful knowledge from a large number of spurious patterns. Nonetheless, existing graph-based correlation mining approaches are unable to capture effective correlations in graph databases. Hence, we have concentrated on graph correlation mining and proposed a new graph correlation measure, gConfidence, to discover more useful graph patterns. Moreover, we have developed an efficient algorithm, CGM (Correlated Graph Mining), to find the correlated graphs in graph databases. The performance of our scheme was extensively analyzed on several real-life and synthetic databases in terms of runtime and memory consumption, and compared with existing graph correlation mining algorithms. The results showed that CGM is scalable with respect to required processing time and memory consumption and outperforms existing approaches by a factor of two in speed of mining correlations.
    Expert Systems with Applications 03/2014; 41(4):1847-1863. DOI:10.1016/j.eswa.2013.08.082 · 1.97 Impact Factor
  • Source
    ABSTRACT: In this paper we describe the Graph Annotation Format (GrAF) and show how it is used to represent not only independent linguistic annotations, but also sets of merged annotations as a single graph. To demonstrate this, we have automatically transduced several different annotations of the Wall Street Journal corpus into GrAF and show how the annotations can then be merged, analyzed, and visualized using standard graph algorithms and tools. We also discuss how, as a standard graph representation, it allows for the application of well-established graph traversal and analysis algorithms to produce information about interactions and commonalities among merged annotations. GrAF is an extension of the Linguistic Annotation Framework (LAF) (Ide and Romary, 2004, 2006) developed within ISO TC37 SC4 and, as such, implements state-of-the-art best practice guidelines for representing linguistic annotations.