A General Framework for Mining Frequent Subgraphs from Labeled Graphs.
ABSTRACT The derivation of frequent subgraphs from a dataset of labeled graphs has high computational complexity because the hard problems of isomorphism and subgraph isomorphism have to be solved as part of this derivation. To deal with this computational complexity, all previous approaches have focused on one particular kind of graph. In this paper, we propose an approach to conduct a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs within a practical time. The power of our approach comes from the algebraic representation of graphs, its associated operations and well-organized bias constraints to limit the search space efficiently. The performance has been evaluated using real world datasets, and the high scalability and flexibility of our approach have been confirmed with respect to the amount of data and the computation time.
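To make the notion of a frequent subgraph concrete, the following is a minimal sketch of support counting over a small set of labeled graphs. It is not the algebraic method proposed in the paper: it uses a brute-force subgraph isomorphism test, and it assumes undirected graphs without self-loops, non-induced subgraph matching, and string labels. The helper names (`make_graph`, `contains`, `support`) and the toy dataset are hypothetical and exist only for illustration.

```python
from itertools import permutations

def make_graph(vertex_labels, edges):
    """Build an undirected labeled graph from {vertex: label} and (u, v, label) triples."""
    return vertex_labels, {frozenset((u, v)): lab for u, v, lab in edges}

def contains(graph, pattern):
    """Brute-force test: does `pattern` occur in `graph` as a labeled subgraph?"""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        # Vertex labels must agree under the mapping.
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue
        # Every pattern edge must exist in the graph with the same label.
        if all(g_edges.get(frozenset(m[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            return True
    return False

def support(dataset, pattern):
    """Fraction of graphs in `dataset` that contain `pattern`."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)

# Toy dataset (assumed for illustration): three tiny molecule-like graphs.
g1 = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")])
g2 = make_graph({0: "C", 1: "O"}, [(0, 1, "double")])
g3 = make_graph({0: "C", 1: "N"}, [(0, 1, "single")])
dataset = [g1, g2, g3]

# Pattern: a C vertex joined to an O vertex by a "double" edge.
pattern = make_graph({"a": "C", "b": "O"}, [("a", "b", "double")])

min_support = 0.5  # assumed threshold
is_frequent = support(dataset, pattern) >= min_support
```

The exhaustive search over vertex assignments grows factorially with graph size, which is one way to see why the subgraph isomorphism step dominates the cost of mining and why restricting the search space matters.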
Conference Proceeding: Frequent subgraph discovery
ABSTRACT: As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets is to use graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. The authors present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on; 02/2001