Finding the base of global implications for the association rule problem
ABSTRACT: Using concept lattices as a theoretical background for finding association rules has led to the design of algorithms such as CHARM (Zaki and Hsiao, 1999), CLOSE (Pasquier, Bastide, Taouil, and Lakhal, 1999) and CLOSET (Pei, Han, and Mao, 2000). Although these algorithms are well suited to finding the concepts needed for association rules, they do not cover one area of significant results: the pseudo-intents, which form the base for global implications. We propose an approach that finds all proper partial implications and, in addition, the pseudo-intents. Our algorithm is designed to support important operations on concept lattices, such as adding or removing items, so that previously found results can be reused.
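To make the notion of pseudo-intents concrete, here is a minimal brute-force sketch on a tiny formal context. It is not the algorithm proposed in the paper. It only illustrates the definitions: global implications are the exact rules (confidence 1), and one implication P -> P'' per pseudo-intent P gives their minimal base (the Duquenne-Guigues base). The context `CONTEXT` and the functions `closure`, `intents`, `pseudo_intents` and `stem_base` are hypothetical names chosen for illustration. Because every subset of the items is enumerated, the sketch is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy context: each transaction (object) maps to the items (attributes) it contains.
CONTEXT = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"c"},
    "t4": {"b", "c"},
}
ITEMS = sorted(set().union(*CONTEXT.values()))


def closure(itemset):
    """Return itemset'': the items shared by every transaction that contains itemset."""
    covering = [row for row in CONTEXT.values() if itemset <= row]
    if not covering:
        return frozenset(ITEMS)  # empty extent: the closure is the full item set
    return frozenset(set.intersection(*covering))


def all_subsets():
    """Yield every subset of ITEMS as a frozenset, smallest subsets first."""
    for size in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            yield frozenset(combo)


def intents():
    """All concept intents, i.e. all closed itemsets."""
    return {closure(s) for s in all_subsets()}


def pseudo_intents():
    """Brute-force pseudo-intents; subsets are visited in order of increasing size."""
    found = []
    for p in all_subsets():
        if closure(p) == p:
            continue  # p is closed (an intent), so it cannot be a pseudo-intent
        # Every pseudo-intent q strictly inside p must have its closure inside p.
        if all(closure(q) <= p for q in found if q < p):
            found.append(p)
    return found


def stem_base():
    """One implication P -> (P'' minus P) per pseudo-intent P (Duquenne-Guigues base)."""
    return [(p, closure(p) - p) for p in pseudo_intents()]


if __name__ == "__main__":
    for premise, conclusion in stem_base():
        print(sorted(premise), "->", sorted(conclusion))
```

In this toy context, every transaction containing a also contains b. The only pseudo-intent is therefore {a}, and the base is the single implication a -> b. Other exact rules, such as {a, c} -> b, follow from it.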
- ABSTRACT: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Experiments with synthetic as well as real-life data show that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database. Introduction (excerpt): Database mining is motivated by the decision support problem faced by most large retail organizations [S+93]. Progress in bar-code technology has made it possible for retail ... (08/2000)
- ABSTRACT: Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules to be presented. In this paper, we propose an efficient algorithm, CLOSET, for mining closed itemsets, built on three techniques: (1) applying a compressed frequent-pattern tree (FP-tree) structure to mine closed itemsets without candidate generation, (2) developing a single prefix path compression technique to identify frequent closed itemsets quickly, and (3) exploring a partition-based projection mechanism for scalable mining in large databases. Our performance study shows that CLOSET is efficient and scalable over large databases and is faster than previously proposed methods. Introduction (excerpt): It has been well recognized that frequent pattern minin... (04/2001)
- ABSTRACT: In this paper, we address the problem of finding frequent itemsets in a database. Using the closed itemset lattice framework, we show that this problem can be reduced to the problem of finding frequent closed itemsets. Based on this result, we can construct efficient data mining algorithms by limiting the search space to the closed itemset lattice rather than the subset lattice. Moreover, we show that the set of all frequent closed itemsets suffices to determine a reduced set of association rules, thus addressing another important data mining problem: limiting the number of rules produced without information loss. We propose a new algorithm, called A-Close, that uses a closure mechanism to find frequent closed itemsets. We conducted experiments comparing our approach with the commonly used frequent itemset search approach. These experiments showed that our approach is very valuable for dense and/or correlated data, which represent an important part of existing databases. (08/2000)