Conference Paper

Sequential Pattern Mining in Data Streams Using the Weighted Sliding Window Model.

DOI: 10.1109/ICPADS.2009.64 Conference: IEEE 15th International Conference on Parallel and Distributed Systems, ICPADS 2009, 8-11 December 2009, Shenzhen, China
Source: DBLP

ABSTRACT Mining data streams for knowledge discovery is important to many applications, including Web click stream mining, network intrusion detection, and on-line transaction analysis. In this paper, based on an analysis of the data characteristics, we propose SWSS (Sequential pattern mining with the weighted sliding window model in SPAM), an efficient algorithm that mines frequent sequential patterns under the weighted sliding window model. The algorithm gives users more flexibility to specify which sequences they are most interested in. Extensive experiments show that the proposed algorithm is feasible and efficient for mining all the sequential patterns that users specify.
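The abstract does not define how patterns are counted under the weighted sliding window model. The sketch below assumes the formulation commonly associated with that model: the user splits the recent stream into windows and assigns each window a weight, the weights sum to 1, and a pattern's weighted support is the sum over windows of its support in that window multiplied by the window's weight. It uses a naive scan instead of SPAM's vertical bitmaps, so it illustrates the counting rule only and is not SWSS itself. The names `seq`, `contains`, `weighted_support`, and `is_frequent` are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

# Illustrative sketch only: not the SWSS algorithm and not SPAM's bitmap layout.
# A data sequence is an ordered tuple of itemsets (e.g. one user's clicks over time).
Itemset = FrozenSet[str]
DataSequence = Tuple[Itemset, ...]


def seq(*itemsets: set) -> DataSequence:
    """Build a sequence from plain sets, e.g. seq({"a"}, {"b", "c"})."""
    return tuple(frozenset(s) for s in itemsets)


def contains(data_seq: DataSequence, pattern: DataSequence) -> bool:
    """True if `pattern` is a subsequence of `data_seq`: each pattern itemset
    is a subset of a distinct itemset of `data_seq`, in the same order.
    Greedy earliest matching is sufficient for this test."""
    i = 0
    for itemset in data_seq:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def weighted_support(windows: Sequence[Sequence[DataSequence]],
                     weights: Sequence[float],
                     pattern: DataSequence) -> float:
    """Assumed rule: sum over windows of weight * (fraction of the window's
    sequences that contain `pattern`). Weights are user-specified and sum to 1."""
    if len(windows) != len(weights):
        raise ValueError("need exactly one weight per window")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("window weights must sum to 1")
    total = 0.0
    for window, weight in zip(windows, weights):
        if not window:
            continue  # an empty window contributes nothing
        hits = sum(1 for s in window if contains(s, pattern))
        total += weight * hits / len(window)
    return total


def is_frequent(windows: Sequence[Sequence[DataSequence]],
                weights: Sequence[float],
                pattern: DataSequence,
                min_sup: float) -> bool:
    """Hypothetical threshold test: frequent if weighted support >= min_sup."""
    return weighted_support(windows, weights, pattern) >= min_sup
```

For example, with an older window `[seq({"a"}, {"b"}), seq({"a"}, {"c"}), seq({"b"})]` weighted 0.4 and a newer window `[seq({"a"}, {"b"}), seq({"a", "b"}, {"b"})]` weighted 0.6, the pattern `seq({"a"}, {"b"})` has weighted support 0.4 · (1/3) + 0.6 · 1, roughly 0.733. Giving the newer window the larger weight is one way a user can express more interest in recent behavior.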
