Lightweight problem determination in DBMSs using data stream analysis techniques.
ABSTRACT Problem determination in a database management system can be a difficult task given the complexity of the system and the large amount of data that must be collected and analyzed. Monitoring the system for this data incurs overhead and has a detrimental effect on application performance. As an alternative to the standard practice of storing the performance data and performing offline analysis, we examine an approach where monitoring data is produced as a continuous data stream and data stream mining techniques are applied. We implement this approach as a prototype system called Tempo on IBM DB2®. Tempo implements Top-K analysis, which is a common task performed by database administrators for problem determination. Top-K analysis typically identifies the set of most frequently occurring events, or the highest consumers of system resources. Our experimental evaluation indicates that Tempo is time and space efficient, incurs low overhead, and produces accurate results.
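To make the idea of Top-K analysis over a monitoring stream concrete, the following is a minimal sketch using the well-known Space-Saving counter scheme. It is an illustration only: the abstract does not say which stream algorithm Tempo uses, and the class name `SpaceSaving`, its methods, and the event labels (`"q1"`, `"q2"`, ...) are hypothetical names chosen for this example.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-k frequent items using at most `capacity` counters.

    Illustrative sketch only; not the algorithm described for Tempo.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}

    def offer(self, item: Hashable) -> None:
        """Process one event from the stream in bounded memory."""
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # All counters are in use: evict the item with the smallest
            # count and let the newcomer inherit that count plus one.
            # Counts can therefore overestimate, but never underestimate.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k (item, estimated_count) pairs, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    # Hypothetical stream of statement identifiers emitted by a monitor.
    stream = ["q1"] * 50 + ["q2"] * 30 + ["q3"] * 5 + ["q4"] * 2
    sketch = SpaceSaving(capacity=3)
    for event in stream:
        sketch.offer(event)
    print(sketch.top(2))
```

The point of the design is that memory stays fixed at `capacity` counters regardless of stream length, which matches the abstract's goal of avoiding storage of the full monitoring data for offline analysis. The linear scan for the minimum keeps the sketch short; a production version would typically use a more efficient structure.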
- Source available from: Guozhu Dong
Conference Paper: Multi-Dimensional Regression Analysis of Time-Series Data Streams.
ABSTRACT: Real-time production systems and other dynamic environments often generate tremendous (potentially infinite) amounts of stream data; the volume of data is too large to be stored on disk or scanned multiple times. Can we perform on-line, multi-dimensional analysis and data mining of such data to alert people about dramatic changes of situations and to initiate timely, high-quality responses? This is a challenging task.
VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20-23, 2002, Hong Kong, China; 01/2002
- ABSTRACT: Although frequent-pattern mining has been widely studied and used, it is challenging to extend it to data streams. Compared to mining from a static transaction data set, the streaming case has far more information to track and far greater complexity to manage. Infrequent items can become frequent later on and hence cannot be ignored. The storage structure needs to be dynamically adjusted to reflect the evolution of itemset frequencies over time.
- ABSTRACT: The ability to monitor a database server is crucial for effective database administration. Today's commercial database systems support two basic mechanisms for monitoring: (a) obtaining a snapshot of counters to capture current state, and (b) logging events in the server to a table/file to capture history. We show that for a large class of important database administration tasks the above mechanisms are inadequate in functionality or performance. We present an infrastructure called SQLCM that enables continuous monitoring inside the database server and that has the ability to automatically take actions based on monitoring. We describe the implementation of SQLCM in Microsoft SQL Server and show how several common and important monitoring tasks can be easily specified in SQLCM. Our experimental evaluation indicates that SQLCM imposes low overhead on normal server execution and enables monitoring tasks on a production server that would be too expensive using today's monitoring mechanisms.
Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA; 01/2004