Article

A Data Mining Approach for selecting Bitmap Join Indices.

JCSE 01/2007; 1:177-194. DOI: 10.5626/JCSE.2007.1.2.177
Source: DBLP

ABSTRACT Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they consume storage and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables and by selections on the dimension tables. They require less storage owing to their binary representation. However, selecting these indices is difficult because of the exponential number of candidate attributes to be indexed. Most approaches to index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of the bitmap join index selection problem. As opposed to an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique uses not only frequencies but also other parameters, such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and satisfy a storage constraint. Finally, to evaluate the efficiency of our approach, we compare it with some existing techniques.
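
The second step described in the abstract, selecting indices under a storage constraint, can be illustrated with a minimal sketch. This is not the paper's algorithm or cost model, neither of which is given on this page. It is a generic ratio-based greedy heuristic, and the `Candidate` fields, the attribute names, and all numbers are hypothetical. It also assumes each candidate's saving is independent of the others, which a real cost model would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    attribute: str   # dimension attribute that survived pruning (hypothetical)
    storage: float   # estimated index size, e.g. in MB (hypothetical)
    saving: float    # estimated workload cost reduction (hypothetical)


def greedy_select(candidates, budget):
    """Pick candidates by saving per unit of storage while the budget allows."""
    usable = [c for c in candidates if c.storage > 0 and c.saving > 0]
    ranked = sorted(usable, key=lambda c: c.saving / c.storage, reverse=True)
    chosen, used = [], 0.0
    for cand in ranked:
        # Skip any candidate that would break the storage constraint.
        if used + cand.storage <= budget:
            chosen.append(cand)
            used += cand.storage
    return chosen, used


candidates = [
    Candidate("Customer.city", storage=40, saving=120),
    Candidate("Time.month", storage=10, saving=50),
    Candidate("Product.category", storage=30, saving=60),
    Candidate("Channel.type", storage=25, saving=100),
]
chosen, used = greedy_select(candidates, budget=80)
print([c.attribute for c in chosen], used)
```

Ranking by saving per unit of storage is only one possible greedy criterion. Because real index benefits interact, a more faithful variant would re-estimate the workload cost after each selection.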

  • ABSTRACT: Bitmap join indexes (BJI) have been widely advocated by administrators as a solution for optimizing complex queries. Selecting them remains hard, since it requires exploring a large search space. Only a few classes of algorithms have been proposed for the BJI selection problem. These algorithms are static and do not take into account changes in the data warehouse workload as new queries arrive. In this paper, we propose a genetic algorithm that statically selects BJIs defined on multiple attributes belonging to various dimension tables. The algorithm is then extended to an incremental version that adapts the selected indexes as new queries arrive. Intensive experiments were conducted to show the efficiency of our proposal and to compare it with the most important existing static and incremental selection approaches.
    Journal of Decision Systems. 01/2012; 21(1):51-70.
  • ABSTRACT: Data warehouses tend to be extremely large. With terabytes and petabytes of data in the warehouse, complex queries can slow down performance for all decision makers, and managing the warehouse becomes difficult. To optimize these queries, many optimization techniques have been proposed: materialized views, advanced indexing schemes, data partitioning, parallel processing, etc. Choosing among these techniques is a crucial decision for the performance of the data warehouse. Two main modes for selecting optimization techniques exist: sequential and combined. In the first mode, each technique is selected in isolation. The main drawback of this mode is that it ignores the interactions between different optimization techniques. In the combined mode, a joint search is performed directly in the combined search space of the optimization techniques. This selection gives better performance than sequential selection, since it takes into account the interdependencies between optimization techniques, but it has high complexity. In this paper, we concentrate on the combined mode, where two optimization techniques are considered: horizontal partitioning and bitmap join indexes. The use of horizontal partitioning prunes the search space of the bitmap join index selection problem. We first show the strong similarities between these two techniques. Second, we propose a new approach for selecting these two techniques simultaneously. Genetic and greedy algorithms are used to select the horizontal partitioning schema and the bitmap join indexes, respectively. Finally, we conduct intensive experimental studies using a theoretical cost model, and the obtained optimization techniques are validated on ORACLE10g using a dataset from the APB-1 benchmark.
    New Trends in Data Warehousing and Data Analysis. 01/2009;
  • Annals of Information Systems, Special Issue on new trends in data warehousing and data analysis, Springer. 11/2008; 3:179-2001.
