A Data Mining Approach for selecting Bitmap Join Indices.

JCSE 12/2007; 1:177-194. DOI: 10.5626/JCSE.2007.1.2.177
Source: DBLP

ABSTRACT Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they consume storage and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables and by selections on the dimension tables. They require less storage thanks to their binary representation. However, selecting these indices is a difficult task due to the exponential number of candidate attributes to be indexed. Most approaches to index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data-mining-driven approach to prune the search space of the bitmap join index selection problem. As opposed to an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique uses not only frequencies but also other parameters, such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and satisfy the storage constraint. Finally, to evaluate the efficiency of our approach, we compare it with some existing techniques.
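
The two-step scheme described in the abstract (prune, then select greedily under a storage constraint) can be illustrated with a minimal sketch. This is not the paper's algorithm or cost model: the `Candidate` fields, the caller-supplied `keep` predicate, and the benefit-per-storage ranking are all assumptions made for illustration, and each candidate's benefit is treated as independent of the others, which a real cost model would not assume.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one candidate indexable attribute."""
    attribute: str   # dimension attribute name
    frequency: int   # number of workload queries that use the attribute
    storage: float   # estimated index size (arbitrary units, must be > 0)
    benefit: float   # estimated reduction in query processing cost


def prune(candidates: Iterable[Candidate],
          keep: Callable[[Candidate], bool]) -> List[Candidate]:
    """Step 1: shrink the search space with a caller-supplied predicate.

    The predicate is a stand-in for whatever pruning metric is used
    (frequency alone, or frequency combined with table and page sizes).
    """
    return [c for c in candidates if keep(c)]


def greedy_select(candidates: Iterable[Candidate],
                  budget: float) -> List[Candidate]:
    """Step 2: pick candidates by benefit per unit of storage until the
    storage budget is exhausted. Benefits are assumed independent."""
    usable = [c for c in candidates if c.storage > 0 and c.benefit > 0]
    ranked = sorted(usable, key=lambda c: c.benefit / c.storage, reverse=True)
    selected: List[Candidate] = []
    remaining = budget
    for c in ranked:
        if c.storage <= remaining:   # index still fits in the budget
            selected.append(c)
            remaining -= c.storage
    return selected
```

A caller would first run `prune` with its chosen metric and then pass the surviving candidates, together with the storage budget, to `greedy_select`. Ranking by benefit per unit of storage is one common greedy heuristic for budgeted selection; it does not guarantee an optimal set.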

  • ABSTRACT: Data warehouses are very large databases usually designed using the star schema. Queries defined on data warehouses are generally complex due to the join operations involved. The performance of star schema queries in data warehouses is highly critical, and optimizing it is hard in general. Several query performance optimization methods exist, such as indexes and table partitioning. In this paper, we propose a new approach based on binary particle swarm optimization for solving the bitmap join index selection problem in data warehouses. This approach selects the optimal set of bitmap join indexes based on a mathematical cost model. Several experiments are performed to demonstrate the effectiveness of the proposed method on the bitmap join index selection problem. Further testing of the method is performed using a cost function specific to the database environment. The binary particle swarm optimization is found to be more effective than both the genetic algorithm and data mining based approaches.
    The Journal of Supercomputing 05/2013; 68(2). DOI:10.1007/s11227-013-1058-9 · 0.84 Impact Factor
  • ABSTRACT: The advancement of next generation sequencing (NGS) and shotgun sequencing technologies has produced massive amounts of genomic data. Metagenomics, a powerful technique for studying the genetic material of uncultivable microorganisms obtained directly from their natural environment, deals with high-throughput sequencing read data sets. Assembling, binning, and aligning short reads in order to identify the microorganisms in a metagenomics sample is expensive and time-consuming, regardless of other restrictions. A DNA signature is a short nucleotide sequence fragment that is used to distinguish a species from all other species. It can be a basis for identifying microorganisms in both environmental and clinical samples directly from the short reads, without assembly and alignment. In this paper, we propose a scalable method that uses optimization techniques borrowed from database technology, namely bitmap indexes. They are used to speed up the searching and matching of billions of DNA signatures in the short reads of thousands of different microorganisms, using commodity high-performance computing tools such as Hadoop MapReduce, Hive, and HBase.
    DEXA-ITBAM 2014, Munich, Germany; 09/2014
  • ABSTRACT: Bitmap join indexes (BJI) have been widely advocated by administrators as a solution to optimize complex queries. Their selection remains hard, since it requires exploring a large search space. Only a few classes of algorithms have been proposed to deal with the problem of BJI selection. These algorithms are static and do not take into account changes in the data warehouse in terms of query arrival. In this paper, we propose a genetic algorithm to select, in a static way, BJI defined on multiple attributes belonging to various dimension tables. This algorithm is then extended to handle the incremental case, in which the selected indexes are adapted as new queries arrive. An intensive experimental study was conducted to show the efficiency of our proposal and to compare it with the most important existing static and incremental selection studies.
