Nikos Karayannidis's research while affiliated with National Technical University of Athens and other places

Publications (15)

Article
This paper deals with the problem of physical clustering of multidimensional data that are organized in hierarchies on disk in a hierarchy-preserving manner. This is called hierarchical clustering. A typical case, where hierarchical clustering is necessary for reducing I/Os during query evaluation, is the most detailed data of an OLAP cube. The pre...
Article
Star queries are the most prevalent kind of queries in data warehousing, online analytical processing (OLAP), and business intelligence applications. Thus, there is an imperative need for efficiently processing star queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarch...
Conference Paper
Hierarchical clustering has been proved an effective means for physically organizing large fact tables since it reduces significantly the I/O cost during ad hoc OLAP query evaluation. In this paper, we propose a novel multidimensional file structure for organizing the most detailed data of a cube, the CUBE File. The CUBE File achieves hierarc...
Article
In this article, we present the design and implementation of SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing on-line analytical processing (OLAP) operations. OLAP poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as mult...
Article
Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). One important step in the query processing phase is, however, still a bottlenec...
Article
A methodology recently proposed to improve processing of star queries on data warehouses is the clustering and indexing of fact tables using their multidimensional hierarchies [DRSN98, MRB99, KS01]. Due to this improved organization schema, processing of aggregation star queries changes dramatically creating new optimization opportunities. An impor...
Conference Paper
Star queries are the most prevalent kind of queries in data warehousing, OLAP and business intelligence applications. Thus, there is an imperative need for efficiently processing star queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarchically cluster the fac...
Article
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Literature and personal experience have guided us to conclude that the problems concerning the ETL tools are primarily problems of complexity, usability...
Conference Paper
On-Line Analytical Processing (OLAP) is a trend in database technology, based on the multidimensional view of data and is an indispensable component of the so-called business intelligence technology. The systems that realize this technology are called OLAP servers and are among the most high-priced products in software industry today [24]. The aim...
Article
In this paper we address the issue of conceptual modeling of data used in multidimensional analysis. We view the problem from the end-user point of view and we describe a set of requirements for the conceptual modeling of real-world OLAP scenarios. Based on those requirements we then define a new conceptual model that intends to capture the static p...
Article
Extraction-Transformation-Loading (ETL) and Data Cleaning tools are pieces of software responsible for the extraction of data from several sources, their cleaning, customization and insertion into a data warehouse. To deal with the complexity and efficiency of the transformation and cleaning tasks we have developed a tool, namely ARKTOS, capable of...
Article
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called...
Conference Paper
In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as multidimensionality, hierarchical struc...
Article
On-Line Analytical Processing (OLAP) is a trend in database technology, based on the multidimensional view of data. The aim of this paper is twofold: (a) to list general problems and solutions applicable to the design of any OLAP system and (b) to present the specific design decisions that we made for a prototype under development at...

Citations

... The rationale for inserting the pseudo levels above the grain level lies in that we wish to apply chunking (i.e., partitioning along each dimension) as soon as possible and for all possible dimensions. ...
... Features and functionalities of HDFS are possessed to store large volumes of data, with SQL-based skills for analytics [25]. Combination of DBMS and MapReduce has been successfully demonstrated as HadoopDB for analytical queries on OLAP [26]. MOLAP systems are very well defined on OLAP4cloud and HBaseLattice. ...
... One of the most popular approaches for modeling ETL processes was proposed by Vassiliadis et al. in [76] at the conceptual level; in [77,78] at the logical level; and, finally, in [79] at the physical level, alongside other publications detailing their efforts. Indeed, in [76], the authors focused on the conceptual representation of the interrelationships of attributes and concepts, as well as the different ETL activities (transformations), such as the check for null values and the allocation of surrogate keys. ...
... Due to the importance of decision support queries, many commercial DBMSs have developed dedicated heuristics for optimizing complex decision support queries [3,17,37] based on the plan space of snowflake queries [22]. ...
... Furthermore, they illustrated the benefits achieved by performance measurements of queries using star schema for a real world application of a SAP business information warehouse. Karayannidis et al. [31] proposed a novel multidimensional file structure for organizing the most detailed data of the cube, the CUBE file. The CUBE file achieves hierarchical clustering of data enabling fast access via hierarchical restrictions. ...
... Bitmap indexes and multidimensional indexes (e.g., the UB-Tree) are popular approaches. Also physical clustering [12], [6] has been investigated with great effort (e.g., hierarchical clustering). ...
... In this section we define the objective function according to the non-linear cost model for minimizing predefined queries response times under a maintenance cost constraint [12], [13]. We also define a new derivation cost function and a derivation constraint that we add to the objective function. ...
... Automatic building of hierarchies or dimensions. Many authors provide methods to build new hierarchies or dimensions in an OLAP cube with data mining algorithms such as clustering algorithms [2], [25], [29], [30], [31], association rules [32], [33] or time series analysis [34]. Concerning automatic building of hierarchies or dimensions, some authors provide methodologies that can work on data with no hierarchical structure, continuous data for example [4], [35] or social network data [36], [37]. ...
... On the other hand, qualitative techniques use patterns, constraints, and rules to detect errors [37]. These approaches can be applied within automated data cleaning tools such as ARKTOS, AJAX, FraQL, Potter's Wheel and IntelliClean [33,37,38]. ...
... In some domains, like the Web of data, ER is a prerequisite to enable semantic search, interlink descriptions and support deep reasoning [15]. It is also an indispensable step in data cleaning [39] [80], data integration [38], and data warehousing [8]. The use of computer techniques to perform ER dates back to the middle of the last century. ...