Nikos Karayannidis's research while affiliated with National Technical University of Athens and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (15)
This paper deals with the problem of physical clustering of multidimensional data that are organized in hierarchies on disk in a hierarchy-preserving manner. This is called hierarchical clustering. A typical case, where hierarchical clustering is necessary for reducing I/Os during query evaluation, is the most detailed data of an OLAP cube. The pre...
Star queries are the most prevalent kind of queries in data warehousing, online analytical processing (OLAP), and business intelligence applications. Thus, there is an imperative need for efficiently processing star queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarch...
Hierarchical clustering has been proved an effective means for physically organizing large fact tables since it reduces significantly the I/O cost during ad hoc OLAP query evaluation. In this paper, we propose a novel multidimensional file structure for organizing the most detailed data of a cube, the CUBE File. The CUBE File achieves hierarc...
In this article, we present the design and implementation of SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing on-line analytical processing (OLAP) operations. OLAP poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as mult...
Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). One important step in the query processing phase is, however, still a bottlenec...
A methodology recently proposed to improve processing of star queries on data warehouses is the clustering and indexing of fact tables using their multidimensional hierarchies [DRSN98, MRB99, KS01]. Due to this improved organization schema, processing of aggregation star queries changes dramatically creating new optimization opportunities. An impor...
Star queries are the most prevalent kind of queries in data warehousing, OLAP and business intelligence applications. Thus, there is an imperative need for efficiently processing star queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarchically cluster the fac...
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Literature and personal experience have guided us to conclude that the problems concerning the ETL tools are primarily problems of complexity, usability...
On-Line Analytical Processing (OLAP) is a trend in database technology, based on the multidimensional view of data, and is an indispensable component of the so-called business intelligence technology. The systems that realize this technology are called OLAP servers and are among the most high-priced products in the software industry today [24]. The aim...
In this paper we address the issue of conceptual modeling of data used in multidimensional analysis. We view the problem from the end-user point of view and we describe a set of requirements for the conceptual modeling of real-world OLAP scenarios. Based on those requirements we then define a new conceptual model that intends to capture the static p...
Extraction-Transformation-Loading (ETL) and Data Cleaning tools are pieces of software responsible for the extraction of data from several sources, their cleaning, customization and insertion into a data warehouse. To deal with the complexity and efficiency of the transformation and cleaning tasks we have developed a tool, namely ARKTOS, capable of...
We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called...
In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as multidimensionality, hierarchical struc...
Abstract. On-Line Analytical Processing (OLAP) is a trend in database technology, based on the multidimensional view of data. The aim of this paper is twofold: (a) to list general problems and solutions applicable to the design of any OLAP system and (b) to present the specific design decisions that we made for a prototype under development at...
Citations
... LOCATION:012.01234.012345678910.0123456789101112131415161718 PRODUCT:01.012.P.012345 The rationale for inserting the pseudo levels above the grain level lies in that we wish to apply chunking (i.e., partitioning along each dimension) the soonest possible and for all possible dimensions. ...
... Features and functionalities of HDFS are possessed to store large volumes of data, with SQL-based skills for analytics [25]. Combination of DBMS and map reduce has been successfully demonstrated as HadoopDB for analytical queries on OLAP [26]. MOLAP systems are very well defined on OLAP4cloud and HBaseLattice. ...
... One of the most popular approaches for modeling ETL processes was proposed by Vassiliadis et al. in [76] at the conceptual level; in [77,78] at the logical level; and, finally, in [79] at the physical level, alongside other publications detailing their efforts. Indeed, in [76], the authors focused on the conceptual representation of the interrelationships of attributes and concepts, as well as the different ETL activities (transformations), such as the check for null values and the allocation of surrogate keys. ...
... Due to the importance of decision support queries, many commercial DBMSs have developed dedicated heuristics for optimizing complex decision support queries [3,17,37] based on the plan space of snowflake queries [22]. ...
... Furthermore, they illustrated the benefits achieved by performance measurements of queries using star schema for a real world application of a SAP business information warehouse. Karayannidis et al. [31] proposed a novel multidimensional file structure for organizing the most detailed data of the cube, the CUBE file. The CUBE file achieves hierarchical clustering of data enabling fast access via hierarchical restrictions. ...
... Bitmap indexes and multidimensional indexes (e.g., the UB-Tree) are popular approaches. Also physical clustering [12], [6] has been investigated with great effort (e.g., hierarchical clustering). ...
... In this section we define the objective function according to the non-linear cost model for minimizing predefined query response times under a maintenance cost constraint [12], [13]. We also define a new derivation cost function and a derivation constraint that we add to the objective function. ...
... Automatic building of hierarchies or dimensions. Many authors provide methods to build new hierarchies or dimensions in an OLAP cube with data mining algorithms such as clustering algorithms [2], [25], [29], [30], [31], association rules [32], [33] or time series analysis [34]. Concerning automatic building of hierarchies or dimensions, some authors provide methodologies that can work on data with no hierarchical structure, continuous data for example [4], [35] or social network data [36], [37]. ...
... On the other hand, qualitative techniques use patterns, constraints, and rules to detect errors [37]. These approaches can be applied within automated data cleaning tools such as ARKTOS, AJAX, FraQL, Potter's Wheel and IntelliClean [33,37,38]. ...
... In some domains, like the Web of data, ER is a prerequisite to enable semantic search, interlink descriptions and support deep reasoning [15]. It is also an indispensable step in data cleaning [39] [80], data integration [38], and data warehousing [8]. The use of computer techniques to perform ER dates back to the middle of the last century. ...