Conference Paper

A Data Model for Supporting On-Line Analytical Processing.

... The number of axes is not limited to three; it can reach several dozen, forming a hyper-cube. The axes carry several levels of granularity, which provide a more or less detailed view during analysis [Agrawal et al., 1995], [Li & Wang, 1996], [Gyssens et al., 1996], [Gyssens & Lakshmanan, 1997], [Datta & Thomas, 1999]. In this example, the 'Quantities' and 'Amounts' (analysis indicators) of 'Sales' (the analysis subject) of computer products are analysed along three dimensions: the 'Stores' where the sales took place, the 'Dates' of the sales, and the 'Products' sold. ...
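The cube metaphor in the passage above, measures indexed by Store, Date and Product dimensions, can be sketched in a few lines; all names and figures below are illustrative, not taken from any of the cited models.

```python
# Illustrative only: a tiny fact "cube" keyed by (store, date, product),
# with two measures per cell, and a roll-up that sums out the other axes.
from collections import defaultdict

facts = {
    ("Paris", "2024-01", "laptop"): {"quantity": 3,  "amount": 3000},
    ("Paris", "2024-01", "mouse"):  {"quantity": 10, "amount": 200},
    ("Lyon",  "2024-01", "laptop"): {"quantity": 1,  "amount": 1000},
}

AXES = {"store": 0, "date": 1, "product": 2}

def roll_up(cells, kept_axes):
    """Sum both measures over every dimension not listed in kept_axes."""
    out = defaultdict(lambda: {"quantity": 0, "amount": 0})
    for coords, measures in cells.items():
        key = tuple(coords[AXES[a]] for a in kept_axes)
        for m, v in measures.items():
            out[key][m] += v
    return dict(out)

by_store = roll_up(facts, ["store"])
# by_store[("Paris",)] == {"quantity": 13, "amount": 3200}
```

Keeping all three axes returns the cells unchanged; keeping none yields the grand total, one cell per level of granularity.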
... Numerous works concern the definition of OLAP manipulation operators [Gray et al., 1996], [Li & Wang, 1996], [Gyssens & Lakshmanan, 1997], [Cabibbo & Torlone, 1997], [Cabibbo & Torlone, 1998], [Lehner, 1998], [Marcel, 1998], [Pedersen T.B. & Jensen, 1999], [Datta & Thomas, 1999], [Pedersen T.B. et al., 2001], [Abelló et al., 2003], [Franconi & Kamble, 2004], [Messaoud, 2006], [Ravat et al., 2008], [Boukraa et al., 2010], [Bimonte et al., 2012], [Golfarelli & Rizzi, 2013]. Despite the absence of agreement on a ...
... A detailed state of the art can be found in . The first proposals [Agrawal et al., 1995], [Li & Wang, 1996], [Gyssens et al., 1996], [Gyssens & Lakshmanan, 1997] are based on the data-cube metaphor (cf. § 1.2.3.1). ...
... In terms of multidimensional manipulation, the first works on OLAP manipulation extended the operators of relational algebra to the cube model [Agrawal et al., 1997] (an SQL transcription of the operations is available in [Agrawal et al., 1995]), [Li C. & Wang, 1996]. To work around the inadequacy of relational algebra for manipulating multidimensional structures in an OLAP context, numerous works have proposed operators and operations to specify and manipulate a cube [Abelló et al., 2003], [Pedersen T.B. et al., 2001]. ...
... More complex, multidimensional aggregation began with the specification of aggregates and the aggregation principle in statistical databases, with the proposals of [Özsoyoglu et al., 1985] (the reader is invited to consult [Shoshani, 2003] and [Torlone, 2003] for more details on statistical databases). This problem was taken up again with the appearance of the first cube models for OLAP: [Li C. & Wang, 1996], [Agrawal et al., 1997]. Subsequently, the appearance of the notion of hierarchy in the data representing the analysis axes also gave rise to new proposals [Jagadish et al., 1999] ...
... The first approach groups specifications of operators for aggregating data by exploiting the tree structure of XML. These operators are inspired by the aggregation operators of the OLAP environment, such as AGGREGATE [Gyssens & Lakshmanan, 1997], [Li C. & Wang, 1996], or CUBE. In [Wang et al., 2003] and [Wang et al., 2005], the authors present an aggregation operator over XML structures: XAGGREGATION. ...
Article
Full-text available
Thesis also available on the website of Université Paul Sabatier, Toulouse 3: http://thesesups.ups-tlse.fr/160/
... At the beginning, research in conceptual design was focused on operators/algebra. Approaches such as [RGS97], [LW96], [P.V98], [AH97] and [ML97] are examples of this. In [AJS01] these models are called formalisms due to their lack of semantic richness. ...
... Cube models such as [LW96], [RGS97], [AH97] and [P.V98], proposed in the early stages of the research, were focused on operators and/or algebra rather than requirements. However, like other cube models ([TJP98], [TC99], [NA01], [Leh98] and [AKS01]), these models also include hierarchy definition in the schema. ...
... Then a conceptual design is suggested based on functional dependencies. Research in data warehouse data modeling started even before the design methods discussed above existed. A few examples of this are [R.K96], [LW96], [LR98]. These approaches addressed various modeling issues and proposed models for data representation. ...
... Optimization of these queries is different from traditional techniques. Special models (multi-dimensional models [12][13][14][15]), databases (multi-dimensional databases [16]) and operators [17][18][19][20][21] have been proposed to treat these individual characteristics of decision support queries efficiently. However, relational systems represent a big fraction of today's systems and SQL is the standard query language of these systems. ...
... This is a common characteristic of decision support queries; they aggregate according to several different sets of grouping attributes and then correlate the results through joins (cross-dimensional queries). This idea is implicit in papers that model multi-dimensional databases [15,14]. In our framework this characteristic results in a query graph with several, possibly overlapping, group query components. ...
Article
Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations and requires significant aggregation over different attribute sets (dimensions) of the participating relations. Such queries may take hours or days, a time period unacceptable in most cases. As a result, it is important to study these queries and identify special frequent cases that can be evaluated with specialized algorithms. Understanding complex aggregate queries leads to better execution plans and, consequently, performance.
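The pattern the abstract describes, aggregating one relation over several different sets of grouping attributes and then correlating the results, can be illustrated with a toy sketch; the dimension names and data are invented, and this is not any cited paper's algorithm.

```python
# Toy sketch: compute one aggregation per subset of grouping attributes,
# as a CUBE-style query would; data and dimension names are invented.
from collections import defaultdict
from itertools import combinations

rows = [
    {"region": "EU", "year": 2023, "sales": 5},
    {"region": "EU", "year": 2024, "sales": 7},
    {"region": "US", "year": 2024, "sales": 2},
]
DIMS = ("region", "year")

def group_by(rows, attrs):
    """Sum 'sales' grouped by the given tuple of attribute names."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[a] for a in attrs)] += r["sales"]
    return dict(totals)

# One group-by per subset of DIMS; a cross-dimensional query would then
# join these per-granularity results.
cube = {attrs: group_by(rows, attrs)
        for k in range(len(DIMS) + 1)
        for attrs in combinations(DIMS, k)}
# cube[("region",)][("EU",)] == 12; cube[()][()] == 14
```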
... By structured, we mean n-dimensional data sets whereby k of those n dimensions (k < n) are orthogonal to each other, i.e., they are independent coordinates. We can view this space as a hyperbox of dimension k, commonly called a k-dimensional data cube [15], where the remaining n − k data attributes are uniquely determined by their location in that hyperbox. ...
... Our notion of structured data in visualization applications was addressed by Gelberg [6] and later by Haber [7]. Chang [4] introduced the modern notion of data selection for exploratory visualization, whereas Li and Wang [15] present an algebra-based query language for the datacube model. LeBlanc, Ward, and Wittels [13] describe exploratory visualization of n-dimensional databases. ...
Conference Paper
Full-text available
This paper presents a paradigm for the interactive selection (querying) of data from a structured grid of data points for exploratory visualization. The paradigm is based on specifying and iteratively adjusting the Focus, Extent, and Density (FED) of the data attributes. The FED model supports highly complex queries of structured data in an intuitive fashion, and is augmented with a visual interface composed of a set of simple yet powerful user interface controls for query specification. In addition, statistical aggregations are supported by the model. Finally, the FED model is compared to the SQL paradigm, and is shown to be well suited for mapping to a direct-manipulation graphical interface
... In [LW96] a multidimensional data model is introduced based on relational elements. Dimensions are modeled as "dimension relations", in practice annotating attributes with dimension names. ...
Thesis
Full-text available
Many aspects of data processing are functional in nature and can take advantage of recent developments in the area of functional programming and calculi. The work described in this thesis is an attempt to contribute to this line of thought, in particular exploiting the Haskell functional language as a support tool. Haskell is used mainly to animate an abstract model of the relational database calculus as defined by Maier, written in the style of model-oriented formal specification. The model uses a monad for capturing errors. This monad embodies the strategy for combining computations that can raise exceptions, by passing bound functions from the point where an exception is found to the point where it is handled. A collection of functions is presented that captures some of the functionality currently provided by multidimensional database products; in particular, functions for classifying and reducing relations (tables) which, suitably combined, permit carrying out the multidimensional analysis of a relational database. Parametricity and genericity (polytypism) make room for further extensions of the model. Generic versions of standard relational (type-constructor parametric) and multidimensional analysis operations are expressed in Generic Haskell. A theory of data normalization which is more general than the standard relational database theory (of which this appears to be a particular case) is suggested using higher-order polymorphism and constructor classes in Haskell 98. Besides animation, the functional model is further subjected to formal reasoning and calculation, paving the way to the eventual polytypic (generic) formulation of the standard relational calculus.
... Several industrial standards already exist [13,14,15,16], yet, apart from the last one, none of them seems to propose a well-founded model for OLAP databases. In academia, several proposals on the modelling of cubes also exist [1,2,9,10,11,21]. Despite all these efforts, we feel that several key characteristics of a cube model have not been stressed, by either academia or industry (see [19] for a complete discussion). ...
Article
Please see https://www.researchgate.net/publication/220920310_Modelling_and_Optimisation_Issues_for_Multidimensional_Databases
... Using the described search criteria within the selected journals and highly cited papers in Scopus for the period of 1/2000-8/2015, 738 articles were collected. Papers whose concepts of BI did not match with the proposed definition such as multidimensional cube algebra [18], or large scale multidimensional data [19], were then excluded along with papers which despite having keywords appearing in the abstracts or subject heading did not investigate BI. This resulted in 184 articles which were then filtered for relevance by analysing the abstracts and skimming the content. ...
Article
Full-text available
Much of the research on Business Intelligence (BI) has examined the ability of BI systems to help organizations address challenges and opportunities. However, the literature is fragmented and lacks an overarching framework to integrate findings and systematically guide research. Moreover, researchers and practitioners continue to question the value of BI systems. This study reviews and synthesizes empirical Information System (IS) studies to learn what we know, how well we know, and what we need to know about the processes of organizations obtaining business value from BI systems. The study aims to identify which parts of the BI business value process have been studied and are still most in need of research, and to propose specific research questions for the future. The findings show that organizations appear to obtain value from BI systems according to the process suggested by Soh and Markus (1995), as a chain of necessary conditions from BI investments to BI assets to BI impacts to organizational performance; however, researchers have not sufficiently studied the probabilistic processes that link the necessary conditions together. Moreover, the research has not sufficiently covered all relevant levels of analysis, nor examined how the levels link up. Overall, the paper identified many opportunities for researchers to provide a more complete picture of how organizations can and do obtain value from BI.
... Operator roll [LiWa96] ...
Chapter
Since the beginning of data warehousing in the early 1990s, an informal consensus has been reached concerning the major terms and components involved in data warehousing. In this chapter, we first explain the main terms and components. Data warehouse vendors are pursuing different strategies in supporting this basic framework. We review a few of the major product families and show in the next chapter a brief survey of the basic problem areas data warehouse practice and research is faced with today. These issues are then treated in more depth in the remainder of this book.
... Kimball pioneered the area of "Dimensional Modeling," which concerns constructing data warehouse schemas amenable to OLAP-based analysis [89]. Data cubes have been implemented in a variety of different systems, so effort has been made to discover unified conceptual or mathematical models that can characterize many implementations [43,146,145,97,4,63,18]. ...
Thesis
Full-text available
The field of data visualization is lacking open tools that support easily developing and using production quality interactive visualizations. Particularly, there is a need for reusable solutions for (1) well known visualization and interaction techniques (2) authoring and sharing visualizations with multiple linked views, and (3) describing existing data such that many data sets can be easily integrated and visualized. This dissertation introduces novel data structures and algorithms for interactive visualizations of data from many sources, addressing these three issues.
... Work on data warehouses concentrates essentially on multidimensional aspects [Agrawal, et al 1995] [Li, Wang 1996] [Gyssen, Lakshmanan 1997] [Lehner, et al 1998] [Pedersen, Jensen 1999] and on problems concerning the management, maintenance and configuration of materialised views [Gupta, Mumick 1995]. Despite the major role of temporal data in decision-support systems, few works address the management of time in the warehouse. Several works on warehouses have concentrated on certain aspects related to temporal data [Yang, Widom 1998]. ...
Article
Full-text available
This thesis addresses conceptual modelling and data manipulation (through algebras) in decision-support systems. Our thesis rests on the dichotomy of two storage spaces: the data warehouse gathers the extracts of the source databases useful to decision-makers, while the data marts are derived from the warehouse and dedicated to a particular analysis need. At the warehouse level, we define a data model for describing the temporal evolution of complex objects. In our proposal, the warehouse object integrates current, past and archived states modelling the decisional data and their evolution. The extension of the object concept induces an extension of the class concept. This extension is composed of filters (temporal and archive) for building the past and archived states, as well as a construction function modelling the extraction process (source origin). We also introduce the concept of environment, which defines coherent temporal parts whose sizes are adapted to the decision-makers' requirements. Data manipulation is an extension of object algebras that takes into account the characteristics of the warehouse representation model. The extension lies in the temporal operators and the operators for manipulating sets of states. At the mart level, we define a multidimensional data model for representing information as a constellation of facts and of dimensions equipped with multiple hierarchies. Data manipulation relies on an algebra encompassing the full set of multidimensional operations and offering operations specific to our model. We propose a method for building the marts from the warehouse.
To validate our proposals, we present the GEDOOH software (Générateur d'Entrepôts de Données Orientées Objet et Historisées), which supports the design and creation of warehouses in the context of the REANIMATIC medical application.
... In order to capitalize on the SQL language in the multidimensional domain, an extended relational model for multidimensional purposes is required. Several such models have been proposed, such as Wang et al. (1996), Vassiliadis et al. (1998) and Lujan-Mora et al. (2006). These models deal mainly with the mathematical foundation of the relational model, but Pedersen et al. (2003) proposed a fairly formalized analytical system that aims straight at the SQL language, named SQL-M, the M obviously coming from "multidimensional". ...
Article
Full-text available
Hype or not, Big Data, NoSQL, Analytics, Business Intelligence and Data Science require processing huge amounts of data in varied and complex ways, using a vast array of statistical methods and tools. The market increasingly needs graduates with both database and data warehouse technological skills and statistical competencies, in order to decipher the business patterns and trends hidden in the mountains of data. This paper presents the main coordinates of data processing today and some implications for academic curricula. It argues that data analysis and business intelligence professionals could benefit if trained to acquire a proper level of SQL and data warehouse knowledge.
... Multidimensional models have been around for a long time and a variety of approaches have been proposed for representing multidimensional data (Agrawal et al., 1997;Gyssens & Lakshmanan, 1997;Li & Wang, 1996;Nguyen et al., 2000;Pedersen & Jensen, 2001;Pedersen, 2009). Yet, all of them have one serious drawback: it is not clear how to represent data semantics and how to reason about data in it. ...
Preprint
Full-text available
In spite of its fundamental importance, inference has not been an inherent function of multidimensional models and analytical applications. These models are mainly aimed at numeric (quantitative) analysis, where the notions of inference and semantics are not well defined. In this paper we argue that inference can be, and should be, an integral part of multidimensional data models and analytical applications. It is demonstrated how inference can be defined using only multidimensional terms like axes and coordinates, as opposed to using logic-based approaches. We propose a novel approach to inference in multidimensional space based on the concept-oriented model of data and introduce elementary operations which are then used to define constraint propagation and inference procedures. We describe a query language with an inference operator and demonstrate its usefulness in solving complex analytical tasks.
... In the literature, different methodological proposals can be found for the multidimensional modelling of a DW; none of them has been fully accepted. The proposals can be classified as requirement-driven [5,22,16,26,15], source-driven [10,4,21,13,11], and hybrid [12,9,3,2,27]. ...
Article
Full-text available
In the database field, several methods have been proposed for generating the logical design of a data warehouse (DW); there are also studies that allow the storage and manipulation of imprecise or fuzzy data in relational databases. This paper presents a methodological proposal with a model-driven architecture (MDA) approach, which generates the logical design of a fuzzy data warehouse (FDW). The technique consists of identifying the basic elements of multidimensional (MD) modelling and extending them to handle fuzzy attributes in the measures at the conceptual level, through the application of a sequence of transformations whose purpose is to obtain the fuzzy multidimensional logical design. A key element of this transformation is the extension of the common warehouse metamodel (CWM) OLAP with fuzzy stereotypes, given that no formal method currently exists for performing this kind of transformation.
... Multidimensional query languages. Languages like MDX are used in the context of standard OLAP models (Li & Wang, 1996;Pedersen & Jensen, 2001) for solving analytical tasks. This approach is based on the notions of dimension, measure, facts and cube. ...
... The use of partial order in COM makes it similar to multidimensional models (Pedersen & Jensen, 2001; Li & Wang, 1996; Agrawal, Gupta, & Sarawagi, 1997; Gyssens & Lakshmanan, 1997) widely used in OLAP and data warehousing. The main difference is that COM does not assign special roles to dimensions, cubes and facts at the level of the model, assuming instead that these terms describe specific analysis scenarios rather than the data itself. ...
Chapter
Full-text available
... OLAP calls for sophisticated on-line analysis, something for which the traditional relational model [2] offers little support. Several vendors have already developed OLAP products, but many of these suffer from the following limitations: they do not support a comprehensive "query" language similar to SQL; viewing data in multi-dimensional perspectives involves treating certain attributes as dimensional parameters and the remaining ones as measures, and then analyzing them as a "function" of the parameters; and, finally, unlike for the relational model, there is no precise, commonly agreed conceptual model for OLAP or the so-called multidimensional databases (MDD) (see [5], [1], [6]). ...
Article
Full-text available
This paper describes an approach to On-Line Analytical Processing (OLAP), expressed in the declarative programming paradigm. We define a collection of functions that capture some of the functionality currently provided by multidimensional database products. This is done by defining operations which allow for classifying and reducing relations (tables). Suitably combined, these operations make it possible to carry out the multidimensional analysis of a relational database, and enable the declarative specification and optimization of multidimensional database queries. The library works over an abstract model of the relational database calculus as defined by Maier, written in the style of model-oriented formal specification in the functional language Haskell (details can be found in [8]).
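A rough Python analogue of the two operations named in this abstract, classifying a relation into groups and reducing each group with an aggregation, might look as follows; the paper itself works in Haskell, and the function names here are assumptions.

```python
# Assumed names: 'classify' partitions a relation by one attribute;
# 'reduce_classes' folds a chosen attribute of each class with an operator.
from collections import defaultdict
from functools import reduce

def classify(relation, attr):
    """Partition a list of row dicts into classes keyed by attr's value."""
    classes = defaultdict(list)
    for row in relation:
        classes[row[attr]].append(row)
    return dict(classes)

def reduce_classes(classes, attr, op, init):
    """Fold attr's values within each class using op, starting from init."""
    return {key: reduce(op, (row[attr] for row in rows), init)
            for key, rows in classes.items()}

sales = [{"product": "pen", "qty": 4},
         {"product": "pen", "qty": 6},
         {"product": "ink", "qty": 3}]

totals = reduce_classes(classify(sales, "product"), "qty",
                        lambda a, b: a + b, 0)
# totals == {"pen": 10, "ink": 3}
```

Composing the two functions gives a group-by-and-aggregate, the building block of the multidimensional analysis the abstract describes.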
... Yet, these models tend to concentrate on databases where schemas are fully specified, or where fuzzy relational schemas are supplied by a user or learned from the databases' attributes [5,17,18,14]. Meanwhile, technological advances have sustained a continuing increase in our ability to gather and store information at the entity-specific level [27]. As a result, an entity's data is often scattered across databases maintained at a large number of disparate locations with a vast range of schemas. ...
Article
Full-text available
A long-standing challenge for data management is the ability to correctly relate information corresponding to the same entity distributed across databases. Traditional research into record linkage has concentrated on string comparator metrics for records with common, or relatable, attributes. However, spatially distributed data are often devoid of such crucial information for database schema integration. Rather than directly relate schemas, spatially distributed data can be related through location-based linkage algorithms, which link patterns in location-specific attributes (e.g. visits). In this paper we focus on two fundamental algorithms for location-based linkage and investigate how different distributions of entity visits to locations influence linkage performance. We begin by studying algorithm accuracy for linking real-world data. We then outline a theoretical framework rooted in information theory that allows us to provide insight into observed phenomena. Our framework also provides a useful basis for studying the performance of location-based linkage algorithms: we analyze two opposing cases where location visit patterns arise from uniform and power distributions of entities to locations. We carry out our investigations under both the assumption of complete and of incomplete information. Our findings suggest that low-skew distributions are more easily linked when complete information is known. In contrast, when information is incomplete, high-skew distributions lead to higher linkage rates.
... To facilitate this process, research and systems developed for data warehousing and relational database management have produced sound architectures for the storage, relational modelling, retrieval, and aggregation of mass amounts of entity-specific data. Yet, traditional data management models tend to concentrate on databases where schemas are fully specified, or where fuzzy relational schemas are supplied by a user or learned from the databases' attributes [2,13,24,25]. Given the complexity and distribution of the environments in which data now resides, it is difficult to apply or adapt traditional data integration techniques for entity resolution applications. ...
Article
Full-text available
Entity resolution, the process of determining if two or more references correspond to the same entity, is an emerging area of study in computer science. While entity resolution models leverage artificial intelligence, machine learning, and data mining techniques, relationships between various models remain ill-specified. Despite growth in both research and literature, investigations are scattered across communities with minimal communication. This paper introduces a conceptual framework, called ENRES, for explicit and formal entity resolution model definition. Through ENRES, we illustrate how several models solve related, though distinctly different, variants of entity resolution. In addition, we prove the existence of entity resolution challenges yet to be addressed by past or current research.
... In the relational context, the data cube operator (Gray, Bosworth, Layman, & Pirahesh, 1996) was introduced to expand relational tables by computing the aggregations over all the attribute combinations. Kimball (1996) introduces multi-dimensional models based on dimension tables and fact tables, whereas Li and Wang (1996) represent cubes through dimension relations and functions, which map measures to grouping relations. Barralis, Paraboschi, and Teniente (1997) consider multidimensional databases as a set of tables forming de-normalised star schemata. ...
Article
This chapter deals with constraint-based multi-dimensional modelling. The model we define integrates a constellation of facts and dimensions. Along each dimension, various hierarchies are possibly defined and the model supports multiple instantiations of dimensions. The main contribution is the definition of intra-dimension constraints between hierarchies of a same dimension as well as inter-dimension constraints of various dimensions. To facilitate data querying, we define a multi-dimensional query algebra, which integrates the main multi-dimensional operators such as rotations, drill down, roll up... These operators support the constraint-based multi-dimensional modelling. Finally, we present two implementations of this algebra. First, OLAP-SQL is a textual language integrating multi-dimensional concepts (fact, dimension, hierarchy), but it is based on classical SQL syntax. This language is dedicated to specialists such as multi-dimensional database administrators. Second, a graphical query language is presented. This language consists in a graphical representation of multi-dimensional databases, and users specify directly their queries over this graph. This approach is dedicated to non-computer scientist users.
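The roll-up operator named in this chapter's algebra can be sketched as aggregation along a dimension hierarchy; the hierarchy (month to quarter) and the data below are invented for illustration.

```python
# Invented hierarchy and data: roll the monthly cells up to quarters by
# mapping each month member to its parent level and summing the measure.
from collections import defaultdict

MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}

monthly = {"Jan": 10, "Feb": 5, "Apr": 8}

def roll_up(cells, parent_of):
    """Aggregate cell values to the parent level of the hierarchy."""
    out = defaultdict(int)
    for member, value in cells.items():
        out[parent_of[member]] += value
    return dict(out)

quarterly = roll_up(monthly, MONTH_TO_QUARTER)
# quarterly == {"Q1": 15, "Q2": 8}
```

Drill-down is the inverse navigation, back from the coarser level to the finer cells; it requires keeping (or recomputing) the finer-grained data.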
... In the context of data warehousing, the literature proposed several approaches to multidimensional modeling. Some of them have no graphical support and are aimed at establishing a formal foundation for representing cubes and hierarchies as well as an algebra for querying them (Agrawal, Gupta, & Sarawagi, 1995;Cabibbo & Torlone, 1998;Datta & Thomas, 1997;Franconi & Kamble, 2004a;Gyssens & Lakshmanan, 1997;Li & Wang, 1996;Pedersen & Jensen, 1999;Vassiliadis, 1998); since we believe that a distinguishing feature of conceptual models is that of providing a graphical support to be easily understood by both designers and users when discussing and validating requirements, we will not discuss them. ...
Article
Full-text available
In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence from implementation issues. This chapter focuses on a conceptual model called the DFM that suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.
... The proposal of the cube operator [7] is one of the early, significant contributions, followed by much work on finding efficient data cube algorithms [2,9]. Relatively little work has gone into modelling, with early proposals based on multidimensional tables, called cubes, having parameters and measures [1,11]. However, these works do not seem to provide a clear separation between schema and data. ...
Conference Paper
We present a functional model for the analysis of large volumes of detailed transactional data, accumulated over time. In our model, the data schema is an acyclic graph with a single root, and data analysis queries are formulated using paths starting at the root. The root models the objects of an application and the remaining nodes model attributes of the objects. Our objective is to use this model as a simple interface for the analyst to formulate queries, and then map the queries to a commercially available system for the actual evaluation.
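The path-based querying idea in this abstract can be sketched minimally: objects hang off a single root, and a query follows a path of attribute names from the root. All attribute names and data below are invented.

```python
# Invented schema: each object is reached from the root, and a query is a
# list of attribute names followed step by step from that object.
from collections import defaultdict

orders = [
    {"customer": {"city": "Oslo"},   "amount": 10},
    {"customer": {"city": "Bergen"}, "amount": 5},
]

def follow(obj, path):
    """Evaluate one root-to-attribute path on a single object."""
    for step in path:
        obj = obj[step]
    return obj

def aggregate(objects, group_path, measure_path):
    """Sum the measure path grouped by the value of the group path."""
    out = defaultdict(int)
    for o in objects:
        out[follow(o, group_path)] += follow(o, measure_path)
    return dict(out)

by_city = aggregate(orders, ["customer", "city"], ["amount"])
# by_city == {"Oslo": 10, "Bergen": 5}
```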
... Apart from these studies it is important to note various propositions [4,7,9,18] for cubic models where the primary objective is the definition of an algebra for multidimensional analysis. Expressiveness of the algebra is the main topic of these works. ...
Article
Design and implementation of data warehouses remain delicate tasks led by experts. Nevertheless, it would be interesting to allow users to define and build their system themselves through a simple and flexible process. In particular, in the field of production systems, there is a need to integrate data from various sources and to analyse them in order to extract knowledge for optimising these systems. Traditionally, the design of a data warehouse is based on an adequate representation of facts on the one hand and dimensions of analysis on the other. We show in this article that a unified representation can be envisaged and that the problem comes down to choosing a hierarchy of criteria adapted to the necessities of analysis. Our proposition relies on a graphical representation which offers visual help to the user.
... Current OLAP data models and query languages fall into one of three categories. First, the simple SQL-like models [Agrawal et al., 1997, Gray et al., 1997, Gyssens et al., 1997, Jagadish et al., 1999, Kimball, 1996, Li et al., 1996] are close to the relational/SQL data model and query language, but do not support advanced features such as automatic aggregation, irregular hierarchies, and correct aggregation. Second, the simple cube models [Cabibbo et al., 1997, Lehner, 1998, Rafanelli et al., 1990, Thomsen, 1999, Vassiliadis, 1998] are "pure" multidimensional models, meaning that their data model is not relational-like and they cannot be queried using SQL. ...
Conference Paper
Full-text available
In this paper we present the SQL OLAP data model, formal algebra, and query language that, unlike current OLAP data models and languages, are both powerful, meaning that they support irregular dimension hierarchies, automatic aggregation of data, and correct aggregation of data, and SQL-compatible, allowing seamless integration with relational technology. We also consider the requirements to the data model posed by integration of OLAP data with external XML data. The concepts are illustrated with a real-world case study from the Business-to-Business electronic commerce (B2B) domain.
... Other authors have introduced new algebraic operators and/or syntaxes for multidimensional data analysis [AGS97, LW96]. However, none of these proposals considers the issue of multiple dependent aggregates within a group. ...
Conference Paper
Full-text available
Datacube queries compute simple aggregates at multiple granularities. In this paper we examine the more general and useful problem of computing a complex subquery involving multiple dependent aggregates at multiple granularities. We call such queries “multi-feature cubes.” An example is “Broken down by all combinations of month and customer, find the fraction of the total sales in 1996 of a particular item due to suppliers supplying within 10% of the minimum price (within the group), showing all subtotals across each dimension.” We classify multi-feature cubes based on the extent to which fine granularity results can be used to compute coarse granularity results; this classification includes distributive, algebraic and holistic multi-feature cubes. We provide syntactic sufficient conditions to determine when a multi-feature cube is either distributive or algebraic. This distinction is important because, as we show, existing datacube evaluation algorithms can be used to compute multi-feature cubes that are distributive or algebraic, without any increase in I/O complexity. We evaluate the CPU performance of computing multi-feature cubes using the datacube evaluation algorithm of Ross and Srivastava. Using a variety of synthetic, benchmark and real-world data sets, we demonstrate that the CPU cost of evaluating distributive multi-feature cubes is comparable to that of evaluating simple datacubes. We also show that a variety of holistic multi-feature cubes can be evaluated with a manageable overhead compared to the distributive case.
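The distributive/holistic distinction this abstract builds on can be illustrated simply: a distributive aggregate at coarse granularity can be combined from fine-granularity partial results, whereas a holistic one cannot. The monthly values below are made up:

```python
import statistics

# Fine-granularity groups (e.g. per-month values), hypothetical data.
monthly = {"Jan": [3, 5], "Feb": [2, 8], "Mar": [4]}

# Distributive: the yearly MIN is computable from the monthly MINs
# alone, so coarse results can be derived from fine ones.
monthly_mins = {m: min(v) for m, v in monthly.items()}
yearly_min = min(monthly_mins.values())
print(yearly_min)  # 2

# Holistic: the yearly MEDIAN cannot be derived from monthly medians;
# it needs the full set of fine-granularity values.
yearly_median = statistics.median(v for vs in monthly.values() for v in vs)
print(yearly_median)  # 4
```

This is why the paper's syntactic tests for distributivity matter: they tell an evaluator when the cheap combine-partial-results strategy is safe.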
... The COM belongs to a set of approaches based on using dimension (degree of freedom) as the main construct for data modelling. This direction has been developed in the area of multidimensional databases [1] [11] [14] and online analytical processing (OLAP) [3]. An important assumption underlying the COM is that the whole model is viewed as one global construct with canonical syntax and semantics. ...
Conference Paper
Full-text available
In the paper we describe the problem of grouping and aggregation in the concept-oriented data model. The model is based on ordering its elements within a hierarchical multidimensional space. This order is then used to define all its main properties and mechanisms. In particular, it is assumed that elements positioned higher are interpreted as groups for their lower level elements. Two operations of projection and de-projection are defined for one-dimensional and multidimensional cases. It is demonstrated how these operations can be used for multidimensional analysis.
... Apart from these studies it is important to note various proposals [2,3,5,10] for cube models whose primary objective is the definition of an algebra for multidimensional analysis. Other works must also be mentioned. ...
Conference Paper
Full-text available
Two main problems arise in modelling data warehouse structures. The first consists in establishing an adequate representation of dimensions in order to facilitate and to control the analysis operations. The second relates to the modelling of various types of architecture. Research work dedicated to the first problem has been conducted, and adequate solutions have been proposed. The second problem has not received so much attention. However, there is a need to apprehend complex structures interconnecting dimensions and facts in various ways. In this paper, we propose a model through which dimensions at different levels can be shared between different facts and various relationships between these facts can be described. Using this model, we then define the notion of well-formed warehouse structures.
... - the storage operation instantiates the warehouse with a set of selected documents, extracted from the Web and judged relevant for the organization; - the analysis operation builds data marts (specific views) from the warehouse using a graphical interface. These data marts make it possible to analyze the warehouse's information multidimensionally, that is, to analyze the data along different dimensions [LI96]. We can, for example, analyze books by author, publisher, and the topics they address (a topic can be considered as a set of keywords). ...
Conference Paper
Full-text available
Abstract: The development of the Internet has increased the volume of information available on this network. Nowadays, this information is increasingly used by companies for economic, strategic, scientific, or technical development. This cannot be done without database techniques. This is why the web warehouse, or dataweb, today constitutes a real need for companies in order to take maximum advantage of the web and of the information it contains. The proposed warehouse allows us to store and analyze the information extracted from the web.
... Data warehouse models are called multidimensional models or hypercubes and have been formalized by several authors [1,4,8,19,20,27]. They are designed to represent measurable facts or indicators and the various dimensions that characterize the facts. ...
Article
In fields such as cardiology, the data used for clinical studies is not only alphanumeric but can also include images or signals. Multimedia data warehouses must therefore be studied in order to provide an efficient environment for analyzing this data. The analysis environment must include appropriate processing methods to compute or extract the knowledge embedded in raw data. Traditional multidimensional models have a static structure in which the members of dimensions are computed in a single way. However, multimedia data is often characterized by descriptors that can be obtained through various computation modes. We define these computation modes as "functional versions" of the descriptors. We propose a Functional Multiversion Multidimensional Model that integrates the concept of "version of dimension." This concept defines dimensions whose members are computed according to various functional versions. This new approach integrates the different computation modes of these members into the proposed model, allowing the user to select the best representation of the data. In this paper, a conceptual model is formally defined and a prototype for this study is presented. A multimedia data warehouse in the medical field has been implemented for a therapeutic study on acute myocardial infarction.
Article
Results of OLAP queries for strategic decision making are generated from warehouse data. For frequent queries, processing overhead increases because the same results are regenerated by traversing a huge volume of warehouse data. The authors suggest saving time for frequent queries by storing them in a relational database, referred to as MQDB, along with their results and metadata. Incremental updates for synonymous materialized queries are done using data marts. This article focuses on saving processing time for non-synonymous queries with differing criteria. A criterion is a query condition specified with a 'where' or 'having' clause apart from an equijoin condition. Defined rules determine whether new results can be derived from existing stored results. If the criteria of the fired query are a subset of the criteria in a stored query, results are extracted from the existing results using a MINUS operation. When the criteria are a superset of the stored query's criteria, new results are appended to the existing results using a UNION operation.
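The reuse idea behind MQDB can be sketched with plain set operations over conjunctive criteria; the rows and conditions below are hypothetical, and a real implementation would of course run the MINUS/UNION in SQL:

```python
# Hypothetical stored result for the criterion "amount > 100".
stored = [
    {"id": 1, "amount": 150, "region": "EU"},
    {"id": 2, "amount": 300, "region": "US"},
    {"id": 3, "amount": 120, "region": "EU"},
]

# Narrower query (stored criteria plus "region = 'EU'"): the answer is
# a subset of the stored rows, so it can be carved out of them
# without rescanning the warehouse.
narrower = [r for r in stored if r["region"] == "EU"]
print([r["id"] for r in narrower])  # [1, 3]

# Broader query (the amount condition dropped): reuse the stored rows
# and UNION in only the rows the stored query excluded, fetched anew.
newly_fetched = [{"id": 4, "amount": 80, "region": "EU"}]
broader = stored + newly_fetched
print(len(broader))  # 4
```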
Chapter
Under the lights of the OLAP trend and its close relationship to decision support, the development of a formal multidimensional model is a growing effort. This effort stems from the ambition to establish the multidimensional data model similarly or comparatively to Codd’s relational model (Codd, 1970). A number of multidimensional models have been defined, formal and informal, and through their refinement more are being introduced.
Chapter
Full-text available
A significant progress in decision support has appeared in last years. Data warehousing as a collection of decision support technologies (Chaudhuri and Dayal, 1997) has increasingly become a focus of both the academia and the database industry.
Thesis
The technical means for capturing and permanently storing data are now so mature that large data holdings are available, particularly in companies and other organizations. These data holdings, often referred to as a data warehouse, contain all relevant information about the organizations themselves, the processes running within them, and their interaction with other organizations. In many cases, the targeted analysis of these data holdings is the decisive success factor for an organization. A wide variety of approaches for analyzing the data in a data warehouse are available and proven; two of the most important are Online Analytical Processing (OLAP) and data mining. The two have different emphases and have so far generally been used largely in isolation. This thesis first shows that a comprehensive analysis of the data in a data warehouse can only be achieved through the integrated use of both analysis approaches. Individual questions arising from this need for integration are discussed in detail. Among them is the appropriate modelling of the data in a data warehouse: common modelling approaches are assessed, taking into account in particular the requirements arising from the described integration approach. As a result, a conceptual data model is presented that structures information in a way equally suitable for OLAP and data mining. At the level of logical modelling, the schema types that properly support the integration of the two analysis approaches are identified. Next, the thesis addresses the system architectures, which differ between data mining and OLAP; their comprehensive discussion reveals a number of deficits. This leads to an extended system architecture that removes these weaknesses and properly supports the intended integration. The extended system architecture includes a component for the application-independent optimization of different analysis applications. A third focus of this thesis is the identification of suitable optimization approaches for this component. The approaches are evaluated qualitatively on the one hand; on the other, the optimization potential of the individual approaches is demonstrated on the basis of extensive series of measurements.
Article
In this paper we consider querying more than one multidimensional cube in a unified way. We show that there are multiple ways of combining multiple cubes. We propose an algebra of operations on multiple cubes. We introduce multiple forms of the coalesce operator, together with the union, intersection, and difference operators to combine two cubes. Finally, we define selection and roll-up to operate on a pair of cubes. These operations are compositional, always resulting in a cube, which allows nesting of the operations. We also show that typical OLAP operations can be performed using the algebra.
Article
Full-text available
A multidimensional model is an important conceptual view that can serve as a mediator between system analysts and users as they work together on design requirements, letting them exchange ideas free of technical and theoretical jargon. Using a multidimensional model, data are represented in terms of facts and dimensions, where each fact is associated with multiple dimensions. Accordingly, a graphical visualization of the multidimensional model greatly enhances comprehension of the semantics contained in the model. In this paper, we propose an approach to graphically display the multidimensional model using the depth-first search algorithm. The algorithm takes as input a list of dimensions in the form of association lists, which record the dimension name and the level number in the first and second parts of the list, respectively. A list named open initially contains the first element of the dimension list, and another list called close is initially set to nil. The algorithm keeps track of the current dimension level recorded in the open list and compares it to the previous level of the dimension recorded in the close list. The current level determines the position at which the dimension node is drawn relative to the previous level. If this level is greater than the previous level, the dimension node is drawn at the next level of depth; otherwise the algorithm backtracks to the same level number recorded in the close list. An implementation of the algorithm for a student fact in a university domain shows how the multidimensional model is drawn properly.
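The level-driven placement described above can be approximated in a few lines; the dimension names and levels below are hypothetical stand-ins for the student-fact example, and the open/close bookkeeping is reduced to reading the recorded level directly:

```python
# Dimensions as (name, level) association pairs in depth-first order.
dims = [("student", 1), ("faculty", 2), ("department", 3),
        ("time", 1), ("semester", 2)]

def draw(dims):
    """Indent each dimension node by its recorded level; a level that
    is not greater than the previous one backtracks to that depth."""
    return ["  " * (level - 1) + name for name, level in dims]

print("\n".join(draw(dims)))
```

Printing the lines yields an indented tree in which "time" backtracks to depth 1 after the "department" branch.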
Conference Paper
Full-text available
In the paper the concept-oriented data model (COM) is described from the point of view of its hierarchical and multidimensional properties. The model consists of two levels: syntactic and semantic. At the syntactic level each element is defined as a combination of its superconcepts. At the semantic level each item is defined as a combination of its superitems. Such a definition has several general interpretations, such as a hierarchical coordinate system or a multidimensional categorization schema. The described approach can be applied to very different problems of dimensional modelling, including database systems, knowledge-based systems, ontologies, complex categorizations, knowledge sharing, and the semantic web.
Article
Full-text available
In the context of business intelligence, the use of OLAP (On-Line Analytical Processing) tools constitutes a fundamental step towards better decision making and the creation of new knowledge. This paper discusses the features of a tool that, when used in conjunction with an OLAP tool, helps systematize and capture the reasoning that happens during the decision process, rather than only the final decision.
Article
Full-text available
The model we define organises data in a constellation of facts and dimensions with multiple hierarchies. In order to ensure data consistency and reliable data manipulation, we extend this constellation model with intra- and inter-dimension constraints. The intra-dimension constraints allow the definition of exclusions and inclusions between hierarchies of the same dimension. The inter-dimension constraints relate to hierarchies of different dimensions. We also study the effects of these constraints on multidimensional operations, and we describe the integration of these constraints within the GEDOOH prototype.
Article
Full-text available
The use of OLAP technology in new fields of knowledge and the use of unstructured data sources have given rise to new requirements for the models used to define data cubes. In this work we present a new multidimensional model capable of handling imprecise information both in the facts and in the relationships of the hierarchies defined on the dimensions. To achieve this, we use fuzzy logic and a two-level model so that the complexity remains hidden from the user.
Conference Paper
A data warehouse is frequently organized as a collection of multidimensional data cubes, which represent data in the form of data values, called measures, associated with multiple dimensions and their multiple levels. However, some application areas need a more expressive model for describing their data. This paper presents an extension of the classical multidimensional model that makes the data warehouse more flexible, natural, and simple. The concepts and basic ideas were taken from the classical multidimensional model to propose an approach based on the object-oriented paradigm. In this research, an object-oriented data model is used to describe data warehouse data, and basic operations over this model are provided.
Conference Paper
The paper describes the main issues related to the dissemination of statistical aggregate data on the Web, and DaWinciMD, a Web-enabled multidimensional statistical database. It is shown that statistical databases have several distinguishing features with respect to conventional business data warehouses, mainly related to problems of privacy preserving and significance of the disseminated data, thus requiring specific adaptations of the usual modeling and interaction paradigms. The main features of DaWinciMD are illustrated, with a particular focus on the underlying metadata approach to statistical data modeling and its multidimensional navigation user interface, combining the flexibility of conventional data warehouses with the specific requirements arising from the statistical context.
Article
Conventional data warehouses are passive: all tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called an active data warehouse.
Conference Paper
Full-text available
On-Line Analytical Processing (OLAP) is a trend in database technology based on the multidimensional view of data. Although multidimensional data cubes form the basic logical data model for OLAP applications, there seems to be no agreement on a common model for cubes. In this paper we propose a logical model for cubes based on the key observation that a cube is not a self-existing entity, but rather a view over an underlying data set. The model is powerful enough to capture all the commonly encountered OLAP operations, such as selection, roll-up and drill-down, through a sound and complete algebra. We accompany our model with results on processing cube operations and provide syntactic characterisations for the problem of cube usability (i.e., the problem of using the tuples of one cube to compute another cube). As part of the solution to this problem, we have developed algorithms to check whether (a) the marginal conditions of two cubes are appropriate for a rewriting, in the presence of aggregation hierarchies, and (b) an implication exists between two selection conditions that involve functionally dependent attributes (levels of aggregation in our context). For the latter, we have extended the well-known set of axioms for conjunctive query containment [Ullm89] with axioms describing the role of the functional dependencies. Finally, we present a rewriting algorithm for the cube usability problem.
Conference Paper
Full-text available
Providing access and search across multiple, heterogeneous, distributed and autonomous data warehouses has become one of the main issues in current research. In this paper, we propose to integrate data warehouse schema information by using metadata represented in XTM (XML Topic Maps) to bridge possible semantic heterogeneity. A detailed description of an architecture that enables the efficient processing of user queries involving data from heterogeneous data warehouses is presented. As a result, interoperability is accomplished by a schema integration approach based on XTM. Furthermore, important implementation aspects of the MetaCube-XTM prototype, which makes use of the Meta Data Interchange Specification (MDIS) and the Open Information Model, complete the presentation of our approach.
Article
On-line analytical processing (OLAP) systems considerably improve data analysis and are finding widespread use. OLAP systems typically employ multidimensional data models to structure their data. This paper identifies 11 modeling requirements for multidimensional data models, derived from an assessment of complex data found in real-world applications. A survey of 14 multidimensional data models reveals shortcomings in meeting some of the requirements: existing models do not support many-to-many relationships between facts and dimensions, lack built-in mechanisms for handling change and time, lack support for imprecision, and are generally unable to insert data with varying granularities. This paper defines an extended multidimensional data model and algebraic query language that address all 11 requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to imprecise data, techniques are proposed that take the imprecision into account in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The data model and query evaluation techniques discussed in this paper can be implemented using relational database technology. The approach is also capable of exploiting multidimensional query processing techniques such as pre-aggregation. This yields a practical solution with low computational overhead.
Conference Paper
ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation ...
Conference Paper
Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
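The CUBE generalization described above can be sketched in a few lines of Python: aggregate over every subset of the dimensions, collapsing the excluded ones to ALL. The toy sales rows are assumptions for illustration:

```python
from itertools import combinations

# Toy sales rows: (model, year, color, units) -- hypothetical data.
rows = [
    ("Chevy", 1994, "red", 5),
    ("Chevy", 1994, "blue", 3),
    ("Chevy", 1995, "red", 4),
]
dims = ("model", "year", "color")

def cube(rows, dims):
    """Sum units over every subset of the dimensions; dimensions
    outside the subset are replaced by the special value ALL."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), k):
            for *key, units in rows:
                point = tuple(key[i] if i in subset else "ALL"
                              for i in range(len(dims)))
                result[point] = result.get(point, 0) + units
    return result

c = cube(rows, dims)
print(c[("ALL", "ALL", "ALL")])   # global sum: 12
print(c[("Chevy", 1994, "ALL")])  # sub-total for Chevy in 1994: 8
```

In SQL this corresponds to GROUP BY with the CUBE construct; the dictionary keys here are exactly the paper's aggregation points in N-space.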
Conference Paper
A common approach to improving the performance of statistical query processing is to use precomputed results. Another, lower-level approach is to redesign the storage structure for statistical databases; this avenue is relatively unexplored. The objective of this paper is to present a physical storage structure for statistical databases whose design is motivated by the characteristics of statistical queries. We show that our proposal enhances multi-attribute clustering efficiency and improves the performance of statistical and aggregational queries. This customized structure reduces the amount of I/O incurred in statistical query processing, thus decreasing the response time.
Conference Paper
Large multidimensional arrays are widely used in scientific and engineering database applications. The authors present methods of organizing arrays to make their access on secondary and tertiary memory devices fast and efficient. They have developed four techniques for doing this: (1) storing the array in multidimensional “chunks” to minimize the number of blocks fetched, (2) reordering the chunked array to minimize seek distance between accessed blocks, (3) maintaining redundant copies of the array, each organized for a different chunk size and ordering and (4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. The measurements on real data obtained from global change scientists show that accesses on arrays organized using these techniques are often an order of magnitude faster than on the unoptimized data
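Technique (1) above, chunked storage, amounts to mapping each multidimensional index to a chunk coordinate plus a within-chunk offset; a whole-chunk read then fetches one block instead of scattering I/O across a row-major layout. The shapes below are assumptions:

```python
# Hypothetical array and chunk shapes.
array_shape = (100, 100, 100)
chunk_shape = (10, 10, 10)

def chunk_of(index):
    """Chunk coordinates and within-chunk offset for one element."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, offset

chunk, offset = chunk_of((42, 7, 99))
print(chunk, offset)  # (4, 0, 9) (2, 7, 9)
```

Reordering and replicating chunks, the paper's techniques (2) and (3), then optimize how these chunk coordinates map to block addresses on the device.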
Conference Paper
A data model and an access method for summary data management are proposed. Summary data, represented as a ternary tuple <statistical function, category, summary>, consist of metaknowledge summarized by a statistical function over a category of individual information typically stored in a conventional database. The concept of category (type or class) and the additivity property of statistical functions form the basis for a model that allows the derivation of summary data. Deriving summary data has been found computationally intractable in general, and the proposed summary data model, with a disjointness constraint, solves the problem without loss of information. The proposed access method, called the summary data tree, or SD-tree, which handles an orthogonal category as a hyperrectangle, realizes the proposed summary data model. The structure of the SD-tree provides for efficient operations on the stored summary data, including summary data search, derivation, and insertion.
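The additivity property the model relies on is easy to see in miniature: for an additive statistic such as COUNT or SUM over disjoint categories, the summary of their union is derivable from the stored summaries alone, without touching the base data. The figures below are hypothetical:

```python
# Stored summary tuples, keyed by (statistical function applied to a
# measure, category) -- hypothetical values.
stored = {("sum_sales", "Q1"): 120, ("sum_sales", "Q2"): 95}

# Q1 and Q2 are disjoint, so the half-year summary is just their sum.
derived_half_year = stored[("sum_sales", "Q1")] + stored[("sum_sales", "Q2")]
print(derived_half_year)  # 215
```

The disjointness constraint in the paper is what guarantees such derivations never double-count.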
Article
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We then present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant...
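The lattice-and-greedy idea can be sketched compactly: model each view as its set of grouping attributes, let a view be answerable from any materialized superset view, and repeatedly materialize the view with the greatest total benefit. The view sizes below are made-up numbers:

```python
# Toy lattice: each view is a frozenset of grouping attributes with
# an assumed row count; the top view (raw data) is always present.
sizes = {
    frozenset("abc"): 100,
    frozenset("ab"): 50,
    frozenset("ac"): 75,
    frozenset("bc"): 20,
    frozenset("a"): 30,
    frozenset("b"): 10,
    frozenset("c"): 15,
    frozenset(): 1,
}

def cheapest(v, materialized):
    """Cost of answering v: size of its smallest materialized ancestor."""
    return min(sizes[w] for w in materialized if w >= v)

def greedy(k):
    """Materialize k views (besides the top) maximizing total benefit."""
    materialized = {frozenset("abc")}
    for _ in range(k):
        best, best_benefit = None, 0
        for w in sizes:
            if w in materialized:
                continue
            # Benefit: total query-cost reduction over all views w covers.
            benefit = sum(max(cheapest(v, materialized) - sizes[w], 0)
                          for v in sizes if w >= v)
            if benefit > best_benefit:
                best, best_benefit = w, benefit
        if best is None:
            break
        materialized.add(best)
    return materialized

print(sorted("".join(sorted(v)) for v in greedy(2)))
```

With these numbers the first pick is the small-but-widely-useful view bc, and the second is ab, illustrating why greedy selection favors views that are cheap yet cover many descendants.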