Chapter

Concept-Oriented Query Language for Data Modeling and Analysis

Authors:
To read the full-text of this research, you can request a copy directly from the author.

Abstract

In the paper we describe a novel query language, called the concept-oriented query language (COQL), and demonstrate how it can be used for data modeling and analysis. The query language is based on a novel construct, called concept, and two relations between concepts, inclusion and partial order. Concepts generalize conventional classes and are used for describing domain-specific identities. The inclusion relation generalizes inheritance and is used for describing hierarchical address spaces. Partial order among concepts is used to define two main operations: projection and de-projection. We demonstrate how these constructs are used to solve typical tasks in data modeling and analysis such as logical navigation, multidimensional analysis and inference.

... Both patterns can be used in queries for finding related elements without classical joins: This duality of joins and relationships is quite important for understanding the nature of connectivity. In particular, the connectivity via lesser elements (relationships or dependencies) provides a basis for the mechanism of inference in multidimensional space [27, 31]. It is an important mechanism because it allows for going beyond numeric analysis and doing in multidimensional space what has always been a prerogative of logic-based models. ...
... In this example (Fig. 7), the system builds the propagation path from Suppliers down to SP and then up to Parts. Such paths have clearer semantics and less ambiguity [31, 27]. Why partial order? ...
Preprint
The plethora of existing data models and specific data modeling techniques is not only confusing but leads to complex, eclectic and inefficient designs of systems for data management and analytics. The main goal of this paper is to describe a unified approach to data modeling, called the concept-oriented model (COM), by using functions as a basis for its formalization. COM tries to answer the question what is data and to rethink basic assumptions underlying this and related notions. Its main goal is to unify major existing views on data (generality), using only a few main notions (simplicity) which are very close to how data is used in real life (naturalness).
... COM (2009a) is a general purpose unified data model aimed at describing many existing views and patterns of thoughts currently used in data modeling. As a unified model, its main goal is to significantly decrease differences and incongruities between various approaches to data modeling such as transactional, analytical (Savinov, 2011b), multidimensional (Savinov, 2005a), object-oriented (Savinov, 2011a), conceptual and semantic (Savinov, 2012c). The motivation behind COM and its practical benefits are similar to those for the Business Intelligence Semantic Model (BISM) (Russo, Ferrari, & Webb, 2012) introduced in Microsoft SQL Server 2012. ...
... Syntactically, COM is described by the concept-oriented query language (COQL) (Savinov, 2011b). This language reflects the principles of COM by introducing a novel data modeling construct, called concept (hence the name of the approach), and two relations among concepts, inclusion and partial order. ...
Chapter
... Inverse arrow is opposite to dot notation and we use it [13,14] because dot symbol does not have an inversion. ...
Preprint
We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing as opposed to using exclusively sets in conventional set-oriented models. Functions and function composition are used as primary semantic units for describing data connectivity instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions, providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit, examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.
... The main challenge is to unify programming with data modeling and querying. In particular, the goal is to further develop the concept-oriented query language (Savinov, 2005b; Savinov, 2006; Savinov, 2011b) in the direction of general-purpose programming languages. Identity modeling in COM is analogous to that in COP and relies on concepts and the inclusion hierarchy. ...
Preprint
For the past several decades, programmers have been modeling things in the world with trees using hierarchies of classes and object-oriented programming (OOP) languages. In this paper, we describe a novel approach to programming, called concept-oriented programming (COP), which generalizes classes and inheritance by introducing concepts and inclusion, respectively.
... COP is an integral part of a novel general-purpose data model, called the concept-oriented model (COM) [Sav09b, Sav11b, Sav12b, Sav14b], and the corresponding concept-oriented query language [Sav11a, Sav14a]. In short, COM can be viewed as COP plus a partial order relation among objects. ...
Preprint
The main goal of concept-oriented programming (COP) is describing how objects are represented and accessed. It makes references (object locations) first-class elements of the program responsible for many important functions which are difficult to model via objects. COP rethinks and generalizes such primary notions of object-orientation as class and inheritance by introducing a novel construct, concept, and a new relation, inclusion. An advantage is that using only a few basic notions we are able to describe many general patterns of thought currently belonging to different programming paradigms: modeling object hierarchies (prototype-based programming), precedence of parent methods over child methods (inner methods in Beta), modularizing cross-cutting concerns (aspect-oriented programming), value-orientation (functional programming). Since COP remains backward compatible with object-oriented programming, it can be viewed as a promising direction for developing a simple and natural unified programming model.
... In the concept-oriented query language (COQL) (Savinov, 2014a, 2011b), a set of elements is written in parentheses with constraints separated by a bar symbol. For example, (Books | price < 10) is a set of all cheap books. ...
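As a rough analogy only, the set expression (Books | price < 10) behaves like a filter over a collection: the collection is named before the bar and the constraint follows it. The Python sketch below is not COQL; the `Book` record and the sample data are hypothetical and serve only to illustrate the reading of the expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    # Hypothetical record type; the snippet only mentions a price attribute.
    title: str
    price: float


# Hypothetical sample data, used only for illustration.
books = [
    Book("Data Models", 8.50),
    Book("Query Languages", 24.00),
    Book("Posets", 9.99),
]

# Analog of the COQL set (Books | price < 10):
# the collection before the bar, the constraint after it.
cheap_books = [b for b in books if b.price < 10]
cheap_titles = [b.title for b in cheap_books]
print(cheap_titles)
```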
Preprint
In spite of its fundamental importance, inference has not been an inherent function of multidimensional models and analytical applications. These models are mainly aimed at numeric (quantitative) analysis where the notions of inference and semantics are not well defined. In this paper we argue that inference can be and should be integral part of multidimensional data models and analytical applications. It is demonstrated how inference can be defined using only multidimensional terms like axes and coordinates as opposed to using logic-based approaches. We propose a novel approach to inference in multidimensional space based on the concept-oriented model of data and introduce elementary operations which are then used to define constraint propagation and inference procedures. We describe a query language with inference operator and demonstrate its usefulness in solving complex analytical tasks.
... The concept-oriented query language (COQL) (Savinov, 2014a; 2012a; 2011b) is a syntactic embodiment of COM. ConceptMix uses a modified version of this language, called the concept-oriented expression language (COEL), the purpose of which is similar to that of the Microsoft Data Analysis Expressions (DAX) (Russo, Ferrari & Webb, 2012). ...
Conference Paper
Data integration as well as other data wrangling tasks account for a great deal of the difficulties in data analysis and frequently constitute the most tedious part of the overall analysis process. We describe a new system, ConceptMix, which radically simplifies analytical data integration for a broad range of non-IT users who do not possess deep knowledge in mathematics or statistics. ConceptMix relies on a novel unified data model, called the concept-oriented model (COM), which provides formal background for its functionality.
... A data model can be defined in two major ways: syntactically as a language and mathematically as some formal setting. Earlier, COM has been defined using its concept-oriented query language (COQL) (Savinov, 2006a, 2011a, 2012a, 2014a), which can be viewed as its syntactic embodiment. This language is based on a novel construct, called concept (hence the name of the model), which generalizes conventional classes and is used for modeling data types. ...
Technical Report
Concept-oriented model of data (COM) has been recently defined syntactically by means of the concept-oriented query language (COQL). In this paper we propose a formal embodiment of this model, called nested partially ordered sets (nested posets), and demonstrate how it is connected with its syntactic counterpart. Nested poset is a novel formal construct that can be viewed either as a nested set with partial order relation established on its elements or as a conventional poset where elements can themselves be posets. An element of a nested poset is defined as a couple consisting of one identity tuple and one entity tuple. We formally define main operations on nested posets and demonstrate their usefulness in solving typical data management and analysis tasks such as logic navigation, constraint propagation, inference and multidimensional analysis.
... In this article we describe a novel query language, called the concept-oriented query language (COQL), which addresses the above issues and is aimed at radically simplifying typical data analysis and data modeling tasks. COQL is a syntactic description of the concept-oriented model (COM) (Savinov, 2009, 2011) and it has the following distinguishing features:
• COQL replaces joins as a means of connectivity by a novel arrow notation which can be viewed as a set-oriented analog of dot notation
• COQL replaces group-by operation by a novel operation of de-projection
• COQL introduces a novel mechanism of inference based on the multidimensional structure of data instead of using logical inference
• COQL inherently supports dimensions as a basic construct rather than treating them as something optional that is added for specific kinds of analysis
• COM and COQL support several data modeling and analysis paradigms (relational, multidimensional, entity-relationship, semantic and conceptual, object-oriented) by resolving many incompatibilities and controversies as well as increasing semantic integrity of data models and analysis tasks
• COQL relies on a novel data typing construct, called concept, and two relations: inclusion and partial order. ...
... Recently, a number of papers have been published [26] [27] [28] [32] [33] [34] which describe either preliminary results or specific mechanisms of COM with the focus on query and analysis tasks. This paper focuses mainly on conceptual data modeling, data semantics and type modeling. ...
Article
We present the concept-oriented model (COM) and demonstrate how its three main structural principles — duality, inclusion and partial order — naturally account for various typical data modeling issues. We argue that elements should be modeled as identity-entity couples and describe how a novel data modeling construct, called concept, can be used to model simultaneously two orthogonal branches: identity modeling and entity modeling. We show that it is enough to have one relation, called inclusion, to model value extension, hierarchical address spaces (via reference extension), inheritance and containment. We also demonstrate how partial order relation represented by references can be used for modeling multidimensional schemas, containment and domain-specific relationships.
... The approach to inference described in this paper relies on a novel unified model, called the concept-oriented model (COM) (Savinov, 2011a; Savinov, 2011b; Savinov, 2012). One of the main principles of COM is that an element consists of two tuples: one identity tuple and one entity tuple. ...
Conference Paper
In spite of its fundamental importance, inference has not been an inherent function of multidimensional models and analytical applications. These models are mainly aimed at numeric analysis where the notion of inference is not well defined. In this paper we define inference using only multidimensional terms like axes and coordinates as opposed to using logic-based approaches. We propose an inference procedure which is based on a novel formal setting of nested partially ordered sets with operations of projection and de-projection.
... Another difficulty is that join is a set-oriented operation while references are instance-oriented and this is why references are not so popular in data modeling. As a reference-based solution to the problem of joins, we describe a novel approach to data modeling, called the concept-oriented model (COM) [8,9,10], which generalizes references. In particular, it allows for modeling domain-specific references which replace primary keys. ...
Article
We study properties of the join operation in query languages and describe some of its major drawbacks. We provide strong arguments against using joins as a main construct for retrieving related data elements in general purpose query languages and argue for using references instead. Since conventional references are quite restrictive when applied to data modeling and query languages, we propose to use generalized references as they are defined in the concept-oriented model (COM). These references are used by two new operations, called projection and de-projection, which are denoted by right and left arrows and therefore this access method is referred to as arrow notation. We demonstrate advantages of the arrow notation in comparison to joins and argue that it makes queries simpler, more natural, easier to understand, and the whole query writing process more productive and less error-prone.
... The concept-oriented model (COM) is an emerging general-purpose approach to data modeling. It is aimed at unifying different views on data and solving a wide spectrum of problems in data modeling and analysis [31, 33]. COM overlaps with many existing data modeling methodologies but perhaps most of its features are shared with object data models (ODM) [10, 4, 3]. ...
Article
The concept-oriented data model (COM) is an emerging approach to data modeling which is based on three novel principles: duality, inclusion and order. These three structural principles provide a basis for modeling domain-specific identities, object hierarchies and data semantics. In this paper these core principles of COM are presented from the point of view of object data models (ODM). We describe the main data modeling construct, called concept, as well as two relations in which it participates: inclusion and partial order. Concepts generalize conventional classes by extending them with identity class. Inclusion relation generalizes inheritance by making objects elements of a hierarchy. We discuss what partial order is needed for and how it is used to solve typical data analysis tasks like logical navigation, multidimensional analysis and reasoning about data.
Conference Paper
In the paper the concept-oriented data model (COM) is described from the point of view of its hierarchical and multidimensional properties. The model consists of two levels: syntactic and semantic. At the syntactic level each element is defined as a combination of its superconcepts. At the semantic level each item is defined as a combination of its superitems. Such a definition has several general interpretations such as a hierarchical coordinate system or multidimensional categorization schema. The described approach can be applied to very different problems in dimensional modelling including database systems, knowledge-based systems, ontologies, complex categorizations, knowledge sharing and the semantic web.
Article
The paper describes logical navigation in the concept-oriented data model. This model explicitly and formally separates physical structure and logical structure so that each element of the model is simultaneously a collection and a combination of other elements. The physical structure is used to represent and access elements by means of references. The logical structure is used to reflect the problem domain dependencies. The two-level model considered in the paper consists of a set of concepts and a set of items. Concept structure defines the model syntax while item structure defines its semantics. In the paper it is shown how the properties of the model can be used for logical navigation where we do not need to specify join conditions or other complicated parameters of queries.
Article
Semantic data models have emerged from a requirement for more expressive conceptual data models. Current generation data models lack direct support for relationships, data abstraction, inheritance, constraints, unstructured objects, and the dynamic properties of an application. Although the need for data models with richer semantics is widely recognized, no single approach has won general acceptance. This paper describes the generic properties of semantic data models and presents a representative selection of models that have been proposed since the mid-1970s. In addition to explaining the features of the individual models, guidelines are offered for the comparison of models. The paper concludes with a discussion of future directions in the area of conceptual data modeling.
Conference Paper
In the paper we describe the problem of grouping and aggregation in the concept-oriented data model. The model is based on ordering its elements within a hierarchical multidimensional space. This order is then used to define all its main properties and mechanisms. In particular, it is assumed that elements positioned higher are interpreted as groups for their lower level elements. Two operations of projection and de-projection are defined for one-dimensional and multidimensional cases. It is demonstrated how these operations can be used for multidimensional analysis.
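The grouping idea in this abstract can be sketched in a few lines of Python. This is a minimal one-dimensional illustration, assuming that a lesser element (an order) references a greater element (a customer), which then acts as its group. The `Customer` and `Order` types and both helper functions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str


@dataclass(frozen=True)
class Order:
    customer: Customer  # reference to a greater (group) element
    amount: float


def project(orders):
    """Projection: move up from lesser elements to the greater elements they reference."""
    return {o.customer for o in orders}


def deproject(customers, orders):
    """De-projection: move down from greater elements to the lesser elements referencing them."""
    return [o for o in orders if o.customer in customers]


alice, bob = Customer("Alice"), Customer("Bob")
orders = [Order(alice, 10.0), Order(alice, 5.0), Order(bob, 7.5)]

# Each customer is treated as a group; its members are found by de-projection
# and then aggregated, which plays the role of a group-by.
totals = {
    c.name: sum(o.amount for o in deproject({c}, orders))
    for c in (alice, bob)
}
print(totals)
```

Here de-projection selects the members of each group and the sum is applied afterwards, so no separate grouping key has to be declared.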
Article
The design of inheritance and encapsulation in SELF, an object-oriented language based on prototypes, results from understanding that inheritance allows parents to be shared parts of their children. The programmer resolves ambiguities arising from multiple inheritance by prioritizing an object's parents. Unifying unordered and ordered multiple inheritance supports differential programming of abstractions and methods, combination of unrelated abstractions, unequal combination of abstractions, and mixins. In SELF, a private slot may be accessed if the sending method is a shared part of the receiver, allowing privileged communication between related objects. Thus, classless SELF enjoys the benefits of class-based encapsulation.
1 Introduction
Inheritance is a basic feature of most object-oriented languages. Many of these languages are based on classes and use inheritance to allow a class to obtain methods and instance variables [26]. (Sometimes classes and inheritance are also used to ...
Article
This survey paper discusses the facilities provided by hierarchical data-base management systems. The systems are based on the hierarchical data model which is defined as a special case of the network data model. Different methods used to access hierarchically organized data are outlined. Constructs and examples of programming languages are presented to illustrate the features of hierarchical systems. This is followed by a discussion of techniques for implementing such systems. Finally, a brief comparison is made between the hierarchical, the network, and the relational systems.
Article
The paper describes an approach to query processing in the concept-oriented data model. This approach is based on imposing constraints and specifying the result type. The constraints are then automatically propagated over the model and the result contains all related data items. The simplest constraint propagation strategy consists of two steps: propagating down to the most specific level using de-projection and propagating up to the target concept using projection. A more complex strategy described in the paper may consist of many de-projection/projection steps passing through some intermediate concepts. An advantage of the described query mechanism is that it does not need any join conditions because it uses the structure of the model for propagation. Moreover, this mechanism does not require specifying an access path using dimension names. Thus even rather complex queries can be expressed in simple and natural form because they are expressed by specifying what information is available and what related data we want to get.
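The simplest two-step strategy from this abstract (down by de-projection, then up by projection) can be sketched as follows. The supplier-part schema, the sample data and the helper functions are hypothetical. Unlike the mechanism described in the paper, this sketch spells out the dimension names by hand rather than deriving the path from the structure of the model.

```python
# Hypothetical schema: SP (supplier-part) records reference both a supplier
# and a part, so SP is the lesser concept and Suppliers, Parts are greater.
suppliers = {"s1": {"city": "London"}, "s2": {"city": "Paris"}}
parts = {"p1": {"name": "nut"}, "p2": {"name": "bolt"}, "p3": {"name": "screw"}}
sp = [
    {"supplier": "s1", "part": "p1"},
    {"supplier": "s1", "part": "p2"},
    {"supplier": "s2", "part": "p3"},
]


def deproject(selected_ids, facts, dimension):
    """Propagate down: keep facts whose reference along `dimension` is selected."""
    return [f for f in facts if f[dimension] in selected_ids]


def project(facts, dimension):
    """Propagate up: collect the greater elements referenced along `dimension`."""
    return {f[dimension] for f in facts}


# Constraint imposed on Suppliers.
london_suppliers = {sid for sid, s in suppliers.items() if s["city"] == "London"}

# Step 1: down from Suppliers to SP. Step 2: up from SP to Parts.
related_sp = deproject(london_suppliers, sp, "supplier")
related_parts = project(related_sp, "part")
related_part_names = sorted(parts[pid]["name"] for pid in related_parts)
print(related_part_names)
```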
Article
In the paper we introduce a new programming language construct, called concept, which is defined as a pair of two classes: one reference class and one object class. Instances of the reference class are passed-by-value and are intended to indirectly represent objects. Instances of the object class are passed-by-reference. Each concept has a parent concept specified by means of the concept inclusion relation. This approach where concepts are used instead of classes is referred to as concept-oriented programming (CoP). CoP is intended to generalize object-oriented programming (OOP). Particularly, concepts generalize conventional classes and concept inclusion generalizes class inheritance in OOP. This approach allows the programmer to describe not only objects but also references which are made an integral and completely legal part of the program. Program objects at run-time exist within a virtual hierarchical address space and CoP provides means to effectively design such a space for each concrete problem domain.
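The pairing of a reference class with an object class can be approximated in Python, with the caveat that Python has no true pass-by-value semantics. In this sketch an immutable (frozen) dataclass stands in for a by-value reference, and a child reference embeds its parent reference to form a hierarchical address. All names here (`BankRef`, `AccountRef`, `Account`, `storage`, `resolve`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankRef:
    """Reference class of a hypothetical parent concept Bank (a value)."""
    code: str


@dataclass(frozen=True)
class AccountRef:
    """Reference class of a hypothetical child concept Account.

    The parent reference plus a local number form a hierarchical address.
    """
    bank: BankRef
    number: int


class Account:
    """Object class of the Account concept (shared, mutable state)."""

    def __init__(self, balance=0.0):
        self.balance = balance


# Hypothetical storage mapping references (values) to objects.
storage = {}


def resolve(ref):
    """Follow a reference to the object it represents."""
    return storage[ref]


ref = AccountRef(BankRef("DEUT"), 42)
storage[ref] = Account(balance=100.0)

# An equal reference value built elsewhere resolves to the same object.
same_ref = AccountRef(BankRef("DEUT"), 42)
resolve(same_ref).balance += 50.0
print(resolve(ref).balance)
```

The point of the sketch is that the address format is defined by the program itself rather than by the runtime, which is the role the abstract assigns to the reference class.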
Article
Multidimensional database technology is a key factor in the interactive analysis of large amounts of data for decision making purposes. In contrast to previous technologies, these databases view data as multidimensional cubes that are particularly well suited for data analysis. Multidimensional models categorize data either as facts with associated numerical measures or as textual dimensions that characterize the facts. Queries aggregate measure values over a range of dimension values to provide results such as total sales per month of a given product. Multidimensional database technology is being applied to distributed data and to new types of data that current technology often cannot adequately analyze. For example, classic techniques such as preaggregation cannot ensure fast query response times when data (such as that obtained from sensors or GPS-equipped moving objects) changes continuously. Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop. When coupled with the need for continuous updates, this context poses stringent performance requirements not met by current technology.
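The example query in this abstract, total sales per month of a given product, amounts to summing a measure over one dimension while another is held fixed. A minimal Python sketch follows; the fact rows and the function name are hypothetical, and a real multidimensional engine would typically rely on precomputed aggregates rather than a scan.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, month, sales amount).
facts = [
    ("widget", "2024-01", 120.0),
    ("widget", "2024-01", 80.0),
    ("widget", "2024-02", 50.0),
    ("gadget", "2024-01", 200.0),
]


def total_sales_per_month(rows, product):
    """Sum the sales measure over the month dimension for one product."""
    totals = defaultdict(float)
    for prod, month, amount in rows:
        if prod == product:
            totals[month] += amount
    return dict(totals)


widget_totals = total_sales_per_month(facts, "widget")
print(widget_totals)
```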
Article
Sound naming schemes for objects are crucial in many parts of computer science, such as database modeling, database implementation, distributed and federated databases, and networked and distributed operating systems. Over the past 20 years, physical pointers, keys, surrogates and object identifiers have been used as naming schemes in database systems and elsewhere. However, there are some persistent confusions about the nature, applicability and limits of these schemes. In this paper we give a detailed comparison of three naming schemes, viz. object identifiers, internal identifiers (often called surrogates) and keys. We discuss several ways in which identification schemes can be implemented, and show what the theoretical and practical limits of applicability of identification schemes are, independently from how they are implemented. In particular, we discuss problems with the recognition and authentication of identifiers. If the identified objects are persons, an additional problem is that object identification may conflict with privacy demands; for this case, we indicate a way in which identification can be combined with privacy protection.
Article
We present a multi-dimensional database model, which we believe can serve as a conceptual model for On-Line Analytical Processing (OLAP)-based applications. Apart from providing the functionalities necessary for OLAP-based applications, the main feature of the model we propose is a clear separation between structural aspects and the contents. This separation of concerns allows us to define data manipulation languages in a reasonably simple, transparent way. In particular, we show that the data cube operator can be expressed easily. Concretely, we define an algebra and a calculus and show them to be equivalent. We conclude by comparing our approach to related work. The conceptual multi-dimensional database model developed here is orthogonal to its implementation, which is not a subject of the present paper.
1 Introduction
Currently, there is significant interest in multidimensional database systems for developing business analysis and decision support applications. Cod...
Article
Identity is that property of an object which distinguishes each object from all others. Identity has been investigated almost independently in general-purpose programming languages and database languages. Its importance is growing as these two environments evolve and merge. We describe a continuum between weak and strong support of identity, and argue for the incorporation of the strong notion of identity at the conceptual level in languages for general purpose programming, database systems and their hybrids. We define a data model that can directly describe complex objects, and show that identity can easily be incorporated in it. Finally, we compare different implementation schemes for identity and argue that a surrogate-based implementation scheme is needed to support the strong notion of identity.
Article
A database system is a collection of stored data together with their description (the database) and a hardware/software system for their reliable and secure management, modification and retrieval (the database management system, DBMS). A database is supposed to represent the interesting semantics of an application (the miniworld) as completely and accurately as possible. The data model incorporated into a database system defines a framework of concepts that can be used to express the miniworld semantics. It comprises
• basic data types and constructors for composed data types,
• (generic) operators to insert, manipulate, retrieve and delete instances of the actual data types of a database,
• implicit consistency constraints as well as (eventually) mechanisms for the definition of explicit consistency constraints that further reflect the miniworld semantics as viewed by the database system.
As usual, types have to be defined before instances of them can be created (the collection of defined types, sometimes together with the set of explicit consistency constraints, forms the database schema). Every database thus adheres to the schema defined for it, and both together, the schema and the actual data provided by the users (and stored in instances) capture the miniworld semantics. We can therefore distinguish the following two classes of semantics:
• the semantics of the miniworld itself,
• the semantics of the miniworld as represented within the database.
Let us assume that a database correctly reflects the intended miniworld semantics (careful database design!). Due to the rigid framework of data models, there will still remain a semantic gap between the miniworld and its database representation. In other words, it is usually impossible to represent all interesting semantics within a database. The “remainder” has to be captured by the application programs using the database and/or it is part of the (hopefully meaningful!) interpretation of the result of database queries by the user himself. However, the ultimate goal of database systems is to provide for concepts that allow to keep the semantic gap as small as possible and thus permit to represent most of the salient semantics in the database itself.
Conference Paper
The Data Base Management System is now a well established part of information systems technology, but the many architectures and their plethora of data models are confusing to both the practitioner and researcher. In the past, attempts have been made to compare and contrast some of these systems, but the greatest difficulty arises in seeking a common basis. This paper attempts to show how a generalized data system (GDS), represented by two different models, could form such a basis; it then proposes that data policy definitions can restrict the GDS to a specialized model, such as a relational or DBTG-like model. Finally, it proposes that this concept forms a better basis for data structure design of specific system applications.
Conference Paper
Inheritance and delegation are alternate methods for incremental definition and sharing. It has commonly been believed that delegation provides a more powerful model. This paper demonstrates that there is a “natural” model of inheritance which captures all of the properties of delegation. Independently, certain constraints on the ability of delegation to capture inheritance are demonstrated. Finally, a new framework which fully captures both delegation and inheritance is outlined, and some of the ramifications of this hybrid model are explored.
Article
Most common database management systems represent information in a simple record-based format. Semantic modeling provides richer data structuring capabilities for database applications. In particular, research in this area has articulated a number of constructs that provide mechanisms for representing structurally complex interrelations among data typically arising in commercial applications. In general terms, semantic modeling complements work on knowledge representation (in artificial intelligence) and on the new generation of database models based on the object-oriented paradigm of programming languages. This paper presents an in-depth discussion of semantic data modeling. It reviews the philosophical motivations of semantic models, including the need for high-level modeling abstractions and the reduction of semantic overloading of data type constructors. It then provides a tutorial introduction to the primary components of semantic models, which are the explicit representation of objects, attributes of and relationships among objects, type constructors for building complex types, ISA relationships, and derived schema components. Next, a survey of the prominent semantic models in the literature is presented. Further, since a broad area of research has developed around semantic modeling, a number of related topics based on these models are discussed, including data languages, graphical interfaces, theoretical investigations, and physical implementation strategies.
Article
Data model transparency can be achieved by providing a canonical language format for the definition and seamless manipulation of multiple autonomous information bases. In this paper we assume a canonical data and computational model combining the function and object-oriented paradigms. We investigate the concept of identity as a property of an object and the various ways this property is supported in existing databases, in relation to the object-oriented canonical data model. The canonical data model is the tool for combining and integrating preexisting syntactically homogeneous, but semantically heterogeneous data types into generalized unifying data types. We identify requirements for object identity in federated systems, and discuss problems of object identity and semantical object replication arising from this new abstraction level. We argue that a strong notion of identity at the federated level can only be achieved by weakening strict autonomy requirements of the component information bases. Finally we discuss various solutions to this problem that differ in their requirements with respect to giving up autonomy.
Article
DAPLEX is a database language which incorporates: This paper presents and motivates the DAPLEX language and the underlying data model on which it is based.
Article
Two kinds of abstraction that are fundamentally important in database design and usage are defined. Aggregation is an abstraction which turns a relationship between objects into an aggregate object. Generalization is an abstraction which turns a class of objects into a generic object. It is suggested that all objects (individual, aggregate, generic) should be given uniform treatment in models of the real world. A new data type, called generic, is developed as a primitive for defining such models. Models defined with this primitive are structured as a set of aggregation hierarchies intersecting with a set of generalization hierarchies. Abstract objects occur at the points of intersection. This high level structure provides a discipline for the organization of relational databases. In particular this discipline allows: (i) an important class of views to be integrated and maintained; (ii) stability of data and programs under certain evolutionary changes; (iii) easier understanding of complex models and more natural query formulation; (iv) a more systematic approach to database design; (v) more optimization to be performed at lower implementation levels. The generic type is formalized by a set of invariant properties. These properties should be satisfied by all relations in a database if abstractions are to be preserved. A triggering mechanism for automatically maintaining these invariants during update operations is proposed. A simple mapping of aggregation/generalization hierarchies onto owner-coupled set structures is given.
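The two abstractions defined in this abstract can be illustrated with a minimal Python sketch. It is not the paper's generic data type or its relational formalization; the names `Person`, `Hotel`, `Reservation`, `Vehicle`, `Truck`, `Car` and `generic_idents` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Hotel:
    name: str


# Aggregation: the relationship "a person reserves a hotel on a date"
# is itself treated as one (aggregate) object.
@dataclass(frozen=True)
class Reservation:
    person: Person
    hotel: Hotel
    date: str


# Generalization: the classes Truck and Car are viewed as one generic
# object type, Vehicle, ignoring their individual differences.
@dataclass(frozen=True)
class Vehicle:
    ident: str


@dataclass(frozen=True)
class Truck(Vehicle):
    payload_tons: float


@dataclass(frozen=True)
class Car(Vehicle):
    seats: int


def generic_idents(objects):
    """Return the sorted identifiers of all objects seen as generic Vehicles."""
    return sorted(o.ident for o in objects if isinstance(o, Vehicle))
```

In this sketch a `Reservation` can be handled like any other object, and `generic_idents` treats trucks and cars uniformly, which mirrors the uniform treatment of individual, aggregate and generic objects that the abstract argues for.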
Article
During the last three or four years several investigators have been exploring “semantic models” for formatted databases. The intent is to capture (in a more or less formal way) more of the meaning of the data so that database design can become more systematic and the database system itself can behave more intelligently. Two major thrusts are clear. In this paper we propose extensions to the relational model to support certain atomic and molecular semantics. These extensions represent a synthesis of many ideas from the published work in semantic modeling plus the introduction of new rules for insertion, update, and deletion, as well as new algebraic operators.
Article
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
Article
Thesis (Ph. D.)--University of Waterloo, 1996. Includes bibliographical references.
Article
We demonstrate the power of object identities (oids) as a database query language primitive. We develop an object-based data model, whose structural part generalizes most of the known complex-object data models: cyclicity is allowed in both its schemas and instances. Our main contribution is the operational part of the data model, the query language IQL, which uses oids for three critical purposes: (1) to represent data-structures with sharing and cycles, (2) to manipulate sets, and (3) to express any computable database query. IQL can be type checked, can be evaluated bottom-up, and naturally generalizes most popular rule-based languages. The model can also be extended to incorporate type inheritance, without changes to IQL. Finally, we investigate an analogous value-based data model, whose structural part is founded on regular infinite trees and whose operational part is IQL.
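The first purpose named in this abstract, representing data structures with sharing and cycles through oids, can be sketched in Python as follows. This is not IQL syntax or semantics; the `ObjectStore` class and its methods are hypothetical, and the sketch assumes that an oid is simply an opaque integer handle separate from the value it denotes.

```python
import itertools


class ObjectStore:
    """Hypothetical store mapping opaque oids to values."""

    def __init__(self):
        self._next_oid = itertools.count(1)
        self._values = {}

    def new(self, value=None):
        # Allocate a fresh oid; the value may be assigned later,
        # which is what makes cyclic structures expressible.
        oid = next(self._next_oid)
        self._values[oid] = value
        return oid

    def assign(self, oid, value):
        self._values[oid] = value

    def value(self, oid):
        return self._values[oid]


store = ObjectStore()

# Cycle: two objects refer to each other through their oids.
adam = store.new()
eve = store.new()
store.assign(adam, {"name": "Adam", "spouse": eve})
store.assign(eve, {"name": "Eve", "spouse": adam})

# Sharing: two objects reference the same third object by its oid.
cain = store.new({"name": "Cain", "parents": (adam, eve)})
abel = store.new({"name": "Abel", "parents": (adam, eve)})
```

Because references go through oids rather than nested values, following `spouse` twice from `adam` leads back to `adam`, and two objects with equal values still remain distinct as long as their oids differ.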
Conference Paper
The authors propose a data model and a few algebraic operations that provide semantic foundation to multidimensional databases. The distinguishing feature of the proposed model is the symmetric treatment not only of all dimensions but also measures. The model provides support for multiple hierarchies along each dimension and support for ad hoc aggregates. The proposed operators are composable, reorderable, and closed in application. These operators are also minimal in the sense that none can be expressed in terms of others nor can any one be dropped without sacrificing functionality. They make possible the declarative specification and optimization of multidimensional database queries that are currently specified operationally. The operators have been designed to be translated to SQL and can be implemented either on top of a relational database system or within a special purpose multidimensional database engine. In effect, they provide an algebraic application programming interface (API) that allows the separation of the front end from the back end. Finally, the proposed model provides a framework in which to study multidimensional databases and opens several new research problems.
Article
A traditional philosophical controversy between representing general concepts as abstract sets or classes and representing concepts as concrete prototypes is reflected in a controversy between two mechanisms for sharing behavior between objects in object oriented programming languages. Inheritance splits the object world into classes, which encode behavior shared among a group of instances, which represent individual members of these sets. The class/instance distinction is not needed if the alternative of using prototypes is adopted. A prototype represents the default behavior for a concept, and new objects can re-use part of the knowledge stored in the prototype by saying how the new object differs from the prototype. The prototype approach seems to hold some advantages for representing default knowledge, and incrementally and dynamically modifying concepts. Delegation is the mechanism for implementing this in object oriented languages. After checking its idiosyncratic behavior, an ob...
Article
A database application, called "on-line analytical processing" (or OLAP) and aimed at providing business intelligence through on-line multidimensional data analysis, has become increasingly important due to the existence of huge amounts of on-line data. This paper formalizes a multidimensional data (MDD) model for OLAP, and develops an algebraic query language called grouping algebra. The basic component of the MDD model is a multidimensional cube, consisting of a number of relations (called dimensions) and for each combination of tuples (called a coordinate), one from each dimension, there is an associated data value. Each dimension is viewed as a basic grouping, i.e., each tuple in the dimension corresponds to the group consisting of all the coordinates that contain this tuple. In order to express user queries, relational algebra expressions are then extended to those on basic groupings for obtaining complex groupings, including order-oriented groupings (for expressing, e.g., cumula...