
The Universal-Relation Data Model for Logical Independence


Abstract

The universal-relation model is introduced, and its fundamental ideas are outlined. This model keeps access-path independence by removing the need for logical navigation among relations. One benefit is a simple yet powerful query-language interface. Two universal-relation database-management systems are discussed that provide good examples of the model: System/U, developed at Stanford University, and FIDL, developed at International Computers Ltd. in Britain.
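To make the query-language benefit concrete, here is a minimal Python sketch of the idea, using a hypothetical two-table schema: the user names only the attributes of interest, and the system, not the user, supplies the join path behind a single universal view.

# Minimal sketch of the universal-relation idea: the user names only the
# attributes of interest; the system, not the user, picks the join path.
# The tables and attribute names below are hypothetical illustrations.

employees = [{"emp": "ann", "dept": "toys"}, {"emp": "bob", "dept": "books"}]
managers  = [{"dept": "toys", "mgr": "carl"}, {"dept": "books", "mgr": "dana"}]

def natural_join(r, s):
    """Join two lists of dicts on their shared attribute names."""
    out = []
    for t in r:
        for u in s:
            shared = set(t) & set(u)
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return out

# The "universal relation" over all attributes, assembled by the system.
universal = natural_join(employees, managers)

def query(attrs):
    """Answer a query by projecting the universal relation on attrs:
    the user writes no joins and performs no logical navigation."""
    return [{a: t[a] for a in attrs} for t in universal if all(a in t for a in attrs)]

print(query(["emp", "mgr"]))   # e.g. [{'emp': 'ann', 'mgr': 'carl'}, ...]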
... Moreover, the remark about the kinds of data schemas used by these operational systems, innocuous at first, revealed many specific features that are worth exploiting. Indeed, data schemas that gather all the characteristics or attributes of the real world into a single table are called universal relations (UR) [67,129,195]. A UR has many properties that are exploited in several scientific works, notably in information systems [99,119,128]. ...
... A universal relation is a single table whose schema is obtained as the union of all the attributes of the tables making up the database [127,130,195]. Clouse, in [48], regards the universal relation as the composition of all the attributes necessary for the viability of an information system. ...
... [195] and Leymann [128] list a number of systems that implement universal relations, among them PITS (Pie-In-The-Sky database system) Query language, developed at the State University of New York; AURICAL (A Universal-Relation Implementation via CodAsyL), developed at the University of Illinois; DURST (Datenbank mit Universaler Relationen SchnittsTelle), from the University of Dortmund; System/U, developed at Stanford University; Maximal Object+, An Acyclic Semantic Structure on the Universal Relation Model; the q system of AT&T Bell Laboratories; and FIDL (Flexible Interrogation and Declaration Language), from International Computers Ltd. in Great Britain. ...
Thesis
Full-text available
Data produced by various business activities are growing rapidly in volume. These data come from multiple sources and format types and are produced ever more frequently, which is why they are called big data. Decision support systems (DSS) improve on transactional ones in terms of integrating heterogeneous data, system interoperability, structuring and data mining. This research work models DSS from an Entity-Relationship (ER) schema. The ER data schema, provided as input, is transformed into a single entity, following the universal relation assumption. Using this entity, algorithms and guidelines determine the multidimensional elements (dimensions, hierarchies, facts and measures). The data schema is generated using a multidimensional and spatio-temporal design pattern and model transformations. Model-driven engineering is used to develop the DSS. The developed system is fed from operational systems and the data that populate them. The main contributions are a supply-driven approach to designing DSS and a model-driven architecture to develop and feed the system with data. The proposed approaches are intended for use in areas where spatial data are managed. Urban data and systems are used to apply the results.
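As a rough illustration of the step from a single universal entity to multidimensional elements, the following Python sketch applies a toy heuristic (not the thesis's actual algorithm): numeric attributes are proposed as measures and the remaining ones as dimensions. The attribute names and the fact name are hypothetical.

# Toy illustration (not the thesis's actual algorithm) of deriving
# multidimensional elements from a single "universal" entity: numeric
# attributes are proposed as measures, the rest as dimension levels.

universal_entity = {
    "sale_id": int, "amount": float, "quantity": int,
    "store_name": str, "city": str, "sale_date": str,
}

def propose_multidimensional_elements(schema):
    measures = [a for a, t in schema.items() if t in (int, float) and not a.endswith("_id")]
    dimensions = [a for a in schema if a not in measures and not a.endswith("_id")]
    return {"fact": "sales", "measures": measures, "dimensions": dimensions}  # fact name hard-coded for illustration

print(propose_multidimensional_elements(universal_entity))
# {'fact': 'sales', 'measures': ['amount', 'quantity'],
#  'dimensions': ['store_name', 'city', 'sale_date']}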
... are merged by placing their tuples into a single universal table D possibly with missing values (under certain assumptions discussed in Vardi (1988)); then all functional dependencies are applied on D through the well known chase algorithm (Fagin et al., 1982;Ullman, 1988). If the algorithm terminates successfully (i.e., no inconsistency is detected) then the database is consistent; otherwise the algorithm stops when a first inconsistency is detected and the database is declared inconsistent. ...
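The chase idea referred to in this excerpt can be sketched in a few lines of Python; this is a simplified illustration over a single table with nulls, not the cited algorithm itself.

# A simplified chase over a single table with nulls (None) and functional
# dependencies, in the spirit of the excerpt above; this is a sketch, not
# the cited algorithm. An FD is a pair (lhs_attributes, rhs_attribute).

def chase(rows, fds):
    """Return (consistent, rows); stops at the first violation found."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for t in rows:
                for u in rows:
                    if t is u:
                        continue
                    if all(t[a] is not None and t[a] == u[a] for a in lhs):
                        x, y = t[rhs], u[rhs]
                        if x is not None and y is not None and x != y:
                            return False, rows          # inconsistency detected
                        if x is not None and y is None:
                            u[rhs] = x                  # fill a null from the FD
                            changed = True
    return True, rows

rows = [
    {"emp": "ann", "dept": "toys", "mgr": None},
    {"emp": "bob", "dept": "toys", "mgr": "carl"},
]
print(chase(rows, [(("dept",), "mgr")]))
# (True, [...]) -- both tuples end up with mgr = 'carl'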
Article
Full-text available
In this paper we address the problem of handling inconsistencies in tables with missing values (also called nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. We propose a formal framework in which each tuple of a table is associated with a truth value among the following: true, false, inconsistent or unknown; and we show that our framework can be used to study important problems such as consistent query answering or data quality measures - to mention just two. In this paper, however, we focus mainly on consistent query answering, a problem that has received considerable attention during the last decades. The main contributions of the paper are the following: (a) we introduce a new approach to handle inconsistencies in a table with nulls and functional dependencies, (b) we give algorithms for computing all true, inconsistent and false tuples, and (c) we give a novel solution to the consistent query answering problem and compare our solution to that of table repairs.
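A toy Python illustration of the four truth values is given below; it is not the paper's algorithm (in particular it omits the inference of false tuples) and uses a hypothetical two-tuple table.

# A toy classification in the spirit of the four truth values above; this
# is an illustration only, not the paper's algorithm (in particular it
# omits the inference of false tuples). An FD is (lhs_attributes, rhs).

def violates(t, u, fd):
    lhs, rhs = fd
    return (all(t[a] == u[a] for a in lhs)
            and t[rhs] is not None and u[rhs] is not None and t[rhs] != u[rhs])

def classify(rows, fds, candidate):
    if candidate not in rows:
        return "unknown"
    conflict = any(violates(candidate, u, fd)
                   for u in rows for fd in fds if u is not candidate)
    return "inconsistent" if conflict else "true"

rows = [{"emp": "ann", "dept": "toys"}, {"emp": "ann", "dept": "books"}]
fds = [(("emp",), "dept")]
print(classify(rows, fds, rows[0]))                         # 'inconsistent'
print(classify(rows, fds, {"emp": "eve", "dept": "hats"}))  # 'unknown'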
... As a last example, in a relational database, although each table satisfies its functional dependencies, the database as a whole may be inconsistent. To determine whether the database is consistent one merges all tables into a single table D (under certain assumptions discussed in [22]) and applies all functional dependencies on D through the well known chase algorithm [11,21]. If the algorithm terminates successfully (i.e., no inconsistency is detected) then the database is consistent; otherwise the algorithm stops and the database is inconsistent. ...
Preprint
Full-text available
In this paper we address the problem of handling inconsistencies in tables with missing values (or nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. The usual approach to alleviate the impact of inconsistent data on the answers to a query is to introduce the notion of repair: a repair is a minimally different consistent instance and an answer is consistent if it is present in every repair. Our approach is fundamentally different: we use set-theoretic semantics for tuples and functional dependencies that allow us to associate each tuple with a truth value among the following: true, false, inconsistent or unknown. The users of the table can then query the set of true tuples as usual. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. The main contributions of the paper are the following: (a) we introduce a new approach to handle inconsistencies in a table with nulls and functional dependencies, (b) we give algorithms for computing all true, inconsistent and false tuples, (c) we discuss how our approach relates to Belnap's four valued logic, (d) we describe how our approach can be applied to the consolidation of two or more tables and (e) we discuss the relationship between our approach and that of table repairs.
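Since the preprint relates its truth values to Belnap's four-valued logic, the following Python sketch shows the standard Belnap connectives, with each value recorded as a pair of evidence bits (evidence for truth, evidence for falsity).

# Belnap's four-valued logic: a value records whether there is evidence
# for truth and for falsity, giving True, False, Both (inconsistent) and
# Neither (unknown).

VALUES = {"true": (1, 0), "false": (0, 1), "both": (1, 1), "neither": (0, 0)}
NAMES = {v: k for k, v in VALUES.items()}

def neg(a):
    t, f = VALUES[a]
    return NAMES[(f, t)]

def conj(a, b):
    (t1, f1), (t2, f2) = VALUES[a], VALUES[b]
    return NAMES[(t1 & t2, f1 | f2)]

def disj(a, b):
    (t1, f1), (t2, f2) = VALUES[a], VALUES[b]
    return NAMES[(t1 | t2, f1 & f2)]

print(conj("true", "both"))      # 'both'
print(disj("both", "neither"))   # 'true'  (in Belnap's truth ordering, Both v Neither = True)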
... Two examples follow, Disentanglement and Vector Disentanglement. A recent line of AI research argues that "representations that are disentangled are an important step towards better representation learning, where disentanglement means that they should contain all the information present in x in a compact and interpretable structure while being independent from the task at hand" [48]. In software engineering there is a long tradition of disentanglement, which can be called either decoupling [49] or logical independence [50]. Both are well established and considered essential good practices in system design. While affording flexibility of data manipulation in experiments for reasoning and computation [51], in logically functional viable systems, disentangled representations can lead to functional incoherence and result in various types of malfunction or dysfunction. ...
... It begins by verifying whether the schema supplied as input is in universal relation form. If not, it can be transformed so as to obtain one [11,14,15,23]. After this stage, we split the universal relation into non-empty and disjoint subsets. ...
Article
Full-text available
Various information systems have been developed for decision support, but they rely essentially on transactional methods. From data and transactional databases, we propose a supply-driven approach to design data warehouses. The approach takes a universal relation as input and applies vertical partitioning with a greedy heuristic algorithm. The partitions obtained are transformed into dimensions using a matching algorithm. The other elements of the multidimensional annotation are deduced from guidelines, and the data warehouse schema is generated using a multidimensional conceptual pattern. The transformation of those transactional systems into decision support ones aims at facilitating the storage, exploitation and representation of data using new-generation database technologies.
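The greedy vertical-partitioning step can be pictured with a toy Python sketch (not the paper's algorithm): attributes that frequently co-occur in a hypothetical query workload are greedily grouped into the same partition.

# A toy greedy vertical-partitioning sketch in the spirit described above
# (not the paper's algorithm): attributes that frequently co-occur in the
# query workload are greedily grouped into the same partition.

from itertools import combinations

def affinity(workload):
    """Count how often each pair of attributes appears in the same query."""
    aff = {}
    for q in workload:
        for a, b in combinations(sorted(q), 2):
            aff[(a, b)] = aff.get((a, b), 0) + 1
    return aff

def greedy_partition(attributes, workload, threshold=1):
    aff = affinity(workload)
    partitions = []
    for attr in attributes:
        # put the attribute in the first partition it has enough affinity with
        for part in partitions:
            score = sum(aff.get(tuple(sorted((attr, b))), 0) for b in part)
            if score >= threshold:
                part.append(attr)
                break
        else:
            partitions.append([attr])
    return partitions

attrs = ["store", "city", "amount", "quantity", "date"]
workload = [{"store", "city"}, {"store", "city"},
            {"amount", "quantity", "date"}, {"amount", "date"}]
print(greedy_partition(attrs, workload))
# [['store', 'city'], ['amount', 'quantity', 'date']]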
... Database researchers have proposed methods and systems to help users specify their queries more precisely and database query interfaces understand users' intents more accurately [26,17,8,28,18,15,5]. In particular, the database community has deeply investigated some problems that appear in the context of database usability [37,14,20,13,2,6,29]. Current models mainly focus on improving user satisfaction for a single information need. ...
Article
Full-text available
As most database users cannot precisely express their information needs, it is challenging for database querying and exploration interfaces to understand them. We propose a novel formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game, where each mutual language is an equilibrium for the game. A query interface is more effective if it establishes a less ambiguous mutual language faster. We discuss some equilibria, strategies, and the convergence in this game. In particular, we propose a reinforcement learning mechanism and analyze it within our framework. We prove that this adaptation mechanism for the query interface improves the effectiveness of answering queries stochastically speaking, and converges almost surely. Most importantly, we show that the proposed learning rule is robust to the choice of the metric/reward by the database.
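A generic reinforcement rule of the kind analyzed in such signaling games can be sketched in Python as follows; the update below is a simple proportional-reinforcement scheme and is not claimed to be the paper's exact mechanism, and the query strings and intents are hypothetical.

# A generic reinforcement-learning sketch of the interface side of a
# signaling game: for each user signal (query string), the interface keeps
# weights over candidate interpretations, answers proportionally to the
# weights, and reinforces the chosen interpretation with the user's reward.

import random
from collections import defaultdict

class QueryInterface:
    def __init__(self, interpretations):
        self.interpretations = interpretations
        self.weights = defaultdict(lambda: {i: 1.0 for i in interpretations})

    def interpret(self, signal):
        w = self.weights[signal]
        total = sum(w.values())
        r, acc = random.uniform(0, total), 0.0
        for intent, weight in w.items():
            acc += weight
            if r <= acc:
                return intent
        return intent

    def reinforce(self, signal, intent, reward):
        self.weights[signal][intent] += reward   # reward >= 0

ui = QueryInterface(["papers_by_author", "papers_citing_author"])
for _ in range(200):
    chosen = ui.interpret("papers smith")
    # hypothetical user feedback: the user means "papers_by_author"
    ui.reinforce("papers smith", chosen, 1.0 if chosen == "papers_by_author" else 0.0)
print(ui.weights["papers smith"])   # weight mass shifts toward the intended meaning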
Article
We present a new query formulation interface called FFQI (Fast Formulation Query Interface), which is based on a semantic graph model. The query formulator allows users with limited IT skills to query and explore the data source easily and efficiently. User inputs are formulated by a graph search algorithm using a probabilistic popularity measure, and query ambiguity is resolved through a ranking technique. We formulate SELECT-PROJECT-JOIN queries using aggregate functions. In addition, we also implemented a formulation technique for image databases. Thus this interface allows users to interact with relational graph-type databases in an effective and easier way.
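The popularity-based ranking over a schema graph can be pictured with the toy Python sketch below; it illustrates the general idea, not FFQI itself, and the schema graph and popularity scores are hypothetical.

# A toy sketch of the ranking idea described above (not FFQI itself): the
# schema is a graph whose edges carry a popularity score, candidate join
# paths between two tables are enumerated, and the most popular path wins.

EDGES = {   # hypothetical schema graph: edge -> popularity (e.g. past usage)
    ("student", "enrollment"): 0.9, ("enrollment", "course"): 0.8,
    ("student", "advisor"): 0.4, ("advisor", "course"): 0.3,
}
GRAPH = {}
for (a, b), p in EDGES.items():
    GRAPH.setdefault(a, []).append((b, p))
    GRAPH.setdefault(b, []).append((a, p))

def paths(src, dst, seen=()):
    if src == dst:
        yield [src], 1.0
        return
    for nxt, pop in GRAPH.get(src, []):
        if nxt not in seen:
            for path, score in paths(nxt, dst, seen + (src,)):
                yield [src] + path, score * pop

for path, score in sorted(paths("student", "course"), key=lambda x: -x[1]):
    print(path, round(score, 2))
# ['student', 'enrollment', 'course'] 0.72   <- chosen formulation
# ['student', 'advisor', 'course'] 0.12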
Article
Many industries, such as telecom, health care, retail, pharmaceutical, financial services, etc., generate large amounts of data. Such large amounts of data need to be processed quickly to gain critical business insights. The data warehouses and solutions built around them are unable to provide reasonable response times in handling expanding data volumes. One can either perform analytics on big volumes once in days, or perform transactions on small amounts of data in seconds. With the new requirements, one needs to ensure real-time or near real-time responses for huge amounts of data. In this chapter we cover various important aspects of analyzing big data. We start with the challenges one needs to overcome for moving data and data-management applications over the cloud. For big data we describe two kinds of systems: (1) NoSQL systems for interactive data-serving environments; and (2) systems for large-scale analytics based on the MapReduce paradigm, such as Hadoop. The NoSQL systems are designed to have a simpler key-value-based data model with in-built sharding, hence they work seamlessly in a distributed cloud-based environment. In contrast, one can use Hadoop-based systems to run long-running decision-support and analytical queries consuming and possibly producing bulk data. We illustrate various middleware and applications which can use these technologies to quickly process massive amounts of data.
Conference Paper
Many industries, such as telecom, health care, retail, pharmaceutical, financial services, etc., generate large amounts of data. Gaining critical business insights by querying and analyzing such massive amounts of data is becoming the need of the hour. The warehouses and solutions built around them are unable to provide reasonable response times in handling expanding data volumes. One can either perform analytics on big volumes once in days, or perform transactions on small amounts of data in seconds. With the new requirements, one needs to ensure real-time or near real-time responses for huge amounts of data. In this paper we outline challenges in analyzing big data for both data at rest and data in motion. For big data at rest we describe two kinds of systems: (1) NoSQL systems for interactive data-serving environments; and (2) systems for large-scale analytics based on the MapReduce paradigm, such as Hadoop. The NoSQL systems are designed to have a simpler key-value-based data model with in-built sharding, hence they work seamlessly in a distributed cloud-based environment. In contrast, one can use Hadoop-based systems to run long-running decision-support and analytical queries consuming and possibly producing bulk data. For processing data in motion, we present use cases and illustrative algorithms of data stream management systems (DSMS). We also illustrate applications which can use these two kinds of systems to quickly process massive amounts of data.
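The in-built sharding that lets key-value stores work in a distributed cloud environment can be sketched minimally in Python; the hash-modulo placement and node names below are a simplification chosen for illustration.

# A minimal sketch of key-value sharding: each key is hashed to one of N
# shards, so data and requests spread over the nodes of a distributed
# deployment without the application routing them by hand.

import hashlib

NODES = ["node-0", "node-1", "node-2"]   # hypothetical node names

def shard_for(key, nodes=NODES):
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

store = {n: {} for n in NODES}

def put(key, value):
    store[shard_for(key)][key] = value

def get(key):
    return store[shard_for(key)].get(key)

put("customer:42", {"name": "Ann"})
print(shard_for("customer:42"), get("customer:42"))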
Article
The universal relation model is an attempt to achieve a very simple user interface by allowing the formulation of queries with respect to a single, imaginary relation (i.e., the universal relation) defined over all the attributes appearing in the various relations of a database, while producing answers from the actual relations of the database. Previous publications covering the universal relation model have been too technical to be easily understood, and have not offered a solution to the ambiguity problem for which the universal relation model has drawn much criticism. This paper provides an accessible introduction to the universal relation model, which helps to clear up common misunderstandings of the model. It also provides a confirmation approach to solving the ambiguity problem, facilitating successful application of the model.
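The confirmation idea can be pictured with a short Python sketch: when the query attributes connect through more than one relationship, the interface lists the candidate interpretations and asks the user to confirm one. The bank/account/loan interpretations below are a classic illustration, not the paper's code.

# A toy sketch of confirmation-based disambiguation: if a set of query
# attributes has several candidate interpretations (join paths), list them
# and let the user confirm the intended one.

def confirm(query_attrs, interpretations, choose=None):
    """interpretations: mapping from a readable description to a join path."""
    if len(interpretations) == 1:
        return next(iter(interpretations.values()))
    options = list(interpretations)
    prompt = "\n".join(f"{i}: {d}" for i, d in enumerate(options))
    picked = choose(prompt) if choose else int(input(prompt + "\nWhich one? "))
    return interpretations[options[picked]]

candidates = {
    "bank of a customer's account": ["customer", "account", "bank"],
    "bank granting a customer's loan": ["customer", "loan", "bank"],
}
# non-interactive example: a chooser that always picks option 0
print(confirm({"customer", "bank"}, candidates, choose=lambda prompt: 0))
# ['customer', 'account', 'bank']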
Article
The combination of the Content Addressable File Store (CAFS®; CAFS is a registered trademark of International Computers Limited) and an extension of relational analysis is described. This combination allows a simple and compact implementation of a database query and update language (FIDL). The language has one of the important properties of a “natural” language interface by using a “world model” derived from the relational analysis. The interpreter (FLIN) takes full advantage of the CAFS by employing a unique database storage technique which results in a fast response to both queries and updates.
Article
System/U is a universal relation database system under development at Stanford University which uses the language C on UNIX. The system is intended to test the use of the universal view, in which the entire database is seen as one relation. This paper describes the theory behind System/U, in particular the theory of maximal objects and of the connection among a set of attributes. We also describe the implementation of the DDL (Data Description Language) and the DML (Data Manipulation Language), and discuss in detail how the DDL finds maximal objects and how the DML determines the connection among the attributes that appear in a query.
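Determining "the connection" between the attributes of a query can be illustrated with the small Python sketch below: it picks a smallest connected set of relations covering the query attributes. This is an illustration only, not System/U's algorithm, and the schema is hypothetical.

# A small sketch of finding a connection between query attributes: choose a
# smallest set of relations that together contain the query attributes and
# that are connected through shared attributes.

from itertools import combinations

SCHEMA = {
    "takes":   {"student", "course"},
    "teaches": {"prof", "course"},
    "advises": {"prof", "student"},
}

def connected(relations):
    rels = list(relations)
    seen, frontier = {rels[0]}, [rels[0]]
    while frontier:
        r = frontier.pop()
        for s in rels:
            if s not in seen and SCHEMA[r] & SCHEMA[s]:
                seen.add(s)
                frontier.append(s)
    return len(seen) == len(rels)

def connection(query_attrs):
    for size in range(1, len(SCHEMA) + 1):
        for combo in combinations(SCHEMA, size):
            covered = set().union(*(SCHEMA[r] for r in combo))
            if query_attrs <= covered and connected(combo):
                return combo
    return None

print(connection({"student", "prof"}))            # ('advises',) -- the direct relationship
print(connection({"student", "course", "prof"}))  # ('takes', 'teaches') -- linked via 'course'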
Article
The universal relation model aims at achieving complete access-path independence in relational databases by relieving the user of the need for logical navigation among relations. We clarify the assumptions underlying it and explore the approaches suggested for implementing it. The essential idea of the universal relation model is that access paths are embedded in attribute names. Thus attribute names must play unique “roles.” Furthermore, it assumes that for every set of attributes there is a basic relationship that the user has in mind. The user's queries refer to these basic relationships rather than to the underlying database. Two fundamentally different approaches to the universal relation model have been taken. According to the first approach, the user's view of the database is a universal relation or many universal relations, about which the user poses queries. The second approach sees the model as having query-processing capabilities that relieve the user of the need to specify the logical access path. Thus, while the first approach gives a denotational semantics to query answering, the second approach gives it an operational semantics. We investigate the relationship between these two approaches.
Article
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
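The view of a relation as a set of n-tuples, together with operations such as projection and selection, can be illustrated in a few lines of Python; note that duplicate tuples disappear automatically because a relation is a set.

# An illustrative sketch of relations as sets of n-tuples with two of the
# operations discussed in the paper, projection and selection; duplicates
# are eliminated automatically because a relation is a set.

supplies = {                      # relation over (supplier, part, city)
    ("s1", "bolt", "london"),
    ("s1", "nut", "london"),
    ("s2", "bolt", "paris"),
}

def project(relation, positions):
    return {tuple(t[i] for i in positions) for t in relation}

def select(relation, predicate):
    return {t for t in relation if predicate(t)}

print(project(supplies, [0, 2]))                     # supplier/city pairs, no duplicates
print(select(supplies, lambda t: t[2] == "london"))  # tuples for London suppliers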