Article

A relational model of data for large shared data banks (Reprinted from Communications of the ACM, June 1970, pp. 377-387)


Abstract

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. Inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced.
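As a minimal sketch of the idea of presenting data purely as named relations (not code from the paper; the table, columns, and values are invented for illustration), Python's built-in sqlite3 module can show how a user works only with the logical structure of a relation, never with its internal representation:

```python
import sqlite3

# A relation is a named set of n-tuples; each column is a named domain.
# Users query by attribute name and value, never by storage order or access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (s_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?)",
                 [(1, "Acme", "London"), (2, "Brown", "Paris")])

# The query depends only on the relation's logical structure (its name and
# attributes), not on how the rows happen to be laid out on disk.
for (name,) in conn.execute("SELECT name FROM supplier WHERE city = ?", ("London",)):
    print(name)
```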


Article
Full-text available
Sleep is an essential part of human life, and the quality of one’s sleep is an important indicator of one’s health. Analyzing a person’s electroencephalogram (EEG) signals during sleep makes it possible to assess sleep status and give relevant rest or medical advice. In this paper, a data augmentation method based on the Discrete Cosine Transform is introduced that generates a substantial amount of artificial data from a small amount of real experimental data from a specific individual. A classification model with an accuracy of 92.85% has been obtained. By mixing the augmented data with the public database and training with EEGNet, we obtained a classification model with significantly higher accuracy for the specific individual. The experiments demonstrate that this approach circumvents the subject-independent problem in sleep EEG and that only a small amount of labeled data is needed to customize a dedicated, highly accurate classification model.
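The abstract does not spell out the augmentation procedure, so the following is only a rough sketch of one plausible DCT-based scheme (transform a real epoch, perturb its coefficients, invert); the perturbation scale, epoch length, and function names are assumptions, not the authors' method:

```python
import numpy as np
from scipy.fft import dct, idct

def augment_epoch(epoch, noise_scale=0.05, rng=None):
    """Generate a synthetic EEG epoch by jittering the DCT coefficients of a real one."""
    rng = rng or np.random.default_rng()
    coeffs = dct(epoch, norm="ortho")                                  # to the frequency-like domain
    coeffs *= 1.0 + noise_scale * rng.standard_normal(coeffs.shape)    # small multiplicative jitter
    return idct(coeffs, norm="ortho")                                  # back to the time domain

real_epoch = np.random.default_rng(0).standard_normal(3000)  # stand-in for a 30 s epoch at 100 Hz
synthetic = augment_epoch(real_epoch)
```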
Article
Bioenergy is a crucial element of the future energy system, with a wide range of applications in electricity, heat and transport. A major challenge for the analysis and optimisation of the bioenergy system is its degree of diversity and complexity compared to wind or solar energy. A coherent database for studying the role of bioenergy in the energy system needs to cover the different entities such as bio-resources, conversion procedures and process chains. Since there is no comprehensive data collection for bioenergy so far, we develop an SQLite database by merging several existing datasets and additional information. The resulting Bio-Energy Technology Database (BET.db) provides a consistent set of 141 feedstocks and energy carriers, 259 conversion technologies, and 134 energy supply concepts. A proof of concept, modelling a wide range of technologies for the electricity, heat and transport sectors with the BENOPT bioenergy system model, has been successful. By providing a one-stop-shop solution for techno-economic information on the bioenergy nexus, this blind spot can be avoided in further investigations. The current stage of development is an intermediate prototype that will later be developed into a more versatile and interactive web application.
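BET.db is distributed as an SQLite file; purely as an illustration of the kind of linked tables such a database can hold (the table names, columns, and values below are invented and not the actual BET.db schema), a feedstock/technology join might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real BET.db is a file; in-memory here for illustration
conn.executescript("""
CREATE TABLE feedstock (id INTEGER PRIMARY KEY, name TEXT, lower_heating_value_mj_per_kg REAL);
CREATE TABLE technology (id INTEGER PRIMARY KEY, name TEXT,
                         feedstock_id INTEGER REFERENCES feedstock(id),
                         efficiency REAL, sector TEXT);
""")
conn.execute("INSERT INTO feedstock VALUES (1, 'wood chips', 15.6)")
conn.execute("INSERT INTO technology VALUES (1, 'CHP steam turbine', 1, 0.28, 'electricity')")

# Join feedstocks to the technologies that convert them, filtered by sector.
rows = conn.execute("""
    SELECT f.name, t.name, t.efficiency
    FROM technology t JOIN feedstock f ON t.feedstock_id = f.id
    WHERE t.sector = 'electricity'
""").fetchall()
print(rows)
```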
Article
Full-text available
Background: All aspects of our society, including the life sciences, need a mechanism for people working within them to represent the concepts they employ to carry out their research. For the information systems being designed and developed to support researchers and scientists in conducting their work, conceptual models of the relevant domains are usually designed as both blueprints for a system being developed and as a means of communication between the designer and developer. Most conceptual modelling concepts are generic in the sense that they are applied with the same understanding across many applications. Problems in the life sciences, however, are especially complex and important, because they deal with humans, their well-being, and their interactions with the environment as well as other organisms. Results: This work proposes a “systemist” perspective for creating a conceptual model of a life scientist’s problem. We introduce the notion of a system and then show how it can be applied to the development of an information system for handling genomic-related information. We extend our discussion to show how the proposed systemist perspective can support the modelling of precision medicine. Conclusion: This research recognizes challenges in life sciences research of how to model problems to better represent the connections between physical and digital worlds. We propose a new notation that explicitly incorporates systemist thinking, as well as the components of systems based on recent ontological foundations. The new notation captures important semantics in the domain of life sciences. It may be used to facilitate understanding, communication and problem-solving more broadly. We also provide a precise, sound, ontologically supported characterization of the term “system,” as a basic construct for conceptual modelling in life sciences.
Article
Data integration is an essential element of today’s ecosystem. In this article, we take the reader on a trip from the days when data integration was a human-intensive task, through the era of algorithmic-supported data integration and show a vision of what the future may bring.
Article
The continued development of cloud computing requires technologies that protect users' data privacy even from the cloud providers themselves, such as multi-user searchable encryption. It allows data owners to selectively enable users to perform keyword searches over their encrypted data stored at a cloud server. For privacy purposes, it is important to limit what an adversarial server can infer about the encrypted data, even if it colludes with some users. Clearly, in this case it can learn the content of data shared with these “corrupted” users; however, it is important to ensure this collusion does not reveal information, via cross-user leakage, about parts of the dataset that are only shared with “uncorrupted” users. In this work, we propose three novel multi-user searchable encryption schemes eliminating cross-user leakage. Compared to previous ones, our first two schemes are the first to achieve asymptotically optimal search time. Our third scheme achieves minimal user storage and forward privacy with respect to data sharing, but slightly slower search performance. We formally prove the security of our schemes under reasonable assumptions. Moreover, we implement them for textual documents and tabular databases and evaluate their computation and communication performance with encouraging results.
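The paper's schemes are not reproduced here; the following is only a heavily simplified, single-user sketch of the basic searchable-encryption idea (deterministic keyword tokens that a server can match without seeing plaintext keywords), using the Python standard library and making no claim to the paper's multi-user constructions or leakage guarantees:

```python
import hmac
import hashlib

def keyword_token(key: bytes, keyword: str) -> bytes:
    """Deterministic search token: the server only ever sees HMAC values, not keywords."""
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).digest()

# Client side: build an index mapping keyword tokens to document ids.
owner_key = b"owner-secret-key-32-bytes-long!!"   # illustrative key, not a real key-management scheme
documents = {1: "relational model of data", 2: "searchable encryption survey"}
index = {}
for doc_id, text in documents.items():
    for word in set(text.split()):
        index.setdefault(keyword_token(owner_key, word), set()).add(doc_id)

# Server side: given a token from an authorized user, return matching ids
# without learning the underlying keyword.
query = keyword_token(owner_key, "encryption")
print(index.get(query, set()))   # {2}
```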
Article
Full-text available
Fault-tolerant systems are an important discussion subject in our world of interconnected devices. One of the major failure points of every distributed infrastructure is the database. A data migration or an overload of one of the servers could lead to a cascade of failures and service downtime for the users. NoSQL databases sacrifice some of the consistency provided by traditional SQL databases while privileging availability and partition tolerance. This paper presents the design and implementation of a distributed in-memory database that is based on the actor model. The benefits of the actor model and development using functional languages are detailed, and suitable performance metrics are presented. A case study is also performed, showcasing the system’s capacity to quickly recover from the loss of one of its machines and maintain functionality.
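The authors' system uses the actor model in a functional language; purely as a language-agnostic sketch of the core pattern (not their implementation), a minimal actor-style in-memory key-value store keeps its state private and processes messages sequentially from a mailbox:

```python
import queue
import threading

class StoreActor:
    """A minimal actor: private state, a mailbox, and strictly sequential message handling."""

    def __init__(self):
        self._state = {}
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            op, key, value, reply = self._mailbox.get()
            if op == "put":
                self._state[key] = value
                reply.put(None)
            elif op == "get":
                reply.put(self._state.get(key))

    def _ask(self, op, key, value=None):
        reply = queue.Queue()
        self._mailbox.put((op, key, value, reply))
        return reply.get()

    def put(self, key, value):
        return self._ask("put", key, value)

    def get(self, key):
        return self._ask("get", key)

store = StoreActor()
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))  # {'name': 'Ada'}
```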
Chapter
Data-driven analysis plays a vital role in research projects, and sharing data with collaborators inside or outside a project is supposed to be daily scientific work. There are various tools for research data management, which offer features like storing data, meta-data indexing, and provide options to share data. However, currently, none of them offers capabilities for sharing data in different levels of detail without excessive data duplication. Naturally, sharing data by duplication is a tedious process, as preparing data for sharing typically involves changing temporal resolution (i.e., aggregation) or anonymization, e.g., to ensure privacy. In this paper, instead of re-inventing the wheel, we ask whether the concept of views, a well-established concept in relational databases, fulfills the above requirement. Conducting a case study for a project employing sharing of learning analytics data, we propose a framework that allows for fine-granular configuration of shared content based on the concept of views. In the case study, we a) analyze a data reuse scenario based on the FAIR principles, b) suggest a concept for using views for data sharing, and c) demonstrate its feasibility with a proof of concept. Keywords: Research data management, Data sharing, Provenance
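Since the framework builds on the standard relational concept of views, a generic illustration (with an invented learning-analytics table, not the project's actual schema) shows how a view can expose only an aggregated, identity-free slice of the data for sharing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE click_events (student_id TEXT, course TEXT, ts TEXT, action TEXT);
INSERT INTO click_events VALUES ('s1','db101','2024-01-10T09:00','open'),
                                ('s1','db101','2024-01-10T09:05','submit'),
                                ('s2','db101','2024-01-11T10:00','open');

-- Collaborators query the view, not the raw table: identities are hidden and the
-- temporal resolution is reduced to per-day counts.
CREATE VIEW shared_daily_activity AS
SELECT course, date(ts) AS day, count(*) AS n_events
FROM click_events
GROUP BY course, date(ts);
""")
print(conn.execute("SELECT * FROM shared_daily_activity").fetchall())
```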
Article
In recent years, the utilization of health care databases has been increasing worldwide. It is expected that Real World Data (RWD) will soon be used effectively for clinical research in Japan. On the other hand, database studies that use accumulated existing data such as electronic medical records, Diagnosis Procedure Combinations (DPCs), and health insurance claims require extremely high loads of data preprocessing before statistical analysis is possible. So far, there is insufficient literature that describes the challenges of RWD preprocessing from an academic point of view. In this review paper, the challenges of database studies are classified into three categories: (1) data content, (2) data structure, and (3) large-volume data handling. We then investigated existing preprocessing research and introduced it systematically. Most data preprocessing research targeted the improvement and reliability of the database itself by supplementing the data content required for each clinical study. There is very little research with the primary purpose of solving problems related to data structures and large-volume data processing. As the use of RWD for clinical research increases, the importance of the data preprocessing field will be recognized. In the future, we expect to see more research focused on RWD, which can enable the growth of clinical research using RWD.
Article
We establish the undecidability of conditional affine information inequalities, the undecidability of the conditional independence implication problem with a constraint that one random variable is binary, and the undecidability of the problem of deciding whether the intersection of the entropic region and a given affine subspace is empty. This is a step towards the conjecture on the undecidability of conditional independence implication. The undecidability is proved via a reduction from the periodic tiling problem (a variant of the domino problem). Hence, one can construct examples of the aforementioned problems that are independent of ZFC (assuming ZFC is consistent).
Article
In this sequel to our previous article (Cordero et al., 2020) on general inference systems for reasoning with if–then dependencies, we study transformations of if–then rules to semantically equivalent collections of if–then rules suitable to solve several problems related to reasoning with data dependencies. We work in a framework of general lattice-based if–then rules whose semantics is parameterized by systems of isotone Galois connections. This framework allows us to obtain theoretical insight as well as algorithms on a general level and observe their special cases by choosing various types of parameterizations. This way, we study methods for automated reasoning with different types of if–then rules in a single framework that covers existing as well as novel types of rules. Our approach supports a large family of if–then rules, including fuzzy if–then rules with various types of semantics. The main results in this article include new observations on the syntactic inference of if–then rules, complete collections of rules, reduced normal forms of collections of rules, and automated reasoning methods. We demonstrate the generality of the framework and the results by examples of their particular cases focusing on fuzzy if–then rules.
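The article works at a far more general, parameterized level; as a reminder of the classical crisp special case it generalizes (and not the authors' algorithms), the textbook attribute-set closure procedure decides whether an if–then rule follows from a collection of rules:

```python
def closure(attrs, rules):
    """Classical closure of a set of attributes under if-then rules of the form (lhs, rhs)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed

rules = [({"a"}, {"b"}), ({"b", "c"}, {"d"})]
# The rule {a, c} -> {d} follows from the collection, because d lies in the closure of {a, c}.
print(closure({"a", "c"}, rules))  # {'a', 'b', 'c', 'd'}
```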
Chapter
At the core of clinical informatics is information technology (IT). IT is how the innovative design behind clinical informatics concepts is ultimately presented externally in the real world. For that reason, one must become intimately familiar with what is possible using IT while navigating its boundaries. This chapter will take the reader through ways to represent data, write programs, and conceptualize networks. With some practice, clinical informaticists will understand the trade-offs in various approaches and make the decisions that will produce a reliable IT system for their organization. Keywords: Structured/Unstructured data, ETL, Relational database, Data warehouses, Database schema, Data interoperability, Entity-relationship diagram, Health information exchange, Network topology, Application programming interface, Waterfall method, Agile method, Programming languages, Structured query language, HIPAA
Article
Full-text available
High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practical scenarios, data quality is typically associated with data preprocessing, profiling, and cleansing for subsequent tasks like data integration or data analytics. However, from a scientific perspective, a lot of research has been published about the measurement (i.e., the detection) of data quality issues and different generally applicable data quality dimensions and metrics have been discussed. In this work, we close the gap between data quality research and practical implementations with a detailed investigation on how data quality measurement and monitoring concepts are implemented in state-of-the-art tools. For the first time and in contrast to all existing data quality tool surveys, we conducted a systematic search, in which we identified 667 software tools dedicated to “data quality.” To evaluate the tools, we compiled a requirements catalog with three functionality areas: (1) data profiling, (2) data quality measurement in terms of metrics, and (3) automated data quality monitoring. Using a set of predefined exclusion criteria, we selected 13 tools (8 commercial and 5 open-source tools) that provide the investigated features and are not limited to a specific domain for detailed investigation. On the one hand, this survey allows a critical discussion of concepts that are widely accepted in research, but hardly implemented in any tool observed, for example, generally applicable data quality metrics. On the other hand, it reveals potential for functional enhancement of data quality tools and supports practitioners in the selection of appropriate tools for a given use case.
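As a small example of the kind of generally applicable metric the survey discusses (the metric choice and the records below are illustrative only, not taken from any of the surveyed tools), a completeness ratio can be computed directly over tabular records:

```python
def completeness(records, fields):
    """Fraction of non-missing values across the given fields (a simple data quality metric)."""
    total = len(records) * len(fields)
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / total if total else 1.0

patients = [{"id": 1, "age": 64, "sex": "f"}, {"id": 2, "age": None, "sex": ""}]
print(completeness(patients, ["age", "sex"]))  # 0.5
```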
Article
Full-text available
Epistemetrics, a discipline concerned with measuring knowledge qualitatively and quantitatively, is an underdeveloped area of research. A primary reason for this status is a perceived lack of a formal structure of knowledge. In particular, a number of fundamental questions about knowledge remain to be answered: What is the basic unit of knowledge? How many types of knowledge are there? How do they relate to each other? And what are the hierarchies of knowledge? To address these questions, we proposed a knowledge categorization system, the EApc framework. The present work collects further evidence, from the theories and practices of several diverse research disciplines, to argue for, refine and expand the EApc framework. This framework can provide new perspectives for understanding scientific processes and for assessing the value and impact of research proposals and products. It may also shed light on the mechanisms of human cognition and information processing.
Article
Full-text available
The data-oriented paradigm has proven to be fundamental for the technological transformation process that characterizes Industry 4.0 (I4.0), to the extent that big data and analytics are considered a technological pillar of this process. The goal of I4.0 is the implementation of the so-called Smart Factory, characterized by Intelligent Manufacturing Systems (IMS) that surpass traditional manufacturing systems in terms of efficiency, flexibility, level of integration, digitalization, and intelligence. The literature reports a series of system architecture proposals for IMS, which are primarily data driven. Many of these proposals treat data storage solutions as mere entities that support the architecture’s functionalities. However, choosing which logical data model to use can significantly affect the performance of the IMS. This work identifies the advantages and disadvantages of relational (SQL) and non-relational (NoSQL) data models for I4.0, considering the nature of the data in this process. The characterization of data in the context of I4.0 is based on the five dimensions of big data and on the Asset Administration Shell, a standardized format for representing information about assets in the virtual world. This work makes it possible to identify appropriate transactional properties and logical data models according to the volume, variety, velocity, veracity, and value of the data. In this way, it is possible to describe the suitability of relational and NoSQL databases for different scenarios within I4.0.
Article
Full-text available
Big data is an expression for massive data sets consisting of both structured and unstructured data that are particularly difficult to store, analyze and visualize. Big data analytics has the potential to help companies or organizations improve operations as well as disclose hidden patterns and secret correlations to make faster and more intelligent decisions. This article provides useful information on this emerging and promising field for companies, industries, and researchers to gain a richer and deeper insight into advancements. Initially, an overview of big data content, key characteristics, and related topics is presented. The paper also highlights a systematic review of available big data techniques and analytics. The available big data analytics tools and platforms are categorized. In addition, this article discusses recent applications of big data in chemical industries to increase understanding and encourage its implementation in their engineering processes as much as possible. Finally, by emphasizing the adoption of big data analytics in various areas of process engineering, the aim is to provide a practical vision of big data.
Article
Full-text available
CamCOPS is a free, open-source client–server system for secure data capture in the domain of psychiatry, psychology, and the clinical neurosciences. The client is a cross-platform C++ application, suitable for mobile and offline (disconnected) use. It allows touchscreen data entry by subjects/patients, researchers/clinicians, or both together. It implements a large and extensible range of tasks, from simple questionnaires to complex animated tasks. The client uses encrypted data storage and sends data via an encrypted network connection to a CamCOPS server. Individual institutional users set up and run their own CamCOPS server, so no data is transferred outside the hosting institution's control. The server, written in Python, provides clinically oriented and research-oriented views of tasks, including the tracking of changes over time. It provides an audit trail, export facilities (such as to an institution's primary electronic health record system), and full structured data access subject to authorization. A single CamCOPS server can support multiple research/clinical groups, each having its own identity policy (e.g., fully identifiable for clinical use; de-identified/pseudonymised for research use). Intellectual property rules regarding third-party tasks vary and CamCOPS has several mechanisms to support compliance, including for tasks that may be permitted to some institutions but not others. CamCOPS supports task scheduling and home testing via a simplified user interface. We describe the software, report local information governance approvals within part of the UK National Health Service, and describe illustrative clinical and research uses.
Article
Full-text available
We propose a very general, unifying framework for the concepts of dependence and independence. For this purpose, we introduce the notion of diversity rank. By means of this diversity rank we identify total determination with the inability to create more diversity, and independence with the presence of maximum diversity. We show that our theory of dependence and independence covers a variety of dependence concepts, for example the seemingly unrelated concepts of linear dependence in algebra and dependence of variables in logic.
Article
Full-text available
Previous research has identified large data and information sources that exist about netball performance and align with the discussions of coaches during games. Normative data provides context to measures across many disciplines, such as fitness testing, physical conditioning, and body composition. These data are normally presented in tables as representations of the population, categorized for benchmarking. Normative data does not exist for benchmarking or contextualization in netball, yet coaches and players use performance statistics. A systems design methodology was adopted for this study, in which a process for automating the organization, normalization, and contextualization of netball performance data was developed. To maintain good ecological validity, a case study utilized expert coach feedback on the understandability and usability of the visual representations of netball performance population data. This paper provides coaches with benchmarks for assessing the performances of players across competition levels, against player positions, for performance indicators. It also provides insights for performance analysts on how to present these benchmarks in an automated “real-time” reporting tool.
Article
Full-text available
This paper presents a model for Enterprise Application Integration (EAI) in the modern era of data explosion and globalisation. Application here refers to software, which is in essence a data system, and data refers to both information and knowledge (data serves as a vehicle for information as well as knowledge). The salient features of the model are: (1) separation of business functions from applications and enterprises, (2) a three-layer architecture (conceptual or semantic level, external or application level, internal or realisation level), and (3) integration of structured, semi-structured and non-structured data. To the best of our knowledge, no existing model or solution for EAI offers all three features. A case study is presented to illustrate how the model works. The model can be used by an individual enterprise or a group of enterprises that form a network, e.g., a holistic supply chain network.
Article
Full-text available
The new field of synthetic biology aims at the creation of artificially designed organisms. A major breakthrough in the field was the generation of the artificial synthetic organism Mycoplasma mycoides JCVI‐syn3A. This bacterium possesses only 452 protein‐coding genes, the smallest number for any organism that is viable independently of a host cell. However, about one third of the proteins have no known function, indicating major gaps in our understanding of simple living cells. To facilitate the investigation of the components of this minimal bacterium, we have generated the database SynWiki (http://synwiki.uni-goettingen.de/). SynWiki is based on a relational database and gives access to published information about the genes and proteins of M. mycoides JCVI‐syn3A. To gain a better understanding of the functions of the genes and proteins of the artificial bacterium, protein–protein interactions that may provide clues to protein functions are included in an interactive manner. SynWiki is an important tool for the synthetic biology community that will support the comprehensive understanding of a minimal cell as well as the functional annotation of so far uncharacterized proteins.
Article
Full-text available
Much research has been conducted in the area of machine learning algorithms; however, the question of a general description of an artificial learner’s (empirical) performance has mainly remained unanswered. A general, restrictions-free theory on its performance has not been developed yet. In this study, we investigate which function most appropriately describes learning curves produced by several machine learning algorithms, and how well these curves can predict the future performance of an algorithm. Decision trees, neural networks, Naïve Bayes, and Support Vector Machines were applied to 130 datasets from publicly available repositories. Three different functions (power, logarithmic, and exponential) were fit to the measured outputs. Using rigorous statistical methods and two measures for the goodness-of-fit, the power law model proved to be the most appropriate model for describing the learning curve produced by the algorithms in terms of goodness-of-fit and prediction capabilities. The presented study, first of its kind in scale and rigour, provides results (and methods) that can be used to assess the performance of novel or existing artificial learners and forecast their ‘capacity to learn’ based on the amount of available or desired data.
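The exact parameterizations used in the study are not reproduced here; the sketch below fits one common power-law form, accuracy ≈ c - a·n^(-b), to illustrative (made-up) measurements with SciPy, purely to show the fitting-and-extrapolation step the abstract describes:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Accuracy as a function of training-set size n; c is the asymptotic plateau."""
    return c - a * np.power(n, -b)

sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
acc = np.array([0.71, 0.76, 0.80, 0.83, 0.85, 0.86])  # illustrative learning-curve measurements

params, _ = curve_fit(power_law, sizes, acc, p0=(1.0, 0.5, 0.9), maxfev=10000)
a, b, c = params
print(f"predicted accuracy at n=10000: {power_law(10000, a, b, c):.3f}")
```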
Article
This project offers a comparative and polycentric approach to the connection of trade nodes in the Asian, American, African, and European markets in the early modern period. These analyses reevaluate the great divergence debate by presenting new case studies at the local scale and observing the impact of global goods and changes in consumer behaviour connecting local markets of the Pacific and Atlantic area. In this manner, we explore the circulation and consumption of Chinese goods in the Americas and in Europe, as well as in the African slave market through the Royal Company of the Philippines. Conversely, we also analyse the impact of the introduction of Western goods (of American and European origin) into China.
Article
Full-text available
Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is typically ill defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure, and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types, and 63 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.
Chapter
The paper relates to the modeling and metamodeling disciplines, which are applicable in the software engineering domain. It focuses on how to select the right metamodel for a particular modeling problem. The approach introduced in the paper is based on a specific application of the Extended Graph Generalization, which is used to identify features of known metamodels in relation to the extensions and generalizations introduced by the Extended Graph Generalization definition. The discussion is related to an illustrative case study. The paper introduces the Extended Graph Generalization definitions in the Association-Oriented Metamodel and the Extended Graph Generalization symbolic notation, which are used to compare features of different metamodels in relation to the Extended Graph Generalization features.
Chapter
The increasing amount of digital data and the declining cost of data storage have led companies to collect any data possible, regardless of its adequacy and usability. This results in increasingly diverse data in terms of its structure, quality, availability and source of origin. Dark data is one type of data that increases significantly as the volume of data expands. The scientific literature does not precisely define the term “dark data”, and its interpretation among scientists is ambiguous. The aim of this article is an attempt to define the dark data occurring in an enterprise by identifying its essential features. The article presents an overview of definitions of the term dark data, a proposal for its interpretation, and a classification of data in a company with regard to usability, availability and quality. The analysis of the concept of dark data was carried out via a review of international journals and articles published on the Internet by Data Science practitioners. As part of the research, four universal features of dark datasets have been indicated (unavailability, unawareness, uselessness, and costliness). Based on data availability and quality, four groups of enterprise data have also been distinguished. The data classification developed in this way allowed systematization of the term “dark data”.
Chapter
This paper formalizes the graphical modularization technique view traversal for an ontology component of a Domain Information System (DIS). Our work is motivated by developing the ability to dynamically extract a module (called a view traversal module) based on an initial set of concepts. We show how the ability to quantify the knowledge that is preserved (or lost) in a view traversal module is significant for a multi-agent setting, which is a setting that requires provable privacy. To ensure partial knowledge preservation, we extend the view traversal module to a principal ideal subalgebra module. The cost of this extension is that the obtained knowledge is coarser, as the atoms of the associated lattice are composite yet considered atomic. The presented work constitutes a foundational step towards theories related to reasoning on partial domain knowledge.
Article
In this paper, we present a general inference system for reasoning with if-then rules. They are defined using general lattice-theoretic notions, and their semantics is defined using particular closure operators parameterized by systems of isotone Galois connections. In this general setting, we introduce a simplification logic, show its sound and complete axiomatization, and deal with related issues. The presented results can be seen as forming a parameterized framework for dealing with if-then rules that allows one to focus on particular dependencies obtained by choices of parameterizations.
Chapter
Management of graph structured data has important applications in several areas. Queries on such data sets are based on structural properties of the graphs, in addition to values of attributes. Answering such queries poses significant challenges, as reasoning about structural properties across graphs is typically intractable. This chapter provides an overview of the challenges in designing databases over graph datasets. Different application areas that use graph databases pose their own unique sets of challenges, making the task of designing a generic graph-oriented DBMS still an elusive goal. The purpose of this chapter is to provide a tutorial introduction to some of the major challenges of graph data management, survey some of the piecemeal solutions that have been proposed, and suggest an overall structure in which these different solutions can be meaningfully placed.
Article
Full-text available
Background: There is significant global policy interest related to enabling a data-driven approach for evidence-based primary care system development. This paper describes the development and initial testing of a prototype tool (the Problem-Oriented Primary Care System Development Record, or PCSDR) that enables a data-driven and contextualized approach to primary care system development. Methods: The PCSDR is an electronic record that enables the systematic input, classification, structuring, storage, processing and analysis of different types of data related to the structure, function and performance of primary care systems over time. Data inputted into the PCSDR was coded using the WHO's PHC-IMPACT framework and classification system. The PCSDR's functionalities were tested by using a case study of primary care system development in Tajikistan. Results: Tajikistan's case study demonstrated that the PCSDR is a potentially effective and conceptually-sound tool for the input, classification, structuring and storage of different data types from myriad sources. The PCSDR is therefore a basic data entry and data management system that enables query and analytics functions for health services research and evidence-based primary care system development functions. Conclusions: The PCSDR is a data system that enables a contextualized approach to evidence-based primary care system development. It represents a coherent and effective synthesis of the fields of primary care system development and performance assessment. The PCSDR enables analysts to leverage primary care performance assessment frameworks for a broad range of functions related to health systems analysis, improvement and the development of learning health systems.
Article
A major determinant of the quality of software systems is the quality of their requirements, which should be both understandable and precise. Most requirements are written in natural language, good for understandability but lacking in precision. To make requirements precise, researchers have for years advocated the use of mathematics-based notations and methods, known as “formal”. Many exist, differing in their style, scope and applicability. The present survey discusses some of the main formal approaches and compares them to informal methods. The analysis uses a set of 9 complementary criteria, such as level of abstraction, tool availability, traceability support. It classifies the approaches into five categories: general-purpose, natural-language, graph/automata, other mathematical notations, seamless (programming-language-based). It presents approaches in all of these categories, altogether 22 different ones, including for example SysML, Relax, Eiffel, Event-B, Alloy. The review discusses a number of open questions, including seamlessness, the role of tools and education, and how to make industrial applications benefit more from the contributions of formal approaches. This is the full version of the survey, including some sections and two appendices which, because of length restrictions, do not appear in the submitted version. This article can be downloaded here: https://arxiv.org/pdf/1911.02564.pdf.
Article
Full-text available
Graph theory is a well-established theory with many methods used in mathematics to study graph structures. In the field of medicine, electronic health records (EHR) are commonly used to store and analyze patient data. Consequently, it seems straightforward to perform research on modeling EHR data as graphs. This systematic literature review aims to investigate the frontiers of current research in the field of graphs representing and processing patient data. We want to show which areas of research in this context need further investigation. The databases MEDLINE, Web of Science, IEEE Xplore and the ACM digital library were queried using the search terms health record, graph and related terms. Based on the “Preferred Reporting Items for Systematic Reviews and Meta-Analysis” (PRISMA) statement guidelines, the articles were screened and evaluated using full-text analysis. Eleven out of 383 articles found in the systematic literature review were finally included for analysis. Most of them use graphs to represent temporal relations, often representing the connections among laboratory data points. Only two papers report that the graph data were further processed by comparing the patient graphs using similarity measurements. Graphs representing individual patients are hardly used in a research context; only eleven papers considered such graphs in their investigations. The potential of graph theoretical algorithms, which are already well established, could help grow this research field, but currently there are too few papers to estimate how this area of research will develop. Altogether, the use of such patient graphs could be a promising technique for developing decision support systems for the diagnosis, medication or therapy of patients using similarity measurements or different kinds of analysis.
Article
This paper presents a RAND project concerned with the use of computers as assistants in the logical analysis of large collections of factual data. A system called the Relational Data File was developed for this purpose. The Relational Data File is briefly detailed and problems arising from its implementation are discussed.
Article
An important consideration in the design of programming systems for the management of large files of data is the method of treating hierarchical data (that is, data among which logical relationships exist at more than two levels). Recent systems have accomplished this by simply duplicating some essential item of data at several levels. Such duplication makes the storage of even small data bases inefficient; for large masses of data, storage becomes economically unfeasible. Other systems have provided the means to specify and construct hierarchies, but none have provided language that affords control over retrieval and output levels, and control over the scope of output. Within the Time-Shared Data Management System (TDMS), currently being produced at System Development Corporation, a method has been devised for maintaining hierarchical associations within logical entries of a data base. Basically, the technique permits the automatic association of related data through a device known as a repeating group. The term “repeating group” is not new, but the TDMS treatment of the repeating group concept is. This paper describes how this technique is implemented in the language and tables of TDMS.
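Repeating groups are a TDMS-specific mechanism; purely as a language-neutral sketch of the underlying idea (the field names below are invented, not TDMS syntax), a logical entry can nest its children in a repeating group instead of duplicating the parent fields on every child record:

```python
# Flat storage duplicates the parent fields on every child record ...
flat = [
    {"dept": "Sales", "manager": "Lee", "employee": "Ann"},
    {"dept": "Sales", "manager": "Lee", "employee": "Bob"},
]

# ... whereas a repeating group stores the parent once and nests the children,
# so related data are associated without duplication.
entry = {
    "dept": "Sales",
    "manager": "Lee",
    "employees": [            # the repeating group
        {"name": "Ann"},
        {"name": "Bob"},
    ],
}

assert {e["employee"] for e in flat} == {e["name"] for e in entry["employees"]}
```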
Article
Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n -ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
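Purely as a compact illustration of two of the operations discussed in Section 2 (projection and natural join), with relations represented as collections of named tuples and notation that is not the paper's own:

```python
def project(relation, attrs):
    """Projection: keep only the named attributes, eliminating duplicate tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in relation}

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for x in r:
        for y in s:
            shared = set(x) & set(y)
            if all(x[a] == y[a] for a in shared):
                out.append({**x, **y})
    return out

supplier = [{"s": 1, "city": "London"}, {"s": 2, "city": "Paris"}]
supply   = [{"s": 1, "part": "bolt"}, {"s": 1, "part": "nut"}]
print(project(supplier, ["city"]))      # the two cities, without duplicates
print(natural_join(supplier, supply))   # supplier 1 combined with each of its parts
```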
Church, A. An Introduction to Mathematical Logic I. Princeton University Press, Princeton, NJ, 1956.