
Ryan Wisnesky- Harvard University
33 publications · 5,140 reads · 465 citations
Publications (33)
We aim to accelerate the original vision of the semantic web by revisiting design decisions that have defined the semantic web up until now. We propose a shift in direction that more broadly embraces existing data infrastructure by reconsidering the semantic web's logical foundations. We argue for shifting attention away from description logic, which h...
We survey the field of model management and describe a new model management approach based on algebraic specification.
We show how computation of left Kan extensions can be reduced to computation of free models of cartesian (finite-limit) theories. We discuss how the standard and parallel chase compute weakly free models of regular theories and free models of cartesian theories and compare the concept of “free model” with a similar concept from database theory know...
We show how computation of left Kan extensions can be reduced to computation of free models of cartesian (finite-limit) theories. We discuss how the standard and parallel chase compute weakly free models of regular theories and free models of cartesian theories, and compare the concept of "free model" with a similar concept from database theory kno...
In this paper, we use algebraic data types to define a formal basis for the property graph data models supported by popular open source and commercial graph databases. Developed as a kind of inter-lingua for enterprise data integration, algebraic property graphs encode the binary edges and key-value pairs typical of property graphs, and also provid...
Categorical Query Language is an open-source query and data integration scripting language that can be applied to common challenges in the field of computational science. We discuss how the structure-preserving nature of CQL data migrations protects those who publicly share data from the misinterpretation of their data. Likewise, this feature of CQL...
In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algeb...
The goal of this paper is to illustrate the use of category theory (CT) as a basis for the integration of manufacturing service databases. In this paper, we use as our reference prior work by Kulvatunyou et al. (2013, "An Analysis of OWL-Based Semantic Mediation Approaches to Enhance Manufacturing Service Capability Models," Int. J. Comput. Integr....
Databases have been studied category-theoretically for decades. The database schema, whose purpose is to arrange high-level conceptual entities, is generally modeled as a category or sketch. The data itself, often called an instance, is generally modeled as a set-valued functor, assigning to each conceptual entity a set of examples. While mathema...
We describe an alternative solution to the impedance-mismatch problem between programming and query languages: rather than embed queries in a programming language, as done in LINQ systems, we embed programs in a query language, and dub the result QINL.
In this paper we describe a simple equational formalism for expressing functorial data migration. A graphical IDE and implementation of this formalism are available at categoricaldata.net/fql.html.
In this paper we describe a functorial data migration scenario about the manufacturing service capability of a distributed supply chain. The scenario is a category-theoretic analog of an OWL ontology-based semantic enrichment scenario developed at the National Institute of Standards and Technology (NIST). The scenario is presented using, and is inc...
We introduce HIL, a high-level scripting language for entity resolution and integration. HIL aims at providing the core logic for complex data processing flows that aggregate facts from large collections of structured or unstructured data into clean, unified entities. Such flows typically include many stages of processing that start from the outcom...
Data integration remains a perennially difficult task. The need to access, integrate and make sense of large amounts of data has, in fact, intensified in recent years. There are now many publicly available sources of data that can provide valuable information in various domains. Concrete examples of public data sources include: bibliographic reposit...
We study the data transformation capabilities associated with schemas that are presented by labeled directed graphs and path equivalence constraints. Unlike most approaches, which treat graph-based schemas as abbreviations for relational schemas, we treat graph-based schemas as categories. A morphism $M$ between schemas $S$ and $T$, which can be spe...
In this paper we demonstrate how to prove the correctness of systems implemented using low-level imperative features like pointers, files, and socket I/O with respect to high level I/O protocol descriptions by using the Coq proof assistant. We present a web-based course gradebook application developed with Ynot, a Coq library for verified imperativ...
We propose an intermediate form based on monad-algebra comprehensions (to represent queries), folds (to represent computation), and setoids over polynomial datatypes (to represent data), suitable for use in collection processing. Such an intermediate form captures, in a uniform way, large fragments of many recent large-scale collection processing...
The Monad Comprehension Calculus (MCC) is a highly expressive query language equal in expressive power to a subset of the Haskell programming language. This expressivity allows the MCC to subsume a variety of user-facing query languages, from nested relational algebra to OQL. The MCC possesses a number of highly desirable properties, including a nor...
This paper presents the results of a simulation study of a heterogeneous computational grid using different scheduling algorithms. After a definition of robustness based on the concept of work completion latency is discussed, a method to simulate grids based on Estimated Time to Compute matrices is presented. Three well-known scheduling algorithms...
We extend the Clio nested mapping language [1] with both recursive mappings and recursive data, showing how its standard set-theoretic semantics is incompatible with recursive data, how to interpret recursive schemas as algebraic datatypes, and how recursive mappings can be interpreted operationally. 1. RECURSION Some schema languages used in data e...
In this paper we demonstrate that it is possible to implement certified web systems in a way not much different from writing Standard ML or Haskell code, including use of imperative features like pointers, files, and socket I/O. We present a web-based course gradebook application developed with Ynot, a Coq library for certified imperative programming. We...
We examine schema mappings from a type-theoretic perspective and aim to facilitate and formalize the reuse of mappings. Starting with the mapping language of Clio, we present a type-checking algorithm such that typable mappings are necessarily satisfiable. We add type variables to the schema language and present a theory of polymorphism, including...
We report on our experience implementing a lightweight, fully verified relational database management system (RDBMS). The functional specification of RDBMS behavior, RDBMS implementation, and proof that the implementation meets the specification are all written and verified in Coq. Our contributions include: (1) a complete specification of the rela...
We present a new approach for constructing and verifying higher-order, imperative programs using the Coq proof assistant. We build on the past work on the Ynot system, which is based on Hoare Type Theory. That original system was a proof of concept, where every program verification was accomplished via laborious manual proofs, with much code devote...
Business objects represent the key concepts that a business needs to operate, such as people, services, products, etc., but transforming these objects to and from existing data models can be difficult. Business objects have traditionally been represented in a backend data store using relational databases, and techniques for transformation must work w...
This paper describes Orchid, a system that converts declarative mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both mappings and ETL jobs are transformed into instances of this common model. As an additional b...
As value networks evolve, we observe the phenomenon of businesses consolidating through mergers and businesses disaggregating and then virtually "re-merging" dynamically to respond to new opportunities. But these constituent businesses were not built in any standard way, and neither were their IT systems. An example in the industrial sector is...
Octopus [1] was proposed by Becker and Wille in 1998. The protocol aims to provide a method for establishing a shared key in ad hoc networks. It is a contributory key establishment scheme, in which all the participating nodes contribute some material for the final generated key. The aim behind constructing this protocol is to achieve a lower bound...
...technique, and demonstrates how a particular technology - SUBDUE - can be used to efficiently implement this technique. It is our hope that by capturing various uses of webgraph compression within a single framework, someone who wants to capture and compress specific information from the web need do no more than state what he wants to capture and push a...
Business objects are data models that capture the semantics of (not necessarily relational) business concepts. They are a powerful way to represent rich data inside enterprises. However, because of their intrinsic complexity, transforming these objects into other data models can be difficult. Since business objects are generally implemented in a ba...