Philip A. Bernstein

Microsoft · Microsoft Research

About

302 Publications · 47,470 Reads · 26,525 Citations
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full text.

Publications (302)
Article
Every five years, a group of the leading database researchers meet to reflect on their community's impact on the computing industry as well as examine current research challenges.
Article
Approximately every five years, a group of database researchers meet to do a self-assessment of our community, including reflections on our impact on the industry as well as challenges facing our research community. This report summarizes the discussion and conclusions of the 9th such meeting, held during October 9-10, 2018 in Seattle.
Conference Paper
Full-text available
Many of today’s interactive server applications are implemented using actor-oriented programming frameworks. Such applications treat actors as a distributed in-memory object-oriented database. However, actor programming frameworks offer few, if any, database system features, leaving application developers to fend for themselves. It is challenging to...
Article
In large enterprises, data discovery is a common problem faced by users who need to find relevant information in relational databases. In this scenario, schema annotation is a useful tool to enrich a database schema with descriptive keywords. In this paper, we demonstrate Barcelos, a system that automatically annotates corporate databases. Unlik...
Conference Paper
Scaling out a database system typically requires partitioning the database across multiple servers. If applications do not partition perfectly, then transactions accessing multiple partitions end up being distributed, which has well-known scalability challenges. To address them, we describe a high-performance transaction mechanism that uses optimis...
Patent
A sequence of storage devices of a data store may include one or more stripesets for storing data stripes of different lengths and of different types. Each data stripe may be stored in a prefix or other portion of a stripeset. Each data stripe may be identified by an array of addresses that identify each page of the data stripe on each included sto...
Conference Paper
We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities. Previous approaches either use seed entities as the only input, or inherently require ne...
Patent
Full-text available
Aspects of the subject matter described herein relate to automating evolution of schemas and mappings. In aspects, mappings between a conceptual model and a store model are updated automatically in response to a change that occurs to the conceptual model. For example, when a change occurs to the conceptual model, a local scope of the change is dete...
Patent
Full-text available
Architecture that includes an ordered and shared log of indexed transaction records represented as multi-version data structures of nodes and node pointers. The log is a sole monolithic source of datastore state and is used for enforcing concurrency control. The architecture also includes a transaction processing component that appends transaction...
Article
Full-text available
Every few years a group of database researchers meets to discuss the state of database research, its impact on practice, and important new directions. This report summarizes the discussion and conclusions of the eighth such meeting, held October 14-15, 2013 in Irvine, California. It observes that Big Data has now become a defining challenge...
Patent
A method and system for increasing server cluster availability by requiring at a minimum only one node and a quorum replica set of replica members to form and operate a cluster. Replica members maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster...
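The majority rule in this abstract can be illustrated with a small sketch. It shows only the majority test, not the patented mechanism, and the function names and replica identifiers are hypothetical. A node may form or keep the cluster only if it holds a strict majority of the replica set. Because any two majorities of the same set overlap, two partitioned nodes can never both qualify, and the brute-force helper checks exactly that for a small set.

```python
from itertools import combinations


def has_quorum(held: set[str], replica_set: set[str]) -> bool:
    """Hypothetical check: True if `held` contains a strict majority of `replica_set`."""
    return len(held & replica_set) > len(replica_set) // 2


def majorities_always_overlap(replica_set: set[str]) -> bool:
    """Brute-force check that every pair of majorities shares at least one replica."""
    members = sorted(replica_set)
    majorities = [
        set(combo)
        for k in range(len(members) + 1)
        for combo in combinations(members, k)
        if has_quorum(set(combo), replica_set)
    ]
    # A non-empty intersection is truthy, so this is True only if all pairs overlap.
    return all(a & b for a, b in combinations(majorities, 2))


replicas = {"r1", "r2", "r3", "r4", "r5"}
print(has_quorum({"r1", "r2", "r3"}, replicas))  # True
print(has_quorum({"r4", "r5"}, replicas))        # False
```

The overlap property is why a majority, rather than any fixed subset, is used. Whichever node holds a majority, every other node is necessarily missing at least one replica that it holds.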
Patent
Computers are provided with a totally ordered, durable shared log. Shared storage is used and can be directly accessed by the computers over a network. Append-log operations are made atomic in the face of failures by committing provisional append ordering information onto a log. The log may comprise multiple flash packages or non-volatile memory de...
Conference Paper
In an object-to-relational mapping system (ORM), mapping expressions explain how to expose relational data as objects and how to store objects in tables. If mappings are sufficiently expressive, then it is possible to define lossy mappings. If a user updates an object, stores it in the database based on a lossy mapping, and then retrieves the objec...
Conference Paper
We describe a software architecture we have developed for a constructive containment checker of Entity SQL queries defined over extended ER schemas expressed in Microsoft's Entity Data Model. Our application of interest is compilation of object-to-relational mappings for Microsoft's ADO.NET Entity Framework, which has been shipping since 2007. The...
Conference Paper
There has been a resurgence of work on replicated, distributed database systems to meet the demands of intermittently-connected clients and of disaster-tolerant databases that span data centers. Many systems weaken the criteria for replica-consistency or isolation, and in some cases add new mechanisms, to improve partition-tolerance, availability,...
Patent
Full-text available
Architecture that addresses the efficient detection of conflicts and the merging of data structures such as trees, when possible. The process of detecting conflicts and merging the trees is a meld operation. Confluent trees offer transactional consistency with some degree of isolation, and scaling out a concurrent system based on confluent trees ca...
Patent
Full-text available
A shared storage system is described herein that is based on an append-only model of updating a storage device to allow multiple computers to access storage with lighter-weight synchronization than traditional systems and to reduce wear on flash-based storage devices. Appending data allows multiple computers to write to the same storage device with...
Article
This paper argues that an algebraic approach to regular languages, such as using monoids, can yield efficient algorithms on strings and trees.
Article
Full-text available
XML is commonly supported by SQL database systems. However, existing mappings of XML to tables can only deliver satisfactory query performance for limited use cases. In this paper, we propose a novel mapping of XML data into one wide table whose columns are sparsely populated. This mapping provides good performance for document types and queries th...
Article
This paper describes a new optimistic concurrency control algorithm for tree-structured data called meld. Each transaction executes on a snapshot of a multiversion database and logs a record with its intended updates. Meld processes log records in log order on a cached partial-copy of the last committed state to determine whether each transaction c...
Article
Full-text available
In a paper published in the 2001 VLDB Conference, we proposed treating generic schema matching as an independent problem. We developed a taxonomy of existing techniques, a new schema matching algorithm, and an approach to comparative evaluation. Since then, the field has grown into a major research topic. We briefly summarize the new techniques tha...
Conference Paper
Cloud SQL Server is a relational database system designed to scale out to cloud computing workloads. It uses Microsoft SQL Server as its core. To scale out, it uses a partitioned database on a shared-nothing system architecture. Transactions are constrained to execute on one partition, to avoid the need for two-phase commit. The database is replica...
Article
This article is a summary of the technology issues and challenges of data-intensive science and cloud computing as discussed in the Data-Intensive Science (DIS) workshop in Seattle, September 19-20, 2010.
Conference Paper
Full-text available
Hyder supports reads and writes on indexed records within classical multi-step transactions. It is designed to run on a cluster of servers that have shared access to a large pool of network-addressable raw flash chips. The flash chips store the indexed records as a multiversion log-structured database. Log-structuring leverages the high random I/O...
Conference Paper
Schema evolution is an unavoidable consequence of the application development lifecycle. The two primary schemas in an application, the conceptual model and the persistent database model, must co-evolve or risk quality, stability, and maintainability issues. We study application-driven scenarios, where the conceptual model changes and the database...
Conference Paper
This paper presents an object-oriented representation of the core structural and constraint-related features of XML Schema. The structural features are represented within the limitations of object-oriented type systems including particles (elements and groups) and type hierarchies (simple and complex types and type derivations). The applicability o...
Conference Paper
Modern storage solutions, such as non-volatile solid-state devices, offer unprecedented speed of access over high-bandwidth interconnects. An array of flash memory chips attached directly to a 1-10 GB fiber switch can support up to 100K page writes per second. While no single host can drive such throughput, the combined power of a large group of cl...
Conference Paper
Full-text available
This paper presents algorithms that make it possible to process XML data that conforms to XML Schema (XSD) in a mainstream object-oriented programming language. These algorithms are based on our object-oriented view of the core of XSD. The novelty of this view is that it is intellectually manageable for object-oriented programmers while still captu...
Conference Paper
Full-text available
Schema evolution is an unavoidable consequence of the application development lifecycle. The two primary schemas in an application, the client conceptual object model and the persistent database model, must co-evolve or risk quality, stability, and maintainability issues. We present MoDEF, an extension to Visual Studio that supports automatic evolu...
Conference Paper
Object-relational mapping systems have become widely used tools for giving applications access to relational databases. In a database-first development scenario, the onus is on the developer to construct a meaningful object layer for the application, because ORM tools ship only database reverse-engineering tools that generate object...
Article
Full-text available
Solid-state disks are currently based on NAND flash and expose a standard disk interface. To accommodate limitations of the medium, solid-state disk implementations avoid rewriting data in place, instead exposing a logical remapping of the physical storage. We present an alternative way to use flash storage, where an append interface is exposed dir...
Chapter
Replication is the technique of using multiple copies of a server or a resource for better availability and performance. Each copy is called a replica. The main goal of replication is to improve availability, since a service is available even if some of its replicas are not. This helps mission-critical services, such as many financ...
Chapter
Transaction processing (TP) systems often are expected to be available 24 hours per day, 7 days per week, to support around-the-clock business operations. Two factors affect their availability: the mean time between failures (MTBF) and the mean time to repair (MTTR). Improving availability requires increasing MTBF, decreasing MTTR, or both. Compute...
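The trade-off between MTBF and MTTR can be made concrete with the common steady-state definition, availability = MTBF / (MTBF + MTTR). That formula is an assumption here; the chapter's own notation may differ. Rewriting it as 1 / (1 + MTTR/MTBF) shows that only the ratio matters, so halving the repair time helps exactly as much as doubling the time between failures. The figures below are illustrative, not measurements.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours per year during which the system is down."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


baseline = availability(999.0, 1.0)        # 999 / 1000
faster_repair = availability(999.0, 0.5)   # halve MTTR
more_reliable = availability(1998.0, 1.0)  # double MTBF
print(f"{baseline:.4f} {faster_repair:.5f} {more_reliable:.5f}")
print(f"{downtime_hours_per_year(999.0, 1.0):.2f} hours of downtime per year")
```

Because both levers act on the same ratio, the choice between them is usually about cost. Faster failover or restart is often cheaper than more reliable hardware.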
Chapter
Although transaction processing principles have remained fairly constant during the past 20 years or so, the technologies that implement the principles have been evolving. Recent changes starting to impact transactional middleware products include cloud computing, highly scalable computing designs, solid state memory, and streaming event processing...
Chapter
The two-phase commit protocol ensures that a transaction either commits at all the resource managers that it accessed or aborts at all of them. It avoids the undesirable outcome that the transaction commits at one resource manager and aborts at another. The protocol is driven by a coordinator, which communicates with participants, which together in...
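The all-or-nothing outcome of two-phase commit can be sketched as an in-memory simulation. The class and function names are hypothetical. The sketch deliberately omits the durable logging, timeouts, and recovery that a real coordinator and its participants require. In phase 1 the coordinator collects a vote from every participant. In phase 2 it sends the same decision to all of them: commit only if every vote was yes, otherwise abort.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A hypothetical resource manager taking part in one transaction."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to commit if told to;
        # a real participant would force a prepared record to its log first.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: commit everywhere only if all participants vote yes."""
    votes = [p.prepare() for p in participants]  # every participant is asked
    decision = "committed" if all(votes) else "aborted"
    # A real coordinator would log the decision durably before phase 2.
    for p in participants:
        p.finish(decision)
    return decision


rms = [Participant("orders"), Participant("inventory", can_commit=False)]
print(two_phase_commit(rms), [p.state for p in rms])  # aborted ['aborted', 'aborted']
```

A single "no" vote from `inventory` is enough to abort the transaction at `orders` as well. This is the mixed outcome that the protocol exists to prevent.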
Chapter
An important property of transactions is that they are isolated, which means that the execution of transactions has the same effect as running the transactions serially, one after another, in sequence, with no overlap in executing any two of them. Such an execution is called serializable and this gives each user the easy-to-understand illusion that...
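A common way to check that an interleaved execution is equivalent to a serial one is the textbook conflict-graph (precedence-graph) test. The graph gets an edge Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj, meaning same data item and at least one write. If the graph has no cycle, the execution is serializable. This is a sketch of that standard test, which is sufficient but not necessary for serializability; it is not necessarily how the chapter presents the idea. The (transaction, kind, item) tuple encoding is an assumption.

```python
from collections import defaultdict

# Each operation is a hypothetical (transaction, kind, item) tuple,
# where kind is "r" (read) or "w" (write).
Op = tuple[str, str, str]


def is_conflict_serializable(schedule: list[Op]) -> bool:
    """Return True if the schedule's precedence graph is acyclic."""
    txns = {t for t, _, _ in schedule}
    edges: dict[str, set[str]] = defaultdict(set)

    # Ti -> Tj when an op of Ti precedes a conflicting op of Tj:
    # different transactions, same item, at least one write.
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (kind_i, kind_j):
                edges[ti].add(tj)

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    indegree = {t: 0 for t in txns}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [t for t in txns if indegree[t] == 0]
    removed = 0
    while ready:
        t = ready.pop()
        removed += 1
        for u in edges[t]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return removed == len(txns)


serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "w", "y")]
print(is_conflict_serializable(serial), is_conflict_serializable(interleaved))  # True False
```

In the interleaved schedule, T1 must precede T2 because of item x, and T2 must precede T1 because of item y. No serial order satisfies both constraints.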
Chapter
A business process is a set of related tasks that lead to a particular goal. Some business processes automate the execution or tracking of tasks using software. The term workflow is a commonly used synonym for the concept of a business process. The term business transaction is sometimes used as a synonym for a business process or a step within a bu...
Chapter
Transactional middleware products meet the requirements of multitier transaction processing (TP) applications. Twenty years ago, transactional middleware was delivered to market as a single product category, the TP (or OLTP) monitor. Many of these products are still in production, but the most popular transactional middleware environments are now d...
Chapter
A transaction processing (TP) application is a serial processor of requests. It is a server that appears to execute an infinite loop whose body is an ACID (atomicity, consistency, isolation, durability) transaction. The processing of simple requests involves receiving a request, routing it to the appropriate application program, and then executing...
Chapter
This chapter covers major software abstractions needed to make it easy to build reliable transaction processing (TP) applications with good performance: transaction bracketing, threads, remote procedure calls, state management, and scalability techniques. Transaction bracketing offers the programmer commands to start, commit, and abort a transactio...
Chapter
Queued transaction processing (TP) is an alternative to direct TP that uses a persistent queue between client and server programs. The client enqueues requests and dequeues replies. The server dequeues a request, processes the request, enqueues a reply, and commits; if the transaction aborts, the request is replaced in the queue and can be retried....
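The server side of queued TP can be sketched as a single function that dequeues a request, processes it, enqueues a reply, and commits. If processing fails, the dequeue is undone so that the request is retried. This is a minimal in-memory stand-in under stated assumptions. `QueueManager` and `serve_one` are hypothetical names, and real queues are persistent, with the dequeue, the processing, and the enqueue all inside one atomic transaction.

```python
from collections import deque
from typing import Callable


class QueueManager:
    """Hypothetical in-memory stand-in for persistent request/reply queues."""

    def __init__(self) -> None:
        self.requests: deque[str] = deque()
        self.replies: deque[str] = deque()


def serve_one(qm: QueueManager, handler: Callable[[str], str]) -> bool:
    """Run one server transaction: dequeue, process, enqueue reply, commit.

    Assumes at least one request is queued. If the handler raises, the
    transaction aborts and the request goes back to the head of the queue
    so that it can be retried.
    """
    request = qm.requests.popleft()      # dequeue inside the transaction
    try:
        reply = handler(request)         # process the request
    except Exception:
        qm.requests.appendleft(request)  # abort: undo the dequeue
        return False
    qm.replies.append(reply)             # enqueue the reply and "commit"
    return True
```

Returning the request to the head of the queue on abort is what allows a failed request to be retried without the client resubmitting it.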
Chapter
This chapter provides an overview of transaction processing application and system structure. A transaction is the execution of a program that performs an administrative function by accessing a shared database. Transactions can execute online, while a user is waiting, or off-line (in batch mode) if the execution takes longer than a user can wait fo...
Article
Full-text available
We address the problem of unsupervised matching of schema information from a large number of data sources into the schema of a data warehouse. The matching process is the first step of a framework to integrate data feeds from third-party data providers into a structured-search engine's data warehouse. Our experiments show that traditional schema-...
Article
Full-text available
Developers need to programmatically access persistent XML data. Object-oriented access is often the preferred method. Translating XML data into objects or vice-versa is a hard problem due to the data model mismatch and the difficulty of query translation. We propose a framework that addresses this problem by transforming object-based queries and up...
Article
Full-text available
Many of the largest database-driven web sites use custom web-scale data managers (WDMs). On the surface, these WDMs are being applied to problems that are well-suited for relational database systems. Some examples are the following: • Map-Reduce [5], Hadoop [7], and Dryad [9] are used to process queries on large data sets using sequential scan and...
Article
Full-text available
A group of database researchers, architects, users, and pundits met in May 2008 at the Claremont Resort in Berkeley, CA, to discuss the state of database research and its effects on practice. This was the seventh meeting of this sort over the past 20 years and was distinguished by a broad consensus that the database community is at a turning point...
Conference Paper
A model is a formal description of a complex application artifact, such as a database schema, an application interface, a UML model, an ontology, or a message format. The problem of merging such models lies at the core of many meta data applications, such as view integration, mediated schema creation for data integration, and ontology merging. This...
Article
Full-text available
Translating data and data access operations between applications and databases is a longstanding data management problem. We present a novel approach to this problem, in which the relationship between the application data and the persistent storage is specified using a declarative mapping, which is compiled into bidirectional views that drive the d...
Article
Full-text available
We discuss a proposal for the implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from object-oriented to SQL or from SQL to XML schema descriptions. The operator can be used to generate database wrappers (e.g., object-oriented or XML to relational), default user interfaces (e.g....
Article
At the 2008 Computing Research Association Conference at Snowbird, the authors participated in a panel addressing the issue of paper and proposal reviews. This short paper summarizes the panelists' presentations and audience commentary. It concludes with some observations and suggestions on how we might address this issue in the near-term future.
Article
Full-text available
Software integration problems are solved, information integration tools used in practice are described, core technologies of integration tools are reviewed, and future integration trends are identified. The solution of an integration problem is provided by programs aligning data instances, as data formats of the extracted text are identical to thos...
Article
Full-text available
Developers need to access persistent XML data programmatically. Object-oriented access is often the preferred method. Translating XML data into objects or vice-versa is a hard problem due to the data model mismatch and the difficulty of query translation. Our prototype addresses this problem by transforming object-based queries and updates into que...
Conference Paper
Full-text available
Model management is a high-level programming language designed to efficiently manipulate schemas and mappings. It comprises robust operators that, combined in short programs, can solve complex metadata-oriented problems in a compact way. For instance, countless enterprise data integration scenarios can be easily expressed in this high-level lan...
Conference Paper
Full-text available
We address the problem of generating a mediated schema from a set of relational data source schemas and conjunctive queries that specify where those schemas overlap. Unlike past approaches that generate only the mediated schema, our algorithm also generates view definitions, i.e., source-to-mediated schema mappings. Our main goal is to understand...
Article
This paper presents the first object-oriented interfaces that capture the essence of the structural complexity of XML Schema. We develop two such interfaces: a lightweight object-oriented interface that hides some of the complexity of XML Schema by simplifying the particle and type hierarchies, and a more complete but more complex interface that co...
Conference Paper
Full-text available
This paper describes a rule-based algorithm to derive a relational schema from an extended entity-relationship model. Our work is based on an approach by Atzeni and Torlone in which the source EER model is imported into a universal metamodel, a series of transformations are performed to eliminate constructs not appearing in the relational metamodel...
Article
To compare, through their results, two distinct approaches to aligning two representations of anatomy. Both approaches use a combination of lexical and structural techniques. In addition, the first approach takes advantage of domain knowledge, while the second approach treats alignment as a special case of schema matching....
Article
Full-text available
ANSI SQL-92 defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to characterize several popular isolation levels, including the standard locking implementations of the levels. Investigating the ambiguities of the phenomena leads to c...
Conference Paper
Full-text available
Model management is a generic approach to solving problems of data programmability where precisely engineered mappings are required. Applications include data warehousing, e-commerce, object-to-relational wrappers, enterprise information integration, database portals, and report generators. The goal is to develop a model management engine that can...
Conference Paper
Full-text available
Translating data and data access operations between applications and databases is a longstanding data management problem. We present a novel approach to this problem, in which the relationship between the application data and the persistent storage is specified using a declarative mapping, which is compiled into bidirectional views that drive the d...
Conference Paper
We present an overview of a tutorial on model management, an approach to solving data integration problems such as data warehousing, e-commerce, object-to-relational mapping, schema evolution and enterprise information integration. Model management defines a small set of operations for manipulating schemas and mappings, such as Match, Compose,...
Article
Full-text available
We briefly motivate and present a new online bibliography on schema evolution, an area which has recently gained much interest in both research and practice.
Article
Full-text available
Mapping composition is a fundamental operation in metadata driven applications. Given a mapping over schemas S1 and S2 and a mapping over schemas S2 and S3, the composition problem is to compute an equivalent mapping over S1 and S3. We describe a new composition algorithm that targets practical applications. It incorporates view unfolding. It elimi...
Conference Paper
Full-text available
We describe MIDST, an implementation of the model management operator ModelGen, which translates schemas from one model to another, for example from OO to SQL or from SQL to XSD. It extends past approaches by translating database instances, not just their schemas. The operator can be used to generate database wrappers (e.g. OO or XML to relational)...
Article
Full-text available
This paper discusses technical problems that arise in supporting large-scale 24×7 web services based on experience at MSN with Windows Live™ services. Issues covered include multi-tier architecture, costs of commodity vs. premium servers, managing replicas, managing sessions, use of materialized views, and controlling checkpointing. We finish wi...
Conference Paper
Full-text available
Many applications, such as e-commerce, routinely use copies of data that are not in sync with the database due to heuristic caching strategies used to enhance performance. We study concurrency control for a transactional model that allows update transactions to read out-of-date copies. Each read operation carries a "freshness constraint" that spe...
Conference Paper
Full-text available
The goal of schema matching is to identify correspondences between the elements of two schemas. Most schema matching systems calculate and display the entire set of correspondences in a single shot. Invariably, the result presented to the engineer includes many false positives, especially for large schemas. The user is often overwhelmed by al...
Article
This paper defines a collection of metrics on manuscript reviewing and presents historical data for ACM Transactions on Database Systems and The VLDB Journal.
Conference Paper
Full-text available
We discuss the main features of a multilevel dictionary based on a metamodel approach. The application is an implementation of ModelGen, the model management operator that translates schemas from one model to another, for example from ER to relational or from XSD to object. The dictionary manages schemas and, at a metalevel, a description of th...
Conference Paper
Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly due to the lack of sufficient evidence in...
Conference Paper
A customizable and extensible tool is proposed to implement ModelGen, the model management operator that translates a schema from one model to another. A wide family of models is handled, by using a metamodel in which models can be succinctly and precisely described. The approach is novel because the tool exposes the dictionary that stores models,...
Article
Full-text available
The database research with focus on integration of text, data, code, fusion of information from heterogeneous data sources, and information privacy, conducted at Lowell, is discussed. The object-oriented (OO) and object-relational (OR) database management systems (DBMS) showed how text and other data types can be added to a DBMS. Several goals ment...
Conference Paper
Full-text available
Model management is an approach to simplify the programming of metadata-intensive applications. It offers developers powerful operators, such as Compose, Diff, and Merge, that are applied to models, such as database schemas or interface specifications, and to mappings between models. Prior model management solutions focused on a simple class of map...
Conference Paper
Full-text available
This paper is a short introduction to an industrial session on the use of meta data to address data integration problems in large enterprises. The main topics are data discovery, version and configuration management, and mapping development.
Conference Paper
Full-text available
We demonstrate a prototype that translates schemas from a source metamodel (e.g., OO, relational, XML) to a target metamodel. The prototype is integrated with Microsoft Visual Studio 2005 to generate relational schemas from an object-oriented design. It has four novel features. First, it produces instance mappings to round-trip the data between the...
Conference Paper
Full-text available
Composition of mappings between schemas is essential to support schema evolution, data exchange, data integration, and other data management tasks. In many applications, mappings are given by embedded dependencies. In this article, we study the issues involved in composing such mappings. Our algorithms and results extend those of Fagin et al. [2004...