Carlo A Curino
  • Microsoft

About

67 Publications
13,888 Reads
5,416 Citations
Current institution: Microsoft

Publications (67)
Preprint
Tuning a database system to achieve optimal performance on a given workload is a long-standing problem in the database community. A number of recent papers have leveraged ML-based approaches to guide the sampling of large parameter spaces (hundreds of tuning knobs) in search of high-performance configurations. Looking at Microsoft production servi...
Preprint
Full-text available
Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this in...
Article
© 2020 IEEE. Graphs are a natural way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive sub-graph traversals and complex analytical computations. These real-world graphs are often substantially more structure...
Article
Full-text available
Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same chall...
Article
Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is "born" geographically distributed. On the other hand, many machine learning applications require a global view of such data in order to achieve the best results. These types of applications form a new class of lear...
Conference Paper
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of application engines that are building upon the YARN resource management layer. The open-source implementation of MapReduce is being slowly replaced by a collection of engines dedicated to specific verticals. This has led to growing fragmentation and repeated efforts wit...
Conference Paper
Full-text available
Benchmarking is an essential activity when choosing database products, tuning systems, and understanding the trade-offs of the underlying engines. But the workloads available for this effort are often restrictive and non-representative of the ever-changing requirements of modern database applications. We recently introduced OLTP-Bench, an exten...
Conference Paper
Many large organizations collect massive volumes of data each day in a geographically distributed fashion, at data centers around the globe. Despite their geographically diverse origin, the data must be processed and analyzed as a whole to extract insight. We call the problem of supporting large-scale geo-distributed analytics Wide-Area Big Data (WA...
Conference Paper
Full-text available
Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challeng...
Conference Paper
The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the...
Article
In this demo proposal, we describe REEF, a framework that makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. We will demonstrate diverse workloads, including extract-transform-load MapReduce jobs, iterative machine learning algorithms, and ad-hoc declarative query processing. At its core, R...
Article
Search, exploration and social experience on the Web has recently undergone tremendous changes with search engines, web portals and social networks offering a different perspective on information discovery and consumption. This new perspective is aimed at capturing user intents, and providing richer and highly connected experiences. The new battleg...
Article
Advances in operating system and storage-level virtualization technologies have enabled the effective consolidation of heterogeneous applications in a shared cloud infrastructure. Novel research challenges arising from this new shared environment include load balancing, workload estimation, resource isolation, machine replication, live migration, a...
Conference Paper
Full-text available
Database administrators of Online Transaction Processing (OLTP) systems constantly face difficult questions. For example, "What is the maximum throughput I can sustain with my current hardware?", "How much disk I/O will my system perform if the requests per second double?", or "What will happen if the ratio of transactions in my system changes?". R...
Article
Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper, we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. In addition to labor saving, the benefits delivered by...
Article
Database administrators of Online Transaction Processing (OLTP) systems constantly face difficult questions. For example, "What is the maximum throughput I can sustain with my current hardware?", "How much disk I/O will my system perform if the requests per second double?", or "What will happen if the ratio of transactions in my system changes?". Res...
Article
Mobile application development is challenging for several reasons: intermittent and limited network connectivity, tight power constraints, server-side scalability concerns, and a number of fault-tolerance issues. Developers handcraft complex solutions that include client-side caching, conflict resolution, disconnection tolerance, and backend databa...
Article
The advent of affordable, shared-nothing computing systems portends a new class of parallel database management systems (DBMS) for on-line transaction processing (OLTP) applications that scale without sacrificing ACID guarantees [7, 9]. The performance of these DBMSs is predicated on the existence of an optimal database design that is tailored for...
Conference Paper
The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For som...
Article
The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For som...
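The partition-then-route idea this abstract describes can be illustrated in a few lines; the names below (`node_for`, `shards`) and the four-node setup are hypothetical, not taken from the paper.

```python
# Illustrative sketch of horizontal (hash) partitioning: each row is
# assigned to exactly one node by its partition key, so a query on that
# key executes at a single node with no cross-node coordination.

NUM_NODES = 4

def node_for(key: int) -> int:
    """Route a partition key to exactly one node."""
    return key % NUM_NODES  # simple, deterministic hash partitioning

# Each node holds only its own shard of the table.
shards = {n: {} for n in range(NUM_NODES)}

def insert(key: int, row: dict) -> None:
    shards[node_for(key)][key] = row

def lookup(key: int):
    # Single-node execution: no distributed transaction needed.
    return shards[node_for(key)].get(key)

insert(42, {"name": "alice"})
print(lookup(42))  # row lives only on node 42 % 4 == 2
```

Adding a node only requires moving the keys whose `node_for` value changes; no query ever needs to coordinate across nodes as long as it touches a single partition key.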
Conference Paper
One of the key tenets of database system design is making efficient use of storage and memory resources. However, existing database system implementations are actually extremely wasteful of such resources; for example, most systems leave a great deal of empty space in tuples, index pages, and data pages, and spend many CPU cycles reading cold recor...
Conference Paper
This paper introduces a new transactional “database-as-a-service” (DBaaS) called Relational Cloud. A DBaaS promises to move much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, offering lower overall costs to users. Early DBaa...
Conference Paper
In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can be potentially harnessed to consolidate multiple datab...
Article
Full-text available
Supporting legacy applications when the database schema evolves represents a long-standing challenge of practical and theoretical importance. Recent work has produced algorithms and systems that automate the process of data migration and query adaptation; however, the problems of evolving integrity constraints and supporting legacy updates under sc...
Article
We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed...
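The objective the abstract mentions, minimizing the number of distributed transactions, can be made concrete with a small sketch; this is not Schism's implementation, and all names and data below are made up for illustration.

```python
# Illustrative sketch: given a workload of transactions and a candidate
# tuple-to-node assignment, count how many transactions become
# distributed -- the cost a workload-aware partitioner tries to minimize.

def distributed_txn_count(workload, assignment):
    """workload: list of sets of tuple ids touched by each transaction.
    assignment: dict mapping tuple id -> node id.
    Returns the number of transactions that span more than one node."""
    return sum(1 for txn in workload
               if len({assignment[t] for t in txn}) > 1)

workload = [{1, 2}, {2, 3}, {4}]
together = {1: 0, 2: 0, 3: 0, 4: 1}  # co-accessed tuples kept on node 0
split    = {1: 0, 2: 1, 3: 0, 4: 1}  # co-accessed tuples separated

print(distributed_txn_count(workload, together))  # 0
print(distributed_txn_count(workload, split))     # 2
```

A workload-aware partitioner searches for an assignment like `together`, where tuples that transactions access jointly land on the same node.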
Conference Paper
Full-text available
The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves with time. Indeed, archival quality can only be guaranteed by storing past database contents using the schema versions under which they were originally created. This causes major us...
Article
In this paper, we make the case for "databases as a service" (DaaS), with two target scenarios in mind: (i) consolidation of data management functionality for large organizations and (ii) outsourcing data management to a cloud-based service provider for small/medium organizations. We analyze the many challenges to be faced, and discuss the design...
Article
Full-text available
The inter-relationship between data and context is discussed. The context is perceived as a set of variables that may be of interest for an agent and that influence its actions. Sophisticated and general context models have been proposed in the last few years to support context-aware applications. Context is attributed with different type o...
Conference Paper
Full-text available
Relational databases have been designed to store high volumes of data and to provide an efficient query interface. Ontologies are geared towards capturing domain knowledge, annotations, and to offer high-level, machine-processable views of data and metadata. The complementary strengths and weaknesses of these data models motivate the research effor...
Conference Paper
Full-text available
Schema evolution poses serious challenges in historical data management. Traditionally, historical data have been archived either by (i) migrating them into the current schema version that is well-understood by users but compromising archival quality, or (ii) by maintaining them under the original schema version in which the data was originally cre...
Article
Full-text available
More and more often, we face the necessity of extracting appropriately reshaped knowledge from an integrated representation of the information space. Whether such a global representation is a central database, a global view of several databases, or an ontological representation of an information domain, we face the need to define personalised views for the knowl...
Conference Paper
Information systems are subject to a perpetual evolution, which is particularly pressing in Web information systems, due to their distributed and often collaborative nature. Such a continuous adaptation process comes with a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes upon database-cen...
Conference Paper
The complexity, cost, and down-time currently created by the database schema evolution process is the source of incessant problems in the life of information systems and a major stumbling block that prevents graceful upgrades. Furthermore, our studies show that the serious problems encountered by traditional information systems are now further exac...
Conference Paper
Full-text available
The Semantic Web has the ambitious goal of enabling complex autonomous applications to reason on a machine-processable version of the World Wide Web. This, however, would require a coordinated effort not easily achievable in practice. On the other hand, spontaneous communities, based on social tagging, recently achieved noticeable consensus and dif...
Conference Paper
Full-text available
Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage them and preserve the schema evolution history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Data...
Article
Full-text available
The old problem of managing the history of database information is now made more urgent and complex by fast spreading web information systems, such as Wikipedia. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby...
Article
Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has...
Article
Full-text available
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the users' interests and context. We propose to exploit knowledge about the user, the adopted device, and the environment - alto...
Conference Paper
Full-text available
Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and co...
Conference Paper
Full-text available
Complex design, targeting system-on-chip based on reconfigurable architectures, still lacks a generalized methodology allowing both the automatic derivation of a complete system solution able to fit into the final device, and mixed hardware-software solutions, exploiting partial reconfiguration capabilities. The shining methodology organizes the in...
Article
Full-text available
The life of a modern Information System is often characterized by (i) a push toward integration with other systems, and (ii) the evolution of its data management core in response to continuously changing application requirements. Most of the current proposals dealing with these issues from a database perspective rely on the formal notions of ma...
Article
Full-text available
Context-aware systems are pervading everyday life, therefore context modeling is becoming a relevant issue and an expanding research field. This survey has the goal to provide a comprehensive evaluation framework, allowing application designers to compare context models with respect to a given target application; in particular we stress the analy...
Conference Paper
Full-text available
System interoperability is a well known issue, especially for heterogeneous information systems, where ontology- based representations may support automatic and user- transparent integration. In this paper we present X-SOM: an ontology mapping and integration tool. The contribution of our tool is a modular and extensible architecture that automatic...
Conference Paper
Full-text available
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the users' interests and context. We propose to exploit knowledge about the user, the adopted device, and the environment - altogether c...
Conference Paper
Full-text available
Nowadays user mobility requires that both content and services be appropriately personalized, in order for the (mobile) user to be always - and anywhere - equipped with the adequate share of data. Thus, the knowledge about the user, the adopted device and the environment, altogether called context, has to be taken into account in order to minimize...
Article
Full-text available
Abstract: This document constitutes deliverable DALL2 of the Esteem project and is intended to illustrate the overall architecture of an Esteem peer. This architecture was developed during the project with contributions from all partners and has a modular structure. In particular, the document provides a description of the...
Article
Full-text available
Very often we face the need of extracting appropriate data views from an integrated representation of the information space, and of defining, a posteriori, personalized views for the information stakeholders. The peer to peer scenario of the ESTEEM project introduces another interesting motivation behind the desire to define customized data views o...
Conference Paper
Full-text available
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the user’s interests and context. Since both the user and the data sources can be mobile, and the communication might be unreliable, cac...
Conference Paper
Full-text available
Current applications are often forced to filter the richness of data sources in order to reduce the information noise the user is subject to. We consider this aspect as a critical issue of applications, to be factorized at the data management level. The Context-ADDICT system, leveraging ontology-based context and domain models, is able to persona...
Article
In this paper we describe TinyLime, a novel middleware for wireless sensor networks that departs from the traditional setting where sensor data is collected by a central monitoring station, and enables instead multiple mobile monitoring stations to access the sensors in their proximity and share the collected data through wireless links. This intri...
Conference Paper
Full-text available
We consider the problem of finding officially unrecognized side effects of drugs. By submitting queries to the Web involving a given drug name, it is possible to retrieve pages concerning the drug. However, many retrieved pages are irrelevant and some relevant pages are not retrieved. More relevant pages can be obtained by adding the active ingredi...
Conference Paper
In the rapidly developing field of sensor networks, bridging the gap between the applications and the hardware presents a major challenge. Although middleware is one solution, it must be specialized to the qualities of sensor networks, especially energy consumption. The work presented here provides two contributions: a new operational setting for s...
Conference Paper
Full-text available
Very Small DataBases (VSDB) is a methodology and a complete framework for database design and management in a complex environment where databases are distributed over different systems, from high-end servers to reduced-power portable devices. Within this framework the architecture of PoLiDBMS, a Portable Light Database Management System has b...
Article
Full-text available
Data integration is an old but still open issue in the database research area, where Semantic Web technologies, such as ontologies, may be of great help. The aim of the Context-ADDICT project is to provide support for the integration and context-aware reshaping of data coming from heterogeneous data sources. Within this framework, we use ontology extra...
Article
Debugging is one of the most onerous activities in the software development process; in particular, the most arduous and time-unpredictable task is isolating the source of the problem. Delta Debugging is an innovative, systematic, and automatic technique that provides a solid theoretical foundation for tackling exactly this task;...
Article
Full-text available
Research and Education have often been perceived as a dichotomy. It has often been hard to couple them in a productive and virtuous cycle. With this paper we would like to discuss our attempt in this direction, briefly presenting the approach and the positive results obtained. The key idea is involving students, by means of projects and theses, i...
