About
67 Publications · 13,888 Reads
5,416 Citations
Publications (67)
Tuning a database system to achieve optimal performance on a given workload is a long-standing problem in the database community. A number of recent papers have leveraged ML-based approaches to guide the sampling of large parameter spaces (hundreds of tuning knobs) in search of high-performance configurations. Looking at Microsoft production servi...
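As a rough illustration of the search loop such ML-guided tuners run (a minimal sketch, not the method of the paper above; the knob names, their ranges, and the run_benchmark objective are all invented):

```python
# Toy illustration of search-guided knob tuning (NOT the paper's method).
# run_benchmark() is a hypothetical stand-in for deploying a configuration
# and measuring workload cost; lower is better.
import random

KNOBS = {                      # hypothetical knobs and (low, high) ranges
    "buffer_pool_mb": (128, 8192),
    "max_connections": (16, 1024),
    "checkpoint_secs": (5, 600),
}

def run_benchmark(cfg):
    # Invented objective with an optimum the search can find.
    return (abs(cfg["buffer_pool_mb"] - 4096) / 4096
            + abs(cfg["max_connections"] - 256) / 256
            + abs(cfg["checkpoint_secs"] - 60) / 60)

def sample_near(cfg, scale=0.2):
    """Sample a candidate near a known-good config (local search step)."""
    out = {}
    for k, (lo, hi) in KNOBS.items():
        jitter = random.gauss(0, scale * (hi - lo))
        out[k] = min(hi, max(lo, int(cfg[k] + jitter)))
    return out

best = {k: random.randint(lo, hi) for k, (lo, hi) in KNOBS.items()}
best_cost = run_benchmark(best)
for _ in range(200):
    # Mostly exploit the best config so far, occasionally explore at random.
    cand = sample_near(best) if random.random() < 0.7 else \
           {k: random.randint(lo, hi) for k, (lo, hi) in KNOBS.items()}
    cost = run_benchmark(cand)
    if cost < best_cost:
        best, best_cost = cand, cost

print("best config:", best, "cost:", round(best_cost, 3))
```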
Microsoft's internal big-data infrastructure is one of the largest in the world -- with over 300k machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is a costly and complex endeavor, and efficiency is paramount. In fact, for over 15 years, a dedicated engineering team has tuned almost every aspect of this in...
Graphs are a natural way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive sub-graph traversals and complex analytical computations. These real-world graphs are often substantially more structure...
Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same chall...
Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is "born" geographically distributed. On the other hand, many machine learning applications require a global view of such data in order to achieve the best results. These types of applications form a new class of lear...
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of application engines that are building upon the YARN resource management layer. The open-source implementation of MapReduce is being slowly replaced by a collection of engines dedicated to specific verticals. This has led to growing fragmentation and repeated efforts wit...
Benchmarking is an essential activity when choosing database products, tuning systems, and understanding the trade-offs of the underlying engines. But the workloads available for this effort are often restrictive and non-representative of the ever-changing requirements of modern database applications. We recently introduced OLTP-Bench, an exten...
Many large organizations collect massive volumes of data each day in a geographically distributed fashion, at data centers around the globe. Despite their geographically diverse origin the data must be processed and analyzed as a whole to extract insight. We call the problem of supporting large-scale geo-distributed analytics Wide-Area Big Data (WA...
Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challeng...
The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage have stretched the...
In this demo proposal, we describe REEF, a framework that makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. We will demonstrate diverse workloads, including extract-transform-load MapReduce jobs, iterative machine learning algorithms, and ad-hoc declarative query processing. At its core, R...
Search, exploration and social experience on the Web have recently undergone tremendous changes, with search engines, web portals and social networks offering a different perspective on information discovery and consumption. This new perspective is aimed at capturing user intents, and providing richer and highly connected experiences. The new battleg...
Advances in operating system and storage-level virtualization technologies have enabled the effective consolidation of heterogeneous applications in a shared cloud infrastructure. Novel research challenges arising from this new shared environment include load balancing, workload estimation, resource isolation, machine replication, live migration, a...
Database administrators of Online Transaction Processing (OLTP) systems constantly face difficult questions. For example, "What is the maximum throughput I can sustain with my current hardware?", "How much disk I/O will my system perform if the requests per second double?", or "What will happen if the ratio of transactions in my system changes?". R...
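To make the what-if questions above concrete, here is a toy extrapolation in the spirit of resource prediction (a sketch only; the measurements and the linear model are invented, and real predictors are far richer than a straight line):

```python
# Toy what-if extrapolation for questions like "how much disk I/O if
# requests per second double?". Fits a least-squares line through
# invented (req/s, IOPS) measurements and extrapolates.
obs = [(100, 220.0), (200, 450.0), (300, 640.0), (400, 900.0)]  # hypothetical

n = len(obs)
sx = sum(x for x, _ in obs); sy = sum(y for _, y in obs)
sxx = sum(x * x for x, _ in obs); sxy = sum(x * y for x, y in obs)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept

current = 400
print("predicted IOPS at 2x load:", round(a * (2 * current) + b, 1))
```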
Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper, we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. In addition to labor saving, the benefits delivered by...
Mobile application development is challenging for several reasons: intermittent and limited network connectivity, tight power constraints, server-side scalability concerns, and a number of fault-tolerance issues. Developers handcraft complex solutions that include client-side caching, conflict resolution, disconnection tolerance, and backend databa...
The advent of affordable, shared-nothing computing systems portends a new class of parallel database management systems (DBMS) for on-line transaction processing (OLTP) applications that scale without sacrificing ACID guarantees [7, 9]. The performance of these DBMSs is predicated on the existence of an optimal database design that is tailored for...
The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For som...
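A minimal sketch of the hash-partitioning idea described above (the node names and the partitioning key are hypothetical; this is not any particular system's router):

```python
# Sketch of hash partitioning in an OLTP setting: routing each single-key
# query to exactly one node avoids distributed-transaction overhead.
import hashlib

NODES = ["node0", "node1", "node2", "node3"]

def node_for(key: str) -> str:
    """Deterministically map a partitioning key (e.g. customer_id) to a node."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# A transaction touching one customer's rows runs on a single node...
print(node_for("customer:42"))
# ...while a transaction spanning two customers may straddle nodes and
# pay the distributed-coordination cost the abstract warns about:
print({node_for("customer:42"), node_for("customer:7")})
```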
One of the key tenets of database system design is making efficient use of storage and memory resources. However, existing database system implementations are actually extremely wasteful of such resources; for example, most systems leave a great deal of empty space in tuples, index pages, and data pages, and spend many CPU cycles reading cold recor...
This paper introduces a new transactional “database-as-a-service” (DBaaS) called Relational Cloud. A DBaaS promises to move much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, offering lower overall costs to users. Early DBaa...
In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can be potentially harnessed to consolidate multiple datab...
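To illustrate why such low utilization invites consolidation, here is a first-fit-decreasing packing sketch (the demand numbers and the 0.8 capacity cap are invented; the paper's actual placement strategy may differ):

```python
# Illustrative first-fit-decreasing consolidation of underutilized servers.
# Peak CPU demands (fractions of one host) are invented for the example.
def consolidate(demands, capacity=0.8):
    """Pack demands onto few hosts, capping each host at `capacity`."""
    hosts = []  # list of [free_capacity, demands_placed] entries
    for d in sorted(demands, reverse=True):
        for h in hosts:
            if h[0] >= d:          # first host with room wins
                h[0] -= d
                h[1].append(d)
                break
        else:                      # no host fits: open a new one
            hosts.append([capacity - d, [d]])
    return [h[1] for h in hosts]

# 200 servers averaging a few percent CPU collapse onto a handful of hosts:
import random
random.seed(0)
demands = [random.uniform(0.01, 0.10) for _ in range(200)]
packing = consolidate(demands)
print(len(packing), "hosts instead of", len(demands))
```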
Supporting legacy applications when the database schema evolves represents a long-standing challenge of practical and theoretical importance. Recent work has produced algorithms and systems that automate the process of data migration and query adaptation; however, the problems of evolving integrity constraints and supporting legacy updates under sc...
We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed...
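A minimal sketch of the co-access-graph idea behind workload-aware partitioning (the toy workload is invented, and Kernighan-Lin bisection here merely stands in for whatever balanced min-cut partitioner a real system would use):

```python
# Sketch of workload-aware partitioning: tuples become graph nodes,
# transactions add weighted edges between co-accessed tuples, and a
# balanced min-cut splits the graph so few transactions cross partitions.
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection
from itertools import combinations

transactions = [   # hypothetical: each set is one transaction's tuples
    {"t1", "t2"}, {"t1", "t2"}, {"t3", "t4"},
    {"t3", "t4", "t5"}, {"t5", "t6"}, {"t2", "t3"},
]

G = nx.Graph()
for txn in transactions:
    for a, b in combinations(sorted(txn), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)   # heavier edge = more co-access

part_a, part_b = kernighan_lin_bisection(G, weight="weight", seed=0)
crossing = sum(1 for txn in transactions
               if txn & part_a and txn & part_b)
print("partitions:", sorted(part_a), sorted(part_b))
print("distributed transactions:", crossing, "of", len(transactions))
```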
The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves with time. Indeed, archival quality can only be guaranteed by storing past database contents using the schema versions under which they were originally created. This causes major us...
In this paper, we make the case for "databases as a service" (DaaS), with two target scenarios in mind: (i) consolidation of data management functionality for large organizations and (ii) outsourcing data management to a cloud-based service provider for small/medium organizations. We analyze the many challenges to be faced, and discuss the design...
The inter-relationship between data and context is discussed. Context is perceived as a set of variables that may be of interest to an agent and that influence its actions. Sophisticated and general context models have been proposed to support context-aware applications in the last few years. Context is attributed with different type o...
Relational databases have been designed to store high volumes of data and to provide an efficient query interface. Ontologies are geared towards capturing domain knowledge, annotations, and to offer high-level, machine-processable views of data and metadata. The complementary strengths and weaknesses of these data models motivate the research effor...
Schema evolution poses serious challenges in historical data management. Traditionally, historical data have been archived either by (i) migrating them into the current schema version that is well-understood by users but compromising archival quality, or (ii) by maintaining them under the original schema version in which the data was originally cre...
More and more often, we face the necessity of extracting appropriately reshaped knowledge from an integrated representation of the information space. Whether such a global representation is a central database, a global view of several ones, or an ontological representation of an information domain, we face the need to define personalised views for the knowl...
Information systems are subject to perpetual evolution, which is particularly pressing in Web information systems, due to their distributed and often collaborative nature. Such a continuous adaptation process comes at a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes upon database-cen...
The complexity, cost, and downtime currently created by the database schema evolution process are the source of incessant problems in the life of information systems and a major stumbling block that prevents graceful upgrades. Furthermore, our studies show that the serious problems encountered by traditional information systems are now further exac...
The Semantic Web has the ambitious goal of enabling complex autonomous applications to reason on a machine-processable version of the World Wide Web. This, however, would require a coordinated effort not easily achievable in practice. On the other hand, spontaneous communities, based on social tagging, recently achieved noticeable consensus and dif...
Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage them and preserve the schema evolution history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Data...
The old problem of managing the history of database information is now made more urgent and complex by fast spreading web information systems, such as Wikipedia. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby...
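A rough sketch of what publishing versioned relational history as XML can look like (the element and attribute names and the sample rows are invented for illustration; the actual encoding used by the system above may differ):

```python
# Rough sketch of publishing a relation's history as XML, with each row
# carrying the [ts, te) interval during which it was valid.
import xml.etree.ElementTree as ET

history = [  # hypothetical versioned rows: (key, value, valid_from, valid_to)
    ("page42", "draft", "2005-01-01", "2005-06-01"),
    ("page42", "final", "2005-06-01", "9999-12-31"),
]

root = ET.Element("table", name="pages")
for key, val, ts, te in history:
    row = ET.SubElement(root, "row", key=key, ts=ts, te=te)
    row.text = val

print(ET.tostring(root, encoding="unicode"))
```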
Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has...
Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and co...
Complex design targeting systems-on-chip based on reconfigurable architectures still lacks a generalized methodology allowing both the automatic derivation of a complete system solution able to fit into the final device, and mixed hardware-software solutions exploiting partial reconfiguration capabilities. The shining methodology organizes the in...
The life of a modern Information System is often characterized by (i) a push toward integration with other systems, and (ii) the evolution of its data management core in response to continuously changing application requirements. Most of the current proposals dealing with these issues from a database perspective rely on the formal notions of ma...
Context-aware systems are pervading everyday life, therefore context modeling is becoming a relevant issue and an expanding research field. This survey has the goal to provide a comprehensive evaluation framework, allowing application designers to compare context models with respect to a given target application; in particular we stress the analy...
System interoperability is a well known issue, especially for heterogeneous information systems, where ontology-based representations may support automatic and user-transparent integration. In this paper we present X-SOM: an ontology mapping and integration tool. The contribution of our tool is a modular and extensible architecture that automatic...
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the users' interests and context. We propose to exploit knowledge about the user, the adopted device, and the environment - altogether c...
Nowadays user mobility requires that both content and services be appropriately personalized, in order for the (mobile) user to be always - and anywhere - equipped with the adequate share of data. Thus, the knowledge about the user, the adopted device and the environment, altogether called context, has to be taken into account in order to minimize...
Abstract: This document constitutes deliverable DALL2 of the Esteem project and aims to illustrate the overall architecture of an Esteem peer. This architecture was developed over the course of the project with contributions from all partners and has a modular structure. In particular, the document provides a description of the...
Very often we face the need of extracting appropriate data views from an integrated representation of the information space, and of defining, a posteriori, personalized views for the information stakeholders. The peer-to-peer scenario of the ESTEEM project introduces another interesting motivation behind the desire to define customized data views o...
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the users' interests and context. Since both the user and the data sources can be mobile, and the communication might be unreliable, cac...
Current applications are often forced to filter the richness of data sources in order to reduce the information noise the user is subject to. We consider this aspect a critical issue of applications, to be factorized at the data management level. The Context-ADDICT system, leveraging ontology-based context and domain models, is able to persona...
In this paper we describe TinyLime, a novel middleware for wireless sensor networks that departs from the traditional setting where sensor data is collected by a central monitoring station, and enables instead multiple mobile monitoring stations to access the sensors in their proximity and share the collected data through wireless links. This intri...
We consider the problem of finding officially unrecognized side effects of drugs. By submitting queries to the Web involving a given drug name, it is possible to retrieve pages concerning the drug. However, many retrieved pages are irrelevant and some relevant pages are not retrieved. More relevant pages can be obtained by adding the active ingredi...
In the rapidly developing field of sensor networks, bridging the gap between the applications and the hardware presents a major challenge. Although middleware is one solution, it must be specialized to the qualities of sensor networks, especially energy consumption. The work presented here provides two contributions: a new operational setting for s...
Very Small DataBases (VSDB) is a methodology and a complete framework for database design and management in a complex environment where databases are distributed over different systems, from high-end servers to reduced-power portable devices. Within this framework the architecture of PoLiDBMS, a Portable Light Database Management System, has b...
Data integration is an old but still open issue in the database research area, where Semantic Web technologies, such as ontologies, may be of great help. The aim of the Context-ADDICT project is to provide support for the integration and context-aware reshaping of data coming from heterogeneous data sources. Within this framework, we use ontology extra...
Debugging is one of the most burdensome activities in the software development process; in particular, the most arduous and unpredictable task in terms of time is isolating the source of the problem. Delta Debugging is an innovative, systematic, and automatic technique that provides a solid theoretical basis for tackling precisely this task;...
Research and Education have often been perceived as a dichotomy. It has often been hard to couple them in a productive and virtuous cycle. With this paper we would like to discuss our attempt in this direction, briefly presenting the approach and the positive results obtained. The key idea is involving students, by means of projects and theses, i...