39 Publications
7,308 Reads
4,984 Citations
Publications (39)
The requirements of Internet of Things (IoT) workloads are unique in the database space. While significant effort has been spent over the last decade rearchitecting OLTP and Analytics workloads for the public cloud, little has been done to rearchitect IoT workloads for the cloud. In this paper we present IBM Db2 Event Store™, a cloud-native datab...
In a classic transactional distributed database management system (DBMS), write transactions invariably synchronize with a coordinator before final commitment. While enforcing serializability, this model has long been criticized for not satisfying the applications' availability requirements. When entering the era of Internet of Things (IoT), this p...
Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely spac...
The rising demands of real-time analytics have emphasized the need for Hybrid Transactional and Analytical Processing (HTAP) systems, which can handle both fast transactions and analytics concurrently. Wildfire is such a large-scale HTAP system prototyped at IBM Research-Almaden, with many techniques developed in this project incorporated into the...
We demonstrate Hybrid Transactional and Analytics Processing (HTAP) on the Spark platform by the Wildfire prototype, which can ingest up to ~6 million inserts per second per node and simultaneously perform complex SQL analytics queries. Here, a simplified mobile application uses Wildfire to recommend advertising to mobile customers based upon their...
Although the DRAM for main memories of systems continues to grow exponentially according to Moore's Law and to become less expensive, we argue that memory hierarchies will always exist for many reasons, both economic and practical, and in particular due to concurrent users competing for working memory to perform joins and grouping. We present the i...
We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has 100% fill factor, a...
Embodiments of the present invention provide query processing for column stores by accumulating table record attributes during application of query plan operators on a table. The attributes and associated attribute values are compacted when said attribute values are to be consumed for an operation in the query plan, during the execution of the quer...
Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU efficiencies can be achieved by deferring decompression as late in query processing as possible and performing...
Techniques are disclosed for synchronizing a primary data system with an auxiliary data system that processes data for the primary data system. In one embodiment, how current the primary data system and the auxiliary data system are is determined. Requests sent from the primary data system that were not processed by the auxiliary data system are de...
Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, de...
In data centers today, servers are stationary and data flows on a hierarchical network of switches and routers. But such static server arrangements require very scalable networks, and many applications are bottlenecked by network bandwidth. In addition, server density is kept low to enable maintenance and upgrades, as well as to increase air flow....
DB2 with BLU Acceleration deeply integrates innovative new techniques for defining and processing column-organized tables that speed read-mostly Business Intelligence queries by 10 to 50 times and improve compression by 3 to 10 times, compared to traditional row-organized tables, without the complexity of defining indexes or materialized views on t...
The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of...
The Blink project's ambitious goals are to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. It takes a very innovative and counter-intuitive approach to processing BI queries, one that exploits several disruptive hardware and software technology trends. Sp...
In this paper, we first introduce the database aspects of the groupware product Lotus Domino/Notes and then describe, in some more detail, many of the logging and recovery enhancements that were introduced in R5. We discuss briefly some of the changes that had to be made to the ARIES recovery method to accommodate the unique storage management char...
Samet introduced a notion of hypothetical knowledge and showed how it could be used to capture the type of counterfactual reasoning necessary to force the backwards induction solution in a game of perfect information. He argued that while ...
Advances in technologies for scanning, networking, and CD-ROM, lower prices for large disk storage, and acceptance of common image compression and file formats have contributed to an increase in the number, size, and uses of on-line image collections. New tools are needed to help users create, manage, and retrieve images from these collections. We...
On-line collections of images are growing larger and more common, and tools are needed to efficiently manage, organize, and navigate through them. The authors have developed a prototype system called QBIC which allows complex multi-object and multi-feature queries of large image databases. The queries are based on image content: the colors, textures...
We describe how the QBIC (Query By Image Content) system handles “multi-*” queries: queries on large image collections involving multifeatures of each image as a whole and of multiple objects within each image. The queries are based on properties of image content, such as colors, textures, shapes, and edges. The system computes a set of features to d...
In the QBIC (Query By Image Content) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, shape, position, and dominant edges of image objects and regions. Potential applications include medical (Give me other images that...
IBM Almaden Research Center's project on Query By Image Content (QBIC) is studying means to retrieve images from large image databases using image contents such as color, texture, shape and layout. In this paper, we describe the beta version of the PC-based Ultimedia Manager product, which is based on QBIC technology. We outline the product philoso...
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ('Give me other images that contain a tumor with a te...
The purpose of our work is to outline objects on images in an interactive environment. We use an improved method based on energy-minimizing active contours, or 'snakes.' Kass et al. proposed a variational technique; Amini used dynamic programming; and Williams and Shah introduced a fast, greedy algorithm. We combine the advantages of the latter two...
The QBIC (query by image content) project in the IBM Almaden Research Center in San Jose, CA, is conducting a theoretical, experimental, and prototyping study of the problem of querying large still image databases efficiently based on image content. Since the problem is difficult, the aim is to discover general principles, but at the same time to i...
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ('Give me other images that contain a tumor with a te...