Jim Gemmell

  • Doctor of Philosophy
  • Managing Director at Ernst & Young

About

67
Publications
23,563
Reads
3,804
Citations
Current institution
Ernst & Young
Current position
  • Managing Director

Publications

Chapter
In 2009, we predicted that life-logging would become cheap, easy and commonplace by 2020. It is emerging right on schedule. By examining the rising phenomenon of life-logging in the context of Moore’s Law and Bell’s Law, one can observe the budding Internet of Things intersecting with life-logging to produce thing-logging. Thing-logging is just b...
Article
Full-text available
This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. Entity resolution, the data integration problem of performing noisy joins on structured data, typically proceeds by first hashing each record into zero or more blocks, scoring pairs of rec...
Article
Full-text available
We consider a serious, previously-unexplored challenge facing almost all approaches to scaling up entity resolution (ER) to multiple data sources: the prohibitive cost of labeling training data for supervised learning of similarity scores for each pair of sources. While there exists a rich literature describing almost all aspects of pairwise ER, th...
Article
Full-text available
Lives is a system to author and visualize stories based on a collection of biographical and historical multimedia content enhanced with event objects. Stories are constructed as hyperlinked slide-shows, which may also be visualized in a timeline. Besides manually created hyperlinks, Lives also supports the discovery of stories and media that inters...
Article
Full-text available
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth findi...
Article
Full-text available
Some of the greatest advances in web search have come from leveraging socio-economic properties of online user behavior. Past advances include PageRank, anchor text, hubs-authorities, and TF-IDF. In this paper, we investigate another socio-economic property that, to our knowledge, has not yet been exploited: sites that create lists of entities, suc...
Conference Paper
Full-text available
Planz provides a single, integrative document-like overlay to a folder hierarchy through the dynamic, on-demand assembly of XML fragments. This overlay provides a context in which to create or reference not only files but also email messages, web pages and informal notes. This paper describes an evaluation of Planz over a period of several days du...
Article
Full-text available
New systems may allow people to record everything they see and hear—and even things they cannot sense—and to store all these data in a personal digital archive
Chapter
Full-text available
One of the things that distinguishes human beings from other species is the magnitude to which we manipulate our (largely synthetically created) environments and our technologies in order to augment ourselves physically and mentally. Supporting our individual as well as collective memory has been a particularly important endeavor as we have continu...
Article
The guest editors discuss the CARPE project and elaborate on some of the articles contained in this special issue.
Article
Full-text available
The storage functions in commercial systems encourage people to keep more and more of their memories in digital form. Commercial systems that support information capture include head-worn video capture and the continuous monitoring and recording of physiological information. Low-cost, abundant storage makes it possible to record most life expe...
Article
Full-text available
The January 2001 Communications article “A Personal Digital Store” described our efforts to encode, store, and allow easy access to all of a person’s information for personal and professional use [1]. The goals included understanding the effort to digitize a lifetime of legacy content and the elimination of paper as a permanent storage medium. We us...
Conference Paper
Full-text available
User-authored stories will always be the best stories, and authoring tools will continue to be developed. However, a digital lifetime capture permits storytelling via a lightweight markup structure, combined with location, sensor and usage data. In this paper, we describe support in the MyLifeBits system for such an approach, along with some simple...
Article
Full-text available
Passive capture lets people record their experiences without having to operate recording equipment, and without even having to give recording conscious thought. The advantages are increased capture, and improved participation in the event itself. However, passive capture also presents many new challenges. One key challenge is how to deal with the i...
Conference Paper
Full-text available
Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, hear, and many of the images we see including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1]. For the last four years we have worked on MyLifeBits (http://www.MyLifeBits.com), a syst...
Conference Paper
Full-text available
Storage trends have brought us to the point where it is affordable to keep a complete digital record of one's life, and capture methods are multiplying. To experiment with a lifetime store, we are digitizing everything possible from Gordon Bell's life. The MyLifeBits system is designed to store and manage a lifetime's worth of data. MyLifeBits enab...
Article
Full-text available
Pragmatic general multicast (PGM) is a reliable multicast transport protocol that runs over a best effort datagram service, such as IP multicast. PGM obtains scalability via hierarchy, forward error correction, NAK elimination, and NAK suppression. It employs a novel polling scheme for NAK delay tuning to facilitate scaling up and down. This articl...
Article
Full-text available
Storage trends have brought us to the point where it is affordable to keep a complete digital record of one's life. The MyLifeBits system is designed to store and manage a lifetime's worth of data. To experiment with a lifetime store, we have digitized everything possible from Gordon Bell's life. These are added to his existing digital assets. We a...
Article
Full-text available
The home of the future will have an all-digital network for all media, backed by multi-terabyte storage. Users will be able to keep an entire lifetime of personal media, and vast collections of media that may be of interest for future viewing, reading, or listening. MyLifeBits is a personal store for a digital life, designed to support efficient organ...
Article
Full-text available
This memo describes the use of Forward Error Correction (FEC) codes to efficiently provide and/or augment reliability for one-to-many reliable data transport using IP multicast. One of the key properties of FEC codes in this context is the ability to use the same packets containing FEC data to simultaneously repair different packet loss patterns at...
Conference Paper
Full-text available
MyLifeBits is a project to fulfill the Memex vision first posited by Vannevar Bush in 1945. It is a system for storing all of one's digital media, including documents, images, sounds, and videos. It is built on four principles: (1) collections and search must replace hierarchy for organization (2) many visualizations should be supported (3) annotat...
Article
Full-text available
This document describes how to use Forward Error Correction (FEC) codes to efficiently provide and/or augment reliability for bulk data transfer over IP multicast. This document defines a framework for the definition of the information that needs to be communicated in order to use an FEC code for bulk data transfer, in addition to the encoded data...
Technical Report
This document generally describes how to use Forward Error Correction (FEC) codes to efficiently provide and/or augment reliability for data transport. The primary focus of this document is the application of FEC codes to one-to-many reliable data transport using IP multicast. This document describes what information is needed to identify a specifi...
Article
Full-text available
Lack of gaze awareness is a key failure that hinders the widespread acceptance of videoconferencing. GazeMaster is a project which attempts to provide a software solution to gaze awareness and eye contact. Previous publications have described the general approach of GazeMaster. This paper describes our implementation experience, giving more details...
Article
Full-text available
MyLifeBits is a project to fulfill the Memex vision first posited by Vannevar Bush in 1945. It is a system for storing all of one's digital media, including documents, images, sounds, and videos.
Article
Full-text available
The article focuses on the U.S.-based Home Media Networks Ltd. Home media acquisition, production, storage and use are on the cusp of a radical change as personal computer and network technologies integrate all media. Most current residences contain a jumbled mix of analog and digital equipment that will be replaced by all-digital, networked media...
Conference Paper
Full-text available
We present a technique for facial feature localization using a two-level hierarchical wavelet network. The first level wavelet network is used for face matching, and yields an affine transformation used for a rough approximation of feature locations. Second level wavelet networks for each feature are then used to fine-tune the feature locations. Co...
Article
Pragmatic General Multicast (PGM) is a reliable multicast transport protocol for applications that require ordered or unordered, duplicate-free, multicast data delivery from multiple sources to multiple receivers. PGM guarantees that a receiver in the group either receives all data packets from transmissions and repairs, or is able to detect unreco...
Article
Full-text available
WaveBase is a system for detecting features in a face image. It has a database of faces, each with a two-level hierarchical wavelet network. When a new face image is presented to the system for face detection, WaveBase searches its database for the "best face" -- the face whose first level wavelet network most closely matches the new face. It also...
Conference Paper
Full-text available
Web-based viewing of short audio or video clips presents a dilemma: either the clip is streamed at low quality, or a high quality version is downloaded, forcing the user to wait. It is difficult to convince users to wait for clips that they may discover to be uninteresting to them. Progressive Layered Media (PLM) solves this dilemma by str...
Article
Full-text available
This document describes the Asynchronous Layered Coding (ALC) protocol, a massively scalable reliable content delivery protocol. Asynchronous Layered Coding combines the Layered Coding Transport (LCT) building block, a multiple rate congestion control building block and the Forward Error Correction (FEC) building block to provide congestion control...
Article
Full-text available
WaveBase is a system for detecting features in a face image. It has a database of faces, each with a two-level hierarchical wavelet network. When a new face image is presented to the system for face detection, WaveBase searches its database for the “best face” – the face whose first level wavelet network most closely matches the new face. It also...
Article
Full-text available
Previous attempts at bringing gaze awareness to desktop videoconferencing have relied on hardware solutions. In this article, the authors describe their software approach, which tracks participants' head and eye movements using vision techniques, then uses this information to graphically place the head and eyes in a 3D environment
Article
Full-text available
Reliable data multicast is problematic. ACK/NACK schemes do not scale to large audiences, and simple data replication wastes network bandwidth. Fcast, “file multicasting”, combines multicast with forward error correction to address both these problems. Like classic multicast, Fcast scales to large audiences, and like other FEC schemes, it uses band...
Article
Full-text available
We present the warp tracker, a new system for tracking features through a sequence of images. Employing an automated multi-resolution lattice deformation technique, the warp tracker performs fairly well on its own, and its hierarchical nature naturally lends itself to being integrated with other tracking algorithms for increased accuracy and robust...
Article
Full-text available
"Push" technologies to large receiver sets often do not scale due to large amounts of data replication and limited network bandwidth. Even with improvements from multicast communication, scaling challenges persist. Diverse receiver capabilities still result in a high degree of resends. To combat this drawback, we combine multicast with Forward Erro...
Conference Paper
Full-text available
Reliable data multicast is difficult to scale. Fcast, "file multicasting", combines multicast with Forward Error Correction (FEC) to solve this problem. Like classic multicast, Fcast scales to large audiences, and like other FEC schemes, it uses bandwidth very efficiently. Some of the benefits of this combination were known previously, but Fcast co...
Article
Full-text available
We have developed a scalable reliable multicast architecture for delivering one-to-many telepresentations. Whereas the transport for interactive real-time audio and video is concerned with timely delivery, other media, such as slides, images and animations, require reliability. We propose to support reliability by combining multicast with forward er...
Conference Paper
Full-text available
We have developed a scalable reliable multicast architecture for delivering one-to-many telepresentations. In contrast to audio and video, which are often transmitted unreliably, other media, such as slides, images and animations, require reliability. Our approach transmits the data in two layers. One layer is for session-persistent data, with reliab...
Conference Paper
Full-text available
There are many scenarios in which the same data must be delivered over a packet switched network to a large set of receivers. The Internet enables efficient multipoint transmissions through IP multicast by allowing data transmission to all receivers with a single send. Most approaches to scalable reliable multicast utilize receiver-oriented retrans...
Article
Full-text available
Reliable multicast schemes often cannot scale to large receiver sets due to the problems of state explosion and message implosion. In this paper we propose Erasure Correcting Scalable Reliable Multicast, ECSRM. ECSRM is based on the SRM framework proposed by Floyd et al., which utilizes NACK suppression to reduce message implosion. ECSRM makes a n...
Article
Full-text available
The article discusses a new technology that allows people to attend business conferences without being physically present. It concerns telepresentations: presentations in which the presenter and/or some of the audience members are not physically present but are telepresent, that is, in a different location or at different t...
Article
Full-text available
The dream of the Information Superhighway is one in which audio (telephone), video (television), information (news, libraries, images) and data are combined in a single network, universally available and inexpensive. Crippled by low bandwidth, the Internet remains a crude prototype of the Information Superhighway. The telephone and cable TV industr...
Article
Full-text available
This paper establishes some fundamental principles for the retrieval and storage of delay-sensitive multimedia data. Delay-sensitive data include digital audio, animations, and video. Retrieval of these data types from secondary storage has to satisfy certain time constraints in order to be acceptable to the user. The presentation is based on digit...
Article
Full-text available
Many desktop videoconferencing systems are ineffective due to deficiencies in gaze awareness and sense of spatial relationship. Gaze awareness and spatial relationships can be restored by software if heads and eyes can be tracked in video, and then graphically manipulated. We discuss graphics algorithms for manipulating eye gaze and head orientatio...
Article
Full-text available
This paper describes two contributions to the analysis of MPEG video compression: the MPEG2Event library and the MPEGstats web site. MPEG2Event is a C# library intended to facilitate rapid prototyping of MPEG-2 analysis tools. Unlike other MPEG-2 decoding libraries that are designed for performance, MPEG2Event sacrifices parsing speed in order maxi...
Article
Reliable data multicast is problematic. ACK/NACK schemes do not scale to large audiences, and simple data replication wastes network bandwidth. Fcast, “file multicasting”, combines multicast with Forward Error Correction (FEC) to address both these problems. Like classic multicast, Fcast scales to large audiences, and like other FEC schemes, it us...
Article
Full-text available
In this paper we describe the design of prototypes of histogram-based visualizations for browsing large, time-dependent collections of data. These visualizations are intended to provide an alternative to the standard hierarchical file browsing metaphor currently available, and are used to expose time-related information from the underlying dataset...
Article
Full-text available
As lifetime personal storage is becoming a reality, we find that it is becoming increasingly difficult to search and navigate the contents one accumulates. One of the most striking issues is the duplicates and near duplicates that clutter search and navigation. We investigated different techniques to eliminate the duplicates and near duplicates obje...
Article
Full-text available
Location information can be used as metadata, in addition to time, to organize photos. Moreover, we believe location information derived from GPS tracks may be useful to look at and visualize for certain tasks. In this paper, we explore ways to organize images and location information, visualize them, and provide interactive operations suc...
Article
Many desktop videoconferencing systems are ineffective due to deficiencies in gaze awareness and sense of spatial relationship. Previous works employ special hardware to address these problems. Here, we describe a software-only approach. Heads and eyes in the video are tracked using computer-vision techniques, and the tracking information is transm...