John Smith

University of California, Davis | UCD

PhD in High Energy Physics

About

232
Publications
16,611
Reads
8,990
Citations

Publications (232)
Preprint
Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose,...
Article
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor intensive video editing. We propose a novel approach for auto-curating sports highlights, and demonstrate it to create a real-world system for the editorial aid of golf and tennis highlight reels. O...
Conference Paper
We introduce a novel multi-modal system for auto-curating golf highlights that fuses information from players' reactions (celebration actions), spectators (crowd cheering), and commentator (tone of the voice and word analysis) to determine the most interesting moments of a game. The start of a highlight is determined with additional metadata (playe...
Conference Paper
Full-text available
In this paper, we describe the first-ever machine human collaboration at creating a real movie trailer (officially released by 20th Century Fox). We introduce an intelligent system designed to understand and encode patterns and types of emotions in horror movies that are useful in trailers. We perform multi-modal semantics extraction including audi...
Article
Full-text available
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses inf...
Conference Paper
Full-text available
We present a system to assist users in dietary logging habits, which performs food recognition from pictures snapped on their phone in two different scenarios. In the first scenario, called "Food in context", we exploit the GPS information of a user to determine which restaurant they are having a meal at, therefore restricting the categories to rec...
Conference Paper
We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing proce...
Conference Paper
Full-text available
The task of associating images and videos with a natural language description has attracted a great amount of attention recently. The state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant improvements. Instead of proposing new models, this work inve...
Article
The task of associating images and videos with a natural language description has attracted a great amount of attention recently. Rapid progress has been made in terms of both developing novel algorithms and releasing new datasets. Indeed, the state-of-the-art results on some of the standard datasets have been pushed into the regime where it has be...
Conference Paper
This paper proposes to leverage multiple facets of person photos to improve the training of deep neural networks. Existing studies usually require a lot of labeled images to train deep convolutional networks. Our study suggests exploring multiple datasets and learning effective representation to learn related visual concepts. The practice of learni...
Patent
Full-text available
Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value...
Patent
Full-text available
A system is provided which solves content acquisition issues by providing an automated method to acquire content in mass and maintain an association between available meta-data and the actual content, e.g., video file. The system includes a first component configured to log network traffic. The system also includes a second component configured to...
Patent
Full-text available
A system and method for analyzing video include segmenting video stored in computer readable storage media into keyframes. Near-duplicate keyframes are represented as a sequence of indices. The near-duplicate keyframes are rendered in a graphical representation to determine relationships between video content.
Conference Paper
Outliers are pervasive in many computer vision and pattern recognition problems. Automatically eliminating outliers scattering among practical data collections becomes increasingly important, especially for Internet inspired vision applications. In this paper, we propose a novel one-class learning approach which is robust to contamination of input...
Article
Visual scenes require complex description and modeling that involves more than a list of words. The visual semantic concept basis is larger than the number of unique words. More efforts are needed to build out the set of visual semantic concepts.
Conference Paper
Visual data is exploding! 500 billion consumer photos are taken each year world-wide, 633 million photos taken per year in NYC alone. 120 new video-hours are uploaded on YouTube per minute. The explosion of digital multimedia data is creating a valuable open source for insights. However, the unconstrained nature of 'image/video in the wild' makes i...
Article
The explosion of geo-tagged images taken from mobile devices around the world is visually capturing life at amazingly high spatial-, temporal-, and semantic-density. In places like cities, which cover only 3% of the Earth's landmass, yet account for 50% of the world's population, the density of photos averages one photo per every 18 square meters p...
Article
Video search needs effective and efficient techniques for video summarization to enable rapid triage and finding relevant video contents.
Conference Paper
In this talk we present a perspective across multiple industry problems, including safety and security, medical, Web, social and mobile media, and motivate the need for large-scale analysis and retrieval of multimedia data. We describe a multi-layer architecture that incorporates capabilities for audio-visual feature extraction, machine learning an...
Article
Machine learning has become an indispensable tool for the multimedia community. Given large amounts of data, computers using machine learning are able to create rich representations and accomplish impressive discrimination tasks. Yet, the way machines learn still differs significantly from how humans learn. EIC John R. Smith explains that the wa...
Conference Paper
This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person is not seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learn...
Conference Paper
Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically desi...
Article
Growing multicamera, multiperspective image capture provides the opportunity to reconstruct and recount real-world events at a fine resolution.
Patent
Full-text available
Systems and methods for describing video content establish video description records which include an object set (24), an object hierarchy (26) and entity relation graphs (28). Video objects can include global objects, segment objects and local objects. The video objects are further defined by a number of features organized in classes, which in tur...
Article
Classification schemes are needed to catalog the growing numbers of images. Facets are an important technique from library science that can help to more effectively represent visual concepts.
Conference Paper
Real-world videos often contain dynamic backgrounds and evolving people activities, especially for those web videos generated by users in unconstrained scenarios. This paper proposes a new visual representation, namely scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often com...
Article
Full-text available
Social information networks, such as YouTube, contain traces of both explicit online interaction (such as "like", leaving a comment, or subscribing to video feed), and latent interactions (such as quoting, or remixing parts of a video). We propose visual memes, or frequently re-posted short video segments, for tracking such latent video interactio...
Article
Multimedia research has taken on many technical problems over the last decade. Problems such as video on demand and face recognition receive less focus today, while others like content-based retrieval and social media are gaining focus. Different factors can help explain the shifting focus in multimedia research.
Article
Authoring of rich media content is not prevalent despite efforts to develop standards, tools, and platforms. Average users prefer to keep it simple. However, growing interest in stylizing content and pinning media objects is putting average users on a new path of creativity that could lead to richer multimedia content.
Conference Paper
People verification is a challenging and important task which finds many applications in modern surveillance and video retrieval systems. In this problem, metric learning approaches have played an important role by trying to bridge the semantic gap between image features and people's identities. However, we believe that the traditional Mahalanobis...
Article
This paper describes a system to estimate geographical locations for beach photos. We develop an iterative method that not only trains visual classifiers but also discovers geographical clusters for beach regions. The results show that it is possible to recognize different beaches using visual information with reasonable accuracy, and our system wo...
Article
Social media provides new opportunities for sharing health-related data online. Although crowdsourcing medical diagnoses is not yet the trend, people are using social media to seek answers and better understand treatments and outcomes as doctors, experts, and patients converge online.
Conference Paper
The explosion of images, video and multimedia is creating a valuable source for insights. It can tell us about things happening in the world, give clues about a person's preferences or experiences, indicate places of interest in a new town, and even capture a rolling log of our history. But, as a non-traditional source for data mining, there are nu...
Article
"Bridging the semantic gap" is an expression often used to describe work on multimedia content understanding. At best, research today is bridging a semantic gap, of which there are many. Better characterizing the overall size and shape of the semantic space for multimedia will help define what is on the other side and ensure that we make progress o...
Conference Paper
Full-text available
We propose visual memes, or frequently reposted short video segments, for tracking large-scale video remix in social media. Visual memes are extracted by novel and highly scalable detection algorithms that we develop, with over 96% precision and 80% recall. We monitor real-world events on YouTube, and we model interactions using a graph model over...
Article
The number and density of digital photos taken in cities is making it possible to automatically contextualize the photos and capture the continuous history across places, people, events, and objects of interest.
Article
The shift to consumption of news in digital form, as more users access information with mobile and portable devices, is driving a new premium for timeliness and immediacy of news reporting. Increasingly computer-based systems will be able to continuously monitor the pulse of the world through digital means, detecting topics of interest from open, m...
Article
Every machine loves a good challenge. At least that’s what we must think given the number of competitions we’ve created for them. Although the nature of the classic man-machine contests has changed since the seminal challenge of human pile driver versus steam-powered drill in creating the US railroads in the 1800s, speed, strength, and cunning are s...
Conference Paper
Full-text available
We explore in a single but large case study how videos within YouTube, competing for view counts, are like organisms within an ecology, competing for survival. We develop this analogy, whose core idea shows that short video clips, best detected across videos as near-duplicate keyframes, behave similarly to genes. We report work in progress, on a da...
Conference Paper
Full-text available
This paper presents probabilistic visual concept trees, a model for large visual semantic taxonomy structures and its use in visual concept detection. Organizing visual semantic knowledge systematically is one of the key challenges towards large-scale concept detection, and one that is complementary to optimizing visual classification for individua...
Article
Physical objects are being linked to the digital world using multimedia technologies like audiovisual content recognition and large-scale multimedia content-based search.
Conference Paper
We consider the end-to-end system design and evaluation of an efficient and effective system for video copy detection that bridges the gap between computationally expensive methods and practical applications. We use a compact SIFT-based bag-of-words fingerprint (which we call a SIFTogram), requiring only 1000 bytes per second of video, and show tha...
Article
Digital photo collections contain a wealth of information that can provide tremendous insights. Photo metadata in the form of geotags and timestamps coupled with extracted descriptors about people and other semantic content allows photo collections to answer many questions about our lives.
Article
DARPA's network challenge to find ten red balloons is inspiring for the multimedia community to imagine the possibilities of automated image searching with real-world scene matching and geolocation capability at a massive scale.
Conference Paper
Automated image tagging is a problem of great interest, due to the proliferation of photo sharing services. Researchers have achieved considerable advances in understanding motivations and usage of tags, recognizing relevant tags from image content, and leveraging community input to recommend more tags. In this work we address several important iss...
Article
Applications running on multicore platforms are difficult to program, and even more difficult to optimize, mainly due to (1) the several layers where the optimizations occur and (2) the multitude of available resources to be exploited in parallel. Although low-level optimizations only target code running on individual cores, high-level optimization...
Conference Paper
This workshop, as the first of its kind, aims to bring together researchers and industrial practitioners interested in large-scale multimedia data retrieval and mining. The workshop will provide a venue for the participants to explore a variety of aspects and applications on how advanced multimedia analysis techniques can be leveraged to address th...
Article
With the rapid growth of multimedia data, it becomes increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm by augmenting random subspace bagging with forward model selection. Compared...
Article
The Cygnus Dual Beam Radiographic Facility consists of two 2.25-MV 60-kA 50-ns X-ray sources fielded in an underground laboratory at the Nevada Test Site. The tests performed in this laboratory involve the study of the dynamic properties of plutonium and are called subcritical experiments. From end to end, the Cygnus machines utilize the following...
Article
The Cygnus Dual Beam Radiographic Facility consists of two radiographic sources (Cygnus 1 and 2), each with a dose rating of 4 rd at 1 m, and a 1-mm-diameter spot size. The electrical specifications are the following: 2.25 MV, 60 kA, and 60 ns. This facility is located in an underground environment at the Nevada Test Site (NTS). These sources were...
Article
In the ever growing digital media in the Internet, search engines are now considered inadequate to find relevant data to a query by a user. Even those search engines that crawl broadly across the Web don't adequately characterize the digital content to make way for effective search. Despite high accuracy in some cases, digital-content search always...
Conference Paper
Full-text available
In this paper, we describe the IBM Research system for indexing, analysis, and copy detection of video as applied to the TRECVID-2009 video retrieval benchmark. A. High-Level Concept Detection: This year, our focus was on global and local feature combination, automatic training data construction from web domain, and large-scale detection using Hado...
Conference Paper
IBM Multimedia Analysis and Retrieval System is a Web-based technology that makes digital photos and video searchable through automated classification and indexing.
Conference Paper
Full-text available
In this paper we present a solution for efficient porting of sequential C++ applications on the Cell B.E. processor. We present our step-by-step approach, focusing on its generality; we provide a set of code templates and optimization guidelines to support the porting, and we include a set of equations to estimate the performance gain of the new ap...
Conference Paper
Full-text available
In this paper we examine a novel approach to the difficult problem of querying video databases using visual topics with few examples. Typically with visual topics, the examples are not sufficiently diverse to create a robust model of the user's need. As a result, direct modeling using the provided topic examples as training data is inadequate. Othe...
Conference Paper
The Cygnus Dual Beam Radiographic Facility consists of two identical radiographic sources: Cygnus 1 and Cygnus 2. Each source has the following X-ray output: 1-mm diameter spot size, 4 rads at 1 m, 50-ns full-width-half-maximum. The diode pulse has the following electrical specifications: 2.25 MV, 60 kA, 60 ns. This Radiographic Facility is located...
Conference Paper
The Cygnus Dual Beam Radiographic Facility consists of two identical radiographic sources with the following specifications: 4-rad dose at 1 m, 1-mm spot size, 50-ns pulse length, 2.25-MeV endpoint energy. The facility is located in an underground tunnel complex at the Nevada Test Site. Here SubCritical Experiments (SCEs) are performed to study the...
Conference Paper
The Cygnus Dual Beam Radiographic Facility consists of two radiographic sources (Cygnus 1, Cygnus 2) each with a dose rating of 4 rads at 1 m, and a 1-mm diameter spot size. The electrical specifications are: 2.25 MV, 60 kA, 60 ns. This facility is located in an underground environment at the Nevada Test Site (NTS). These sources were developed as...
Conference Paper
Summary form only given. The Cygnus Dual Beam Radiographic Facility consists of two 2.25-MV, 60-kA, 50-ns X-ray sources fielded in an underground laboratory at the Nevada Test Site. The tests performed in this laboratory involve the study of the dynamic properties of plutonium and are called subcritical experiments. From end to end, the Cygnus machines...
Conference Paper
In this demo we present a novel approach for (a) automatic labeling and grouping of multimedia content using existing metadata and semantic concepts, and (b) interactive context driven tagging of clusters of multimedia content. The proposed system leverages existing metadata in conjunction with automatically assigned semantic descriptors. One of t...
Conference Paper
In this paper we present a novel approach to query-by-example using existing high-level semantics in the dataset. Typically with visual topics, the examples are not sufficiently diverse to create a robust model of the user's need in the descriptor's space. As a result, direct modeling using the provided topic examples as training data is inadequate....
Conference Paper
IBM Multimedia Search and Retrieval System is a Web-based technology that makes digital photos and video searchable through automated classification and indexing [3]. The IBM system is unique in that it learns as it goes, helping users search immense multimedia content repositories faster and more effectively than ever before. Marvel uses multi-modal ma...
Conference Paper
Full-text available
We present a case study of developing a digital media indexing application, code-named MARVEL, on the STI cell broadband engine (CBE) processor. There are two aspects of the target application that require significant computing power: image analysis for feature extraction, and support vector machine (SVM) based pattern classification for concept de...
Conference Paper
Full-text available
In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TREC-2007 video retrieval benchmark. This year, the focus of the system improvement was on cross-domain learning, automation, scalability, and interactive search. Keywords—Multimedia indexing, content-based retrieval, Support Vector Mac...
Conference Paper
The “semantic gap” is a well-known problem in multimedia. The challenge is to accurately classify and effectively search multimedia content from automatically extracted low-level audio-visual features. While much effort is focused on developing the best machine learning approaches, not enough attention is placed on the required semantic coverage and...
Conference Paper
Full-text available
Typical approaches to the multi-label classification problem require learning an independent classifier for every label from all the examples and features. This can become a computational bottleneck for sizeable datasets with a large label space. In this paper, we propose an efficient and effective multi-label learning algorithm called model-shared...
Conference Paper
We present novel algorithms for detecting generic visual events from video. Target event models will produce binary decisions on each shot about classes of events involving object actions and their interactions with the scene, such as airplane taking off, exiting car, riot. While event detection has been studied in scenarios with strong scene and i...
Chapter
The explosion in multimodal content availability underlines the necessity for content management at a semantic level. We have cast the problem of detecting semantics in multimedia content as a pattern classification problem and the problem of building models of multimodal semantics as a learning problem. Recent trends show increasing use of statist...
Conference Paper
New digital multimedia content is being generated at a tremendous rate. At the same time, the growing variety of distribution channels, e.g., Web, wireless/mobile, cable, IPTV, satellite, is increasing users’ expectations for accessibility and searchability of digital multimedia content. However, users are still finding it difficult to find releva...
Conference Paper
We present a system for visualizing event detection in video and revealing the algorithmic and scientific insights. Visual events are viewed as evolving temporal patterns in the semantic concept space. For video clips of different events, we present their corresponding traces in the semantic concept space as the event evolves. The presentation of t...
Conference Paper
In this demo we present a novel approach for labeling clusters in minimally annotated data archives. We propose to build on clustering by aggregating the automatically tagged semantics. We propose and compare four techniques for labeling the clusters and evaluate the performance compared to human labeled ground-truth. We define the error measures t...
Conference Paper
Full-text available
In this paper we present a novel approach for labeling clusters of multimedia content that leverages supervised classification techniques in conjunction with unsupervised clustering. Recent research has produced significant results for automatic tagging of video content such as broadcast news. For example, powerful techniques have been demonstr...
Conference Paper
Full-text available
A novel framework is introduced for visual event detection. Visual events are viewed as stochastic temporal processes in the semantic concept space. In this concept-centered approach to visual event modeling, the dynamic pattern of an event is modeled through the collective evolution patterns of the individual semantic concepts in the course of t...
Book
This volume contains the proceedings of the 5th International Conference on Image and Video Retrieval (CIVR), July 13–15, 2006, Arizona State University, Tempe, AZ, USA: http://www.civr2006.org. Image and video retrieval continues to be one of the most exciting and fast-growing research areas in the field of multimedia technology. However, opportun...
Chapter
Introduction; Modelling Concepts: Support Vector Machines for Multiject Models; Modelling Context: A Graphical Multinet Model for Learning and Enforcing Context; Experimental Set-up and Results; Concluding Remarks; Acknowledgement; References
Conference Paper
The popularity of digital media (images, video, audio) is growing in all segments of the market including consumer, media enterprise, traditional enterprise and Web. Its tremendous growth is a result of the convergence of many factors, including the pervasive increase in bandwidth to users, general affordability of multimedia-ready devices througho...
Article
Face recognition with variant pose, illumination and expression (PIE) is a challenging problem. In this paper, we propose an analysis-by-synthesis framework for face recognition with variant PIE. First, an efficient two-dimensional (2D)-to-three-dimensional ...
Conference Paper
Annotated collections of images and videos are a necessary basis for the successful development of multimedia retrieval systems. The underlying models of such systems rely heavily on quality and availability of large training collections. The annotation of large collections, however, is a time-consuming and error prone task as it has to be performe...
Conference Paper
In the future, the television set at home will be a key device for providing integrated information services through broadcasting, communication and storage media. In this environment, users will be able to receive any type of information, e.g., HDTV ...