Anthony Vetro

Mitsubishi Electric Research Laboratories · Multimedia Group

PhD in Electrical Engineering

About

301
Publications
33,353
Reads
7,698
Citations


Publications (301)
Article
This chapter discusses the application of distributed source coding techniques to biometric security. A Slepian-Wolf coding system is used to provide a secure means of storing biometric data that provides robust biometric authentication for genuine users and guards against attacks from imposters. A formal quantification of the trade-off between se...
Article
Full-text available
Significant improvements in video compression capability have been demonstrated with the introduction of the H.264/MPEG-4 advanced video coding (AVC) standard. Since developing this standard, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has also standardized an extension of...
Article
Throughout this article, we concentrate on the transcoding of block-based video coding schemes that use hybrid discrete cosine transform (DCT) and motion compensation (MC). In such schemes, the frames of the video sequence are divided into macroblocks (MBs), where each MB typically consists of a luminance block (e.g., of size 16 × 16, or alternativ...
Article
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited,...
Preprint
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited,...
Preprint
Full-text available
Many real-world applications involve multivariate, geo-tagged time series data: at each location, multiple sensors record corresponding measurements. For example, an air quality monitoring system records PM2.5, CO, etc. The resulting time-series data often has missing values due to device outages or communication errors. In order to impute the missing...
Chapter
We aim to tackle a novel task in action detection - Online Detection of Action Start (ODAS) in untrimmed, streaming videos. The goal of ODAS is to detect the start of an action instance, with high categorization accuracy and low detection latency. ODAS is important in many applications such as early alert generation to allow timely security or emer...
Article
The goal of Online Action Detection (OAD) is to detect action in a timely manner and to recognize its action category. Early works focused on early action detection, which is effectively formulated as a classification problem instead of online detection in streaming videos, because these works used partially seen short video clip that begins at the...
Patent
Full-text available
A method processes a signal by first constructing a graph from the signal, and then determining a graph matrix from the graph and the signal. A Krylov-based subspace is determined based on the graph matrix and the signal. A filter for the Krylov subspace is determined. The filter transforms the signal to produce a filtered signal, which is output.
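The abstract above does not say how the graph is built, which graph matrix is used, or what the filter looks like. As a rough illustration only, the sketch below assumes a path graph linking neighbouring samples, the combinatorial Laplacian L as the graph matrix, and a low-pass polynomial filter (I - alpha*L)^order. Its output is a combination of x, Lx, ..., L^order x, so it lies in the Krylov subspace generated by L and the signal. All function names and parameter values are hypothetical and are not taken from the patent.

```python
def laplacian_apply(x):
    """Apply the Laplacian of a path graph (neighbouring samples linked) to x."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        if i > 0:
            acc += x[i] - x[i - 1]
        if i < n - 1:
            acc += x[i] - x[i + 1]
        out.append(acc)
    return out


def krylov_polynomial_filter(x, order=4, alpha=0.25):
    """Return (I - alpha*L)^order x, a vector in span{x, Lx, ..., L^order x}.

    Assumption: alpha = 0.25 keeps every factor (1 - alpha*lambda) within
    [0, 1], because path-graph Laplacian eigenvalues lie in [0, 4).
    """
    y = list(x)
    for _ in range(order):
        lap = laplacian_apply(y)
        y = [yi - alpha * li for yi, li in zip(y, lap)]
    return y


def roughness(x):
    """Quadratic form x^T L x: sum of squared differences between neighbours."""
    return sum((a - b) ** 2 for a, b in zip(x, x[1:]))


signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
filtered = krylov_polynomial_filter(signal)
```

With these assumed settings the filtered signal has the same length and the same sum as the input, while its roughness (the Laplacian quadratic form) is lower.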
Article
Full-text available
To reduce cost in storing, processing and visualizing a large-scale point cloud, we consider a randomized resampling strategy to select a representative subset of points while preserving application-dependent features. The proposed strategy is based on graphs, which can represent underlying surfaces and lend themselves well to efficient computation...
Conference Paper
In contrast to still image analysis, motion information offers a powerful means to analyze video. In particular, motion trajectories determined from keypoints have become very popular in recent years for a variety of video analysis tasks, including search, retrieval and classification. Additionally, cloud-based analysis of media content has been ga...
Article
Full-text available
This issue features not just one but two conference reports. The first covers the 14th International Workshop on Content-Based Multimedia Indexing (CBMI 16), while the second covers the 2016 IEEE International Conference on Multimedia and Expo (ICME 2016). For both, find out what hot topics and key themes were discussed, which submissions earned th...
Conference Paper
We propose an online blind deconvolution approach to sequential through-the-wall-radar-imaging (TWI) where the received signal is contaminated by front wall ringing artifacts. The sequential measurements correspond to individual transmitter-receiver pairs where the front wall ringing induces a multipath kernel that corrupts the received target refl...
Article
Full-text available
Visual retrieval and classification are of growing importance for a number of applications, including surveillance, automotive, as well as web and mobile search. To facilitate these processes, features are often computed from images to extract discriminative aspects of the scene, such as structure, texture or color information. Ideally, these featu...
Article
Full-text available
In 3D image/video acquisition, different views are often captured with varying noise levels across the views. In this paper, we propose a graph-based image enhancement technique that uses a higher quality view to enhance a degraded view. A depth map is utilized as auxiliary information to match the perspectives of the two views. Our method performs...
Article
This article provides an overview of standardized extensions of HEVC for the advanced coding of 3D and multiview video. In those extensions, new coding tools that better exploit the inter-view redundancy of the multiview texture videos have been developed. Additionally, dedicated tools for the improved coding of depth have been extensively studied...
Patent
Full-text available
In a decoder, a desired image is estimated by first retrieving coding modes from an encoded side information image. For each bitplane in the encoded side information image, syndrome bits or parity bits are decoded to obtain an estimated bitplane of quantized transform coefficients of the desired image. A quantization and a transform are applied to...
Patent
An image for a virtual view of a scene is generated based on a set of texture images and a corresponding set of depth images acquired of the scene. A set of candidate depths associated with each pixel of a selected image is determined. For each candidate depth, a cost that estimates a synthesis quality of the virtual image is determined. The candid...
Article
Full-text available
Many of the existing video coding standards in use today were developed primarily using camera-captured content as test material. Today, with the more widespread use of connected devices, there is an increased interest in developing video coding tools that target screen content video. Screen content video is often characterized by having sharp edge...
Article
In stereo video applications, the quality of the two views may vary based on different camera capturing conditions and setup, compression/transmission, and sensor noise. Although some studies show that the perceived video quality may not be significantly affected by the lower quality view, maintaining a similar video quality is still desired in ord...
Article
We consider the problem of extracting descriptors that represent visually salient portions of a video sequence. Most state-of-the-art schemes generate video descriptors by extracting features, e.g., SIFT or SURF or other keypoint-based features, from individual video frames. This approach is wasteful in scenarios that impose constraints on storage,...
Article
Full-text available
The High Efficiency Video Coding standard has recently been extended to support efficient representation of multiview video and depth-based 3D video formats. The multiview extension, MV-HEVC, allows efficient coding of multiple camera views and associated auxiliary pictures, and can be implemented by reusing single-layer decoders without changing...
Patent
Full-text available
A disparity vector for a pixel in a right image corresponding to a pixel in a left image in a pair of stereo images is determined. The disparity vector is based on a horizontal disparity and a vertical disparity and the pair of stereo images is unrectified. First, a set of candidate horizontal disparities is determined. For each candidate horizonta...
Article
Full-text available
Advanced multiview video systems are able to generate intermediate viewpoints of a 3-D scene. To enable low-complexity free view generation, texture and its associated depth are used as input data for each viewpoint. To improve the coding efficiency of such content, view synthesis prediction (VSP) is proposed to further reduce inter-view redundancy...
Patent
Full-text available
A system and a method for decoding at least a portion of an image includes determining a current prediction mode based on a combination of a prediction mode residue and a function of at least one previous prediction mode and decoding the portion of the image using the current prediction mode.
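The abstract above leaves both the "function of at least one previous prediction mode" and the way it is combined with the residue unspecified. A minimal sketch, assuming the function simply returns the most recent previous mode and the combination is modular addition over a fixed number of modes, could look as follows. The names `predict_mode`, `encode_mode`, `decode_mode` and the constant `NUM_MODES` are hypothetical and are not taken from the patent.

```python
NUM_MODES = 35  # assumption: number of available prediction modes


def predict_mode(previous_modes):
    """Hypothetical predictor: reuse the most recent previous mode (0 if none)."""
    return previous_modes[-1] if previous_modes else 0


def encode_mode(current_mode, previous_modes):
    """Encoder side: signal only the residue relative to the predicted mode."""
    return (current_mode - predict_mode(previous_modes)) % NUM_MODES


def decode_mode(residue, previous_modes):
    """Decoder side: combine the residue with the predicted mode."""
    return (predict_mode(previous_modes) + residue) % NUM_MODES


# Round trip over a short, made-up sequence of modes.
modes = [10, 10, 11, 26, 26, 1]
history, residues, decoded = [], [], []
for mode in modes:
    residues.append(encode_mode(mode, history))
    history.append(mode)
for residue in residues:
    decoded.append(decode_mode(residue, decoded))
```

When neighbouring blocks share similar modes, the residues cluster around zero, which is what makes them cheaper to signal than the raw mode indices.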
Patent
Full-text available
A bitstream includes coded pictures, and split-flags for generating a transform tree. The bitstream is a partitioning of coding units (CUs) into Prediction Units (PUs). The transform tree is generated according to the split-flags. Nodes in the transform tree represent transform units (TU) associated with the CUs. The generation splits each TU only...
Conference Paper
Full-text available
Depth images are often presented at a lower spatial resolution, either due to limitations in the acquisition of the depth or to increase compression efficiency. As a result, upsampling low-resolution depth images to a higher spatial resolution is typically required prior to depth image based rendering. In this paper, depth enhancement and up-sampli...
Conference Paper
We propose a factorized robust matrix completion (FRMC) algorithm with global motion compensation to solve the video background subtraction problem. The algorithm decomposes a sequence of video frames into the sum of a low rank background component and a sparse motion component. The algorithm alternates between the solution of each component follow...
Patent
Full-text available
A quality of a virtual image for a synthetic viewpoint in a 3D scene is determined. The 3D scene is acquired by texture images, and each texture image is associated with a depth image acquired by a camera arranged at a real viewpoint. A texture noise power is based on the acquired texture images and reconstructed texture images corresponding to a v...
Article
This article reviews the most recent extensions to the Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC) coding standards, which integrate depth video to support advanced multiview and 3D video functionalities. All the extensions provide single-view compatibility, while some extensions add depth support on top of conforming stereo...
Article
This paper describes research and results in which a visual acuity (VA) model of the human visual system (HVS) is used to reduce the bitrate of coded video sequences, by eliminating the need to signal transform coefficients when their corresponding frequencies will not be detected by the HVS. The VA model is integrated into the state of the art HEV...
Patent
Embodiments of the invention disclose a system and a method for transforming a biometric of a user to a binary feature vector suitable for user authentication, comprising steps of: partitioning the biometric into a set of regions, wherein each region is a contiguous region confining a part of the biometric; determining, for each region, biometric p...
Patent
Full-text available
Embodiments of the invention disclose a system and a method for embedding a symbol in a glyph, comprising the steps of determining a set of landmarks representing an outline of the glyph; determining a data segment between two landmarks, wherein the data segment is suitable for embedding the symbol; modifying the data segment according to the symbo...
Patent
Full-text available
Embodiments disclose a method and a system for securely determining the Manhattan distance between a first and a second signal. The system maps the first signal to a first binary signal and the second signal to a second binary signal, such that the squared distance between the first and the second binary signals equals the Manhatt...
Article
Full-text available
This paper describes extensions to the High Efficiency Video Coding (HEVC) standard that are active areas of current development in the relevant international standardization committees. While the first version of HEVC is sufficient to cover a wide range of applications, needs for enhancing the standard in several ways have been identified, includi...
Article
Full-text available
We propose an analytical model to estimate the synthesized view quality in 3D video. The model relates errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture image quality and the rendering process. Specifically, we decompose the synthesis distortion into texture-error induced distortion and...
Conference Paper
Full-text available
In this work, a new coding tool called Edge Mode is proposed for HEVC intra coding, aimed at improving coding efficiency for screen content video. A set of edge modes that correspond to edge positions are identified based upon intra prediction directions. Then, a simplified scheme is developed to select the best edge mode. To avoid applying a trans...
Conference Paper
Modern, state-of-the-art disparity estimation techniques are able to very accurately estimate the disparity for a wide variety of scene types. However, all of these methods assume that the input images are epipolar rectified. When an image pair is not rectified, it must be pre-processed before any estimation can be done. In this paper we propose a d...
Conference Paper
Full-text available
Depth-based 3D formats are currently being developed as extensions to both AVC and HEVC standards. The availability of depth information facilitates the generation of intermediate views for advanced 3D applications and displays, and also enables more efficient coding of the multiview input data through view synthesis prediction techniques. This pap...
Conference Paper
We propose a rate-efficient, feature-agnostic approach for encoding image features for cloud-based nearest neighbor search. We extract quantized random projections of the image features under consideration, transmit these to the cloud server, and perform matching in the space of the quantized projections. The advantage of this approach is that, onc...
Article
Full-text available
A new set of three-dimensional (3D) data formats and associated compression technologies are emerging with the aim to achieve more flexible representation and higher compression of 3D and multiview video content. These new tools will facilitate the generation of multiview output (e.g., as needed for multiview auto-stereoscopic displays), provide ri...
Conference Paper
We propose an analytical model to estimate the synthesized view quality in 3D video. Specifically, we estimate the depth-error induced distortion using an approach that combines frequency and spatial domain analysis. We also propose to decompose the spatial-variant video signals into gradient-based representations to capture the interaction between...
Conference Paper
View synthesis prediction provides an effective way to reduce inter-view redundancy of multiview video in addition to conventional disparity compensated prediction. Traditional forward warping techniques incur high complexity since an entire picture is typically warped from one viewpoint to another. To reduce this complexity, block-based backward w...
Conference Paper
Advanced multiview video systems are able to generate intermediate viewpoints of a 3D scene. In addition to the texture content, corresponding depth is associated with each viewpoint. To improve the coding efficiency of such content, view synthesis prediction can be used to further reduce inter-view redundancy in addition to traditional disparity c...
Patent
Full-text available
A method and system manage a hierarchy of passwords for users accessing a hierarchy of access control devices. First, a codeword is acquired and a syndrome of the codeword is determined. Next, the codeword is randomly modified with a probability p to produce a modified codeword. The modified codeword is selected and assigned to a user as a password...
Conference Paper
Traditional multi-view coding (MVC) systems compress the texture content captured from different view points, where temporal and inter-view redundancy are exploited to improve MVC coding efficiency. The advanced 3D video coding systems compress both the texture content and its corresponding depth captured from different view points, known as multiv...
Patent
Full-text available
A method for authenticating biometric data. Comprising of a processor that measures the reliability of each bit in enrollment biometric data; by arranging the bits; encoding the enrollment biometric data in the decreasing order to produce an enrollment syndrome; arranging the bits in the authentication biometric; decoding the authentication enrollm...
Patent
Full-text available
A method embeds a message into a document containing a set of glyphs. Individual glyphs in the document, groups of glyphs in the document, or the entire document are represented using a distance field that includes distance values from the shapes of interest. Each symbol of the message is represented as modifications of a subset of the distance val...
Patent
Full-text available
A method for encoding a source image, wherein the source image includes a set of bitplanes of pixels, is disclosed. For each bitplane in a most to least significant order, the method includes obtaining a list of significant pixels (LSP), a list of insignificant pixels (LIP), and a list of insignificant sets (LIS) according to a hierarchical ordering...
Article
This chapter provides an overview of the current status of research and standardization activity towards defining a new set of depth-based formats that facilitate the generation of intermediate views with a compact binary representation. A brief introduction is given on depth-based representation and rendering techniques, which are the basis for th...
Patent
Full-text available
A system and a method for determining a result of applying a function to signals is disclosed. The function is a polynomial function including monomials, in which the first signal in a first power forming a first part of the monomial and the second signal in a second power forming a second part of the monomial, wherein the first part of the monomia...
Patent
Full-text available
A method synthesizes virtual images from a sequence of texture images and a sequence of corresponding depth images, wherein each depth image stores depths d at pixel locations I(x, y). Each depth image is preprocessed to produce a corresponding preprocessed depth image. A first reference image and a second reference image are from the sequence of...
Patent
Full-text available
A virtual image is synthesized from a reduced resolution depth image storing depth values at each pixel location. The reduced resolution depth image is scaled up to produce an up-scaled depth image. Then, at least one filter is applied to the up-scaled depth image to produce a reconstructed depth image, and the virtual image is synthesized using th...
Patent
Full-text available
Biometric parameters acquired from human faces, voices, fingerprints, and irises are used for user authentication and access control. Because the biometric parameters are continuous and vary from one reading to the next, syndrome codes are applied to determine biometric syndrome vectors. The biometric syndrome vectors can be stored securely, while...
Article
Full-text available
Depth map images are characterized by large homogeneous areas and strong edges. It has been observed that efficient compression of the depth map is achieved by applying a down-sampling operation prior to encoding. However, since high resolution depth maps are also needed for depth-based 3D coding tools, such as view synthesis prediction, an up-samp...
Article
Standardization of a new set of 3D formats has been initiated with the goal of improving the coding of stereo and multiview video, and also facilitating the generation of multiview output needed for auto-stereoscopic displays. Part of this effort will develop 3D and multiview extensions of the emerging standard for High Efficiency Video Coding (HEV...
Conference Paper
We propose an analytical model to estimate the rendering quality in 3D video. The model relates errors in the depth images to the rendering quality, taking into account texture image characteristics, texture image quality, the camera configuration and the rendering process. Specifically, we derive position (disparity) errors from the depth errors,...
Conference Paper
Full-text available
The quality of the depth map is crucial for depth image based rendering (DIBR) which enables a variety of advanced 3D video related applications such as perceived depth adjustment for stereoscopic video and intermediate view generation for multiview auto-stereoscopic displays. However, the input depth map for DIBR may suffer from errors and noise,...
Article
The High Efficiency Video Coding (HEVC) standardization process currently underway includes many tools for the coding of intra pictures. HEVC allows for many more intra prediction modes or directions as compared to previous standards. Efficient coding of these modes is therefore important because the modes consume a non-negligible portion of the to...