Slides for the PhD thesis defense. PhD thesis title: Content-Based Video Copy Detection. PhD thesis DOI: 10.13140/2.1.1766.9125
Content-Based Video Copy Detection
Juan Manuel Barrios
Department of Computer Science
University of Chile
PhD Thesis Defense
Santiago, November 26th, 2013
Outline
1. Introduction
2. Related Work
3. General Overview of our approach
4. Review of main contributions
   Regarding Effectiveness
   Regarding Efficiency
5. Comparison and evaluation
6. Conclusions and future work
Introduction
Video Copy Detection (VCD) consists of locating the videos that are copies of some original video.
There are many definitions of “copy”.
In this work, a copy is a derivative work produced by the application of one or more “content transformations”.
Content-Based Video Copy Detection (CBVCD) is the approach of locating copies relying exclusively on the audiovisual content.
Introduction
Reference collection, “R”: set of known videos.
Introduction
Query collection, “Q”: one or more unknown videos.
E.g., YouTube videos of compilations, selected scenes, etc.
Problem: Given Q and R, determine for each video in Q which reference videos from R are visible.
Content Description
Descriptors represent the audiovisual content:
Global descriptors:
Color Histogram
Edges Histogram
Local descriptors:
SIFT
Color SIFT
Acoustic descriptors:
Frequencies energy
MFCC
Assumption: high similarity between descriptors implies
high similarity between content.
Multimodal Detection
Multimodal detection is usually implemented by merging independent mono-modal systems.
Bag-of-Visual-Words
Local descriptors are invariant to many content
transformations.
Most CBVCD systems use local descriptors and follow
the Bag-of-Visual-Words (BOVW) approach.
A visual vocabulary or codebook is used to describe content.
An inverted index resolves searches in “immediate run-time”.
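As a rough illustration of the BOVW idea described above, the following minimal sketch quantizes each local descriptor to its nearest codeword and stores frames in an inverted index. The toy codebook, the 2-D descriptors, the frame ids, and all function names are assumptions made for this example; real systems learn the codebook (e.g., by clustering) and use high-dimensional descriptors such as SIFT.

```python
from collections import defaultdict

def nearest_word(descriptor, codebook):
    """Return the index of the closest codeword (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))

def build_inverted_index(frames, codebook):
    """Map each visual word to the set of frame ids that contain it."""
    index = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for d in descriptors:
            index[nearest_word(d, codebook)].add(frame_id)
    return index

def search(query_descriptors, index, codebook):
    """Vote for frames that share visual words with the query, best first."""
    votes = defaultdict(int)
    for d in query_descriptors:
        for frame_id in index.get(nearest_word(d, codebook), ()):
            votes[frame_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: three codewords and two reference frames.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
frames = {
    "r1": [(0.5, 0.2), (9.0, 9.5)],
    "r2": [(0.1, 9.8)],
}
index = build_inverted_index(frames, codebook)
ranking = search([(9.7, 10.2), (0.3, 0.1)], index, codebook)
```

Only the word ids are compared at search time, which is why the lookup is fast and also why quantization discards information about the original descriptors.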
Issues for BOVW approach
Quantization of local descriptors produces loss of information.
Many techniques focus on reducing this loss:
Hamming embedding [Jegou et al., 2008].
Spatial pyramids [Lazebnik et al., 2006].
Soft-assignment [Van Gemert et al., 2008].
Many others.
The codebook computation is expensive.
Quantization may produce many false alarms.
PhD Motivation
State-of-the-art methods seem to prioritize efficiency.
Quantization dramatically improves efficiency but affects
effectiveness.
Many descriptors focus on computation time and disk space
rather than effectiveness.
Is it possible to successfully process large datasets while
prioritizing effectiveness?
Can the metric space approach be applied in the video domain in order to achieve high effectiveness and high efficiency?
[1] J. M. Barrios. Content-based video copy detection. In Proc. of the int. conf. on Multimedia
(ACMMM), pages 1141-1142. 2009.
General Overview
Preprocessing
Remove noisy/plain frames.
Detect and/or revert some transformations like
camcording, picture in picture, flip.
Input: one video.
Output: one or more videos.
General Overview
Segmentation and feature extraction
Input: video
Output: list of segments, list of descriptors that
represent the audiovisual content.
General Overview
Similarity Search
Compare each segment in q to all segments in R.
Input: Descriptors by segment, distance function.
Output: List of k-NN for each query segment.
General Overview
Copy Localization
Locates chains of NN with temporal consistency.
Input: List of k-NN for each query segment
Output: The list of excerpts with high matching
score.
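The copy localization step can be pictured with a simplified offset-voting scheme: segments of a true copy tend to keep a constant temporal offset with respect to the original, so neighbours that agree on the same reference video and offset form a chain. This is a sketch under that assumption, not necessarily the voting algorithm used in the thesis; the function name, the `min_votes` threshold, and the video ids are hypothetical.

```python
from collections import defaultdict

def localize_copies(knn_lists, min_votes=2):
    """knn_lists[j] holds (ref_video, ref_segment_index) neighbours of query segment j."""
    votes = defaultdict(list)
    for q_idx, neighbours in enumerate(knn_lists):
        for ref_video, r_idx in neighbours:
            # Neighbours with the same video and offset are temporally consistent.
            votes[(ref_video, r_idx - q_idx)].append(q_idx)
    detections = []
    for (ref_video, offset), q_indices in votes.items():
        if len(q_indices) >= min_votes:
            first, last = min(q_indices), max(q_indices)
            detections.append({
                "ref_video": ref_video,
                "query_range": (first, last),
                "ref_range": (first + offset, last + offset),
                "score": len(q_indices),  # number of supporting query segments
            })
    return sorted(detections, key=lambda d: -d["score"])

# Hypothetical k-NN lists for four consecutive query segments.
knn_lists = [
    [("refA", 40), ("refB", 3)],
    [("refA", 41), ("refB", 90)],
    [("refA", 42)],
    [("refC", 7)],
]
detections = localize_copies(knn_lists)
```

Here only the chain on `refA` survives, because the isolated neighbours from `refB` and `refC` never agree on an offset.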
Multiple Content Description
For each segment extract different kinds of descriptors.
Global descriptors (edge histogram, color histogram)
Acoustic descriptors
Local descriptors (SIFT, CSIFT)
The same segmentation must be used for all descriptors.
Multimodal Detection
Modalities can be fused at search time.
[2] J. M. Barrios, B. Bustos, and X. Anguera. Combining features at search time: PRISMA at video copy
detection task. In Proc. of TRECVID. NIST, 2011.
Distance fusion
Spatio-temporal combined distance: [equation not recoverable from the extracted text; only the symbol γ survives]
Combined distance
Audiovisual distance: a weighted and scaled combination of the distances d_i of each modality.
Each τ_i intends to scale distances to a common range.
Each w_i intends to favor better descriptors.
Is it possible to automatically determine reasonably good weights without training data?
At least better than τ_i = 1/max_i and w_i = 1/3.
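The formula of the audiovisual distance is not preserved in the slide text. A minimal sketch, assuming the form δ(x, y) = Σ w_i · τ_i · d_i(x, y) (which is consistent with the baseline τ_i = 1/max_i and w_i = 1/3 mentioned above), could look as follows. The modality names, the L1 distance, and the maximum distances are hypothetical values chosen for the example.

```python
def combined_distance(distances, weights, scales):
    """Build delta(x, y) = sum_i w_i * tau_i * d_i(x, y) (assumed form)."""
    def delta(x, y):
        return sum(w * t * d(x, y) for d, w, t in zip(distances, weights, scales))
    return delta

def l1(a, b):
    """Manhattan distance between two equally sized vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def modal_distance(key):
    """Distance that compares only one modality of a segment."""
    return lambda x, y: l1(x[key], y[key])

# Hypothetical segments described by three modalities.
x = {"edge": (0.0, 1.0), "color": (10.0, 0.0), "audio": (0.2,)}
y = {"edge": (1.0, 0.0), "color": (0.0, 30.0), "audio": (0.7,)}

distances = [modal_distance(k) for k in ("edge", "color", "audio")]
max_dist = [2.0, 40.0, 1.0]              # assumed maximum distance per modality
scales = [1.0 / m for m in max_dist]     # baseline tau_i = 1 / max_i
weights = [1 / 3, 1 / 3, 1 / 3]          # baseline w_i = 1/3
delta = combined_distance(distances, weights, scales)
value = delta(x, y)
```

With these numbers the scaled modality distances are 1.0, 1.0 and 0.5, so the combined value is their average. Note that a non-negative weighted sum of metrics is itself a metric, which keeps the metric-space machinery applicable.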
Automatic Weights
Compute the histogram of distances for each underlying function:
Accumulating d_i(x,y) for random x, y.
α-Normalization
Normalize each d_i by a distance value with identical cumulative probability α.
Nearest neighbors are close to the query object, therefore use α-normalization with α << 1.
[3] J. M. Barrios and B. Bustos. Automatic weight selection for multi-metric distances. In Proc. of the
int. workshop on Similarity Search and Applications (SISAP), pages 61-68, 2011.
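To make the α-normalization step concrete, here is a small sketch that estimates the distance histogram from random pairs and divides each distance by its α-quantile. The sampling size, the seed, and the helper names are assumptions for illustration; the published method [3] should be consulted for the exact procedure.

```python
import random

def alpha_quantile(distance, objects, alpha, n_pairs=2000, seed=0):
    """Estimate the distance value whose cumulative probability is alpha."""
    rng = random.Random(seed)
    # Accumulate d(x, y) for random pairs x, y (an empirical distance histogram).
    sample = sorted(distance(*rng.sample(objects, 2)) for _ in range(n_pairs))
    return sample[min(len(sample) - 1, int(alpha * len(sample)))]

def alpha_normalize(distance, objects, alpha=0.01, n_pairs=2000, seed=0):
    """Return a distance scaled so that its alpha-quantile equals 1."""
    tau = alpha_quantile(distance, objects, alpha, n_pairs, seed)
    if tau <= 0:
        raise ValueError("alpha-quantile is zero; use a larger alpha or sample")
    return lambda x, y: distance(x, y) / tau

# Two hypothetical distances that differ only in scale.
data_rng = random.Random(1)
objects = [data_rng.random() for _ in range(200)]
d_small = lambda x, y: abs(x - y)
d_large = lambda x, y: 1000.0 * abs(x - y)

n_small = alpha_normalize(d_small, objects)
n_large = alpha_normalize(d_large, objects)
```

After normalization both distances return essentially the same values, so neither dominates a weighted sum merely because of its original range.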
Automatic Weighting
Weighting by max-τ
Select weights w_i that maximize τ in the combined distance δ (i.e., to maximize the value that α-normalizes δ).
Weighting by max-ρ
Select weights w_i that maximize ρ in δ.
[Figure: three panels showing τ(w_1,w_2,w_3), ρ(w_1,w_2,w_3) and MAP(w_1,w_2,w_3), each labeled with the weight vertices (1,0,0), (0,1,0) and (0,0,1), with the max τ, max ρ and max MAP points marked.]
[4] J. M. Barrios and B. Bustos. Competitive content-based video copy detection using global
descriptors. Multimedia Tools and Applications, 62(1):75-110, 2013.
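One possible reading of weighting by max-τ is a search over the weight simplex: for each candidate (w_1, w_2, w_3), compute the α-quantile τ of the combined distance over a sample of pairs and keep the weights with the largest τ. The sketch below assumes that the underlying distances were already α-normalized and that a coarse grid search is acceptable; both the grid and the function names are illustrative choices, not the procedure of [4].

```python
import random

def quantile(values, alpha):
    """Empirical alpha-quantile of a list of numbers."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(alpha * len(ordered)))]

def simplex_grid(steps):
    """Yield weight triples (w1, w2, w3) with w_i >= 0 and w1 + w2 + w3 = 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield (i / steps, j / steps, (steps - i - j) / steps)

def max_tau_weights(samples, alpha=0.1, steps=10):
    """samples: list of (d1, d2, d3) for random pairs, each d_i alpha-normalized."""
    best = None
    for w in simplex_grid(steps):
        combined = [sum(wi * di for wi, di in zip(w, s)) for s in samples]
        tau = quantile(combined, alpha)
        if best is None or tau > best[0]:
            best = (tau, w)
    return best

# Hypothetical sample: three independent distances with very different ranges.
rng = random.Random(0)
raw = [(rng.random(), rng.random() * 50.0, rng.random() * 0.1) for _ in range(1000)]
alpha = 0.1
taus = [quantile([s[i] for s in raw], alpha) for i in range(3)]
samples = [tuple(s[i] / taus[i] for i in range(3)) for s in raw]
best_tau, best_w = max_tau_weights(samples, alpha)
```

At each vertex of the simplex the combined distance equals one normalized d_i, so τ is 1 there; weights in the interior can only be selected if they raise τ above that value.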
Metric Access Methods
Metric properties in a distance function:
Reflexivity, Non-negativity, Symmetry, Triangle inequality.
These properties can be used to estimate distances.
In general, the quality of the pivots and lower bounds depends on the intrinsic dimensionality.
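The formulas of this slide did not survive extraction. As a hedged reconstruction based on standard metric-space results (not copied from the slides), the triangle inequality bounds the distance between a query q and an object o through any pivot p, and a pivot set P yields the usual lower bound. The intrinsic dimensionality ρ is assumed here to follow the common definition by Chávez et al., where μ and σ² are the mean and variance of the distance histogram.

```latex
\[
  |d(q,p) - d(o,p)| \;\le\; d(q,o) \;\le\; d(q,p) + d(o,p)
\]
\[
  \mathrm{LB}_{P}(q,o) \;=\; \max_{p \in P} \, |d(q,p) - d(o,p)| \;\le\; d(q,o)
\]
\[
  \rho \;=\; \frac{\mu^{2}}{2\sigma^{2}}
\]
```

A higher ρ means the distance histogram is more concentrated around its mean, so the differences |d(q,p) − d(o,p)| tend to be small and the lower bound becomes less informative.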
Approximate Search
Approximate k-NN search using pivots:
Use the lower bound as a fast distance estimator.
Perform a linear scan using the lower-bound estimate, where the object-to-pivot distances are precomputed and stored in memory.
Compute the actual distance only for the T% of objects with the lowest estimates.
The quality of the estimation and the search time depend on the number of pivots.
[5] J. M. Barrios and B. Bustos. P-VCD: A pivot-based approach for content-based video copy
detection. In Proc. of the IEEE int. conf. on Multimedia and Expo (ICME)., pages 1-6. IEEE, 2011.
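The approximate search outlined above can be sketched as follows, assuming the standard pivot lower bound max over p of |d(q,p) − d(o,p)| as the estimator. The function names, the `fraction` parameter (standing in for T%), and the 1-D toy data are assumptions for this example and are not taken from P-VCD itself.

```python
import heapq

def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p."""
    return [[dist(o, p) for p in pivots] for o in objects]

def approx_knn(query, objects, pivots, table, dist, k, fraction):
    """Approximate k-NN: rank by lower bound, evaluate only the best fraction.

    Assumes at least one pivot.
    """
    q_to_p = [dist(query, p) for p in pivots]
    # Lower-bound estimate of d(query, o) for every object (cheap linear scan).
    estimates = [max(abs(qp - op) for qp, op in zip(q_to_p, row)) for row in table]
    n_eval = max(k, int(fraction * len(objects)))
    candidates = heapq.nsmallest(n_eval, range(len(objects)),
                                 key=estimates.__getitem__)
    # The actual (possibly expensive) distance is computed only for candidates.
    scored = [(dist(query, objects[i]), i) for i in candidates]
    return heapq.nsmallest(k, scored)

# Hypothetical 1-D dataset with the absolute difference as distance.
objects = [float(i) for i in range(100)]
dist = lambda a, b: abs(a - b)
pivots = [0.0, 50.0]
table = build_pivot_table(objects, pivots, dist)
result = approx_knn(42.3, objects, pivots, table, dist, k=3, fraction=0.1)
```

In this toy case the estimator is exact, so the approximate result matches the exact one; with high-dimensional descriptors the estimates are looser and a larger fraction or more pivots may be needed.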
Approximate Search
[Figure: approximate search with a time-expensive distance.]
Approximate Search
[Figure: approximate search with a time-inexpensive distance.]
Effectiveness-versus-Efficiency
A combined distance δ improves effectiveness compared to a simple distance d.
It improves discrimination between correct and incorrect objects.
δ usually has higher intrinsic dimensionality than d.
Lower bounds lose their effectiveness at estimating the actual distance.
The approximate search using δ may achieve (paradoxically) lower effectiveness than d.
Two-step search:
Locate candidate videos with simpler distances.
Use the combined distance and exact search in a smaller subset of candidates.
[2] J. M. Barrios, B. Bustos, and X. Anguera. Combining features at search time: PRISMA at video copy
detection task. In Proc. of TRECVID. NIST, 2011.
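The two-step search can be illustrated with a minimal sketch: a cheap distance votes for candidate videos, and the combined distance is then evaluated exactly only on the segments of those candidates. For brevity the first step below is a plain linear scan rather than an approximate search, and the voting rule, the parameter names, and the toy segments are assumptions for the example.

```python
from collections import Counter
import heapq

def two_step_search(query_segments, reference, simple_dist, combined_dist,
                    k=1, n_videos=1):
    """reference maps video_id -> list of segments."""
    # Step 1: each query segment votes for the video of its nearest segment
    # under the simple (cheap) distance.
    votes = Counter()
    for q in query_segments:
        _, best_video = min((simple_dist(q, seg), vid)
                            for vid, segs in reference.items() for seg in segs)
        votes[best_video] += 1
    candidates = [vid for vid, _ in votes.most_common(n_videos)]
    # Step 2: exact k-NN with the combined distance, restricted to candidates.
    results = []
    for q in query_segments:
        scored = ((combined_dist(q, seg), vid, idx)
                  for vid in candidates
                  for idx, seg in enumerate(reference[vid]))
        results.append(heapq.nsmallest(k, scored))
    return candidates, results

# Hypothetical segments with two features; the simple distance uses only one.
reference = {
    "v1": [(0.0, 0.0), (1.0, 1.0)],
    "v2": [(5.0, 5.0), (6.0, 0.0)],
}
query = [(0.9, 1.2), (0.1, 0.1)]
simple = lambda a, b: abs(a[0] - b[0])
combined = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
candidates, results = two_step_search(query, reference, simple, combined)
```

Because the second step is an exact search over few objects, it avoids relying on lower bounds of the combined distance, which is the situation where those bounds tend to be weak.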
Exact Search in Videos
Two consecutive segments in a query video are usually similar.
Snake distribution: the i-th and (i+1)-th query objects are closer than random pairs in R.
It may occur in video similarity and interactive search.
In some cases, queries can be reordered to enhance this property.
Use {q_1, ..., q_{i-1}} as dynamic pivots to resolve q_i.
[6] J. M. Barrios, B. Bustos, and T. Skopal. Snake table: A dynamic pivot table for streams of k-nn
searches. In Proc. of the int. workshop on Similarity Search and Applications (SISAP), pages 25-39, 2012.
Snake Table
Start with an empty pivot table.
Resolve q_1:
No pivots! The search is just a linear scan in R.
Add q_1 to the pivot table, i.e., store d(*, q_1).
For each q_i (i > 1):
Compute distances from q_i to {q_1, …, q_{i-1}}.
Resolve the search using {q_1, …, q_{i-1}} as pivots.
Add q_i to the pivot table following some replacement strategy.
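The steps above can be sketched as a small class that resolves exact k-NN queries and reuses the distances evaluated for previous queries as pivot information. This is a simplified illustration, not the implementation of [6]: the FIFO replacement strategy, the table size, the storage of only the distances that were actually computed, and all names are assumptions.

```python
import heapq

class SnakeTable:
    """Dynamic pivot table: previous queries act as pivots for the next one."""

    def __init__(self, objects, dist, size=3):
        self.objects, self.dist, self.size = objects, dist, size
        self.pivots = []      # previous query objects
        self.rows = []        # rows[j][i] = d(objects[i], pivots[j]) or None
        self.evaluations = 0  # query-to-object distance computations only

    def knn(self, query, k):
        q_to_p = [self.dist(query, p) for p in self.pivots]
        new_row = [None] * len(self.objects)
        heap = []  # max-heap of the current k best, stored as (-distance, index)
        for i, obj in enumerate(self.objects):
            lower = 0.0
            for qp, row in zip(q_to_p, self.rows):
                if row[i] is not None:
                    lower = max(lower, abs(qp - row[i]))
            if len(heap) == k and lower >= -heap[0][0]:
                continue  # the lower bound proves obj cannot improve the result
            d = self.dist(query, obj)
            self.evaluations += 1
            new_row[i] = d
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        # Assumed replacement strategy: keep only the most recent queries (FIFO).
        self.pivots.append(query)
        self.rows.append(new_row)
        if len(self.pivots) > self.size:
            self.pivots.pop(0)
            self.rows.pop(0)
        return sorted((-nd, i) for nd, i in heap)

# Hypothetical stream of similar consecutive queries (a "snake" distribution).
objects = [float(i) for i in range(100)]
dist = lambda a, b: abs(a - b)
queries = [10.2, 10.4, 10.9, 11.3]
table = SnakeTable(objects, dist)
results = [table.knn(q, k=2) for q in queries]
```

The first query costs a full linear scan, while the following ones discard most objects through the lower bound because each query is close to its predecessors.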
Snake Table performance
Pro: Improves efficiency for time-expensive distances and spaces with high intrinsic dimensionality.
Con: For time-inexpensive distances, static pivots may be preferable.
Comparing Exact Searches
Comparing performance for different simple descriptors:
Snake table
Static pivots
kd-tree and k-means tree (implemented by FLANN).
Comparing Approximate Searches
[Figure: precision versus search time in approximate searches.]
MUSCLE-VCD-2007
Medium-size dataset (60 hours, 36 GB).
The best result is 97% detection [Poullot et al., 2010].
The combined spatio-temporal distance detects 100% of the copies without false alarms.
TRECVID
TRECVID ran a CBVCD evaluation between 2008 and 2011.
The PRISMA team participated in 2010 and 2011.
Larger dataset: 419 hours, 100 GB.
56 transformations: 8 visual × 7 audio.
21 participant teams.
Two scenarios: the No False Alarms profile and the Balanced profile.
[Figure: original and copies.]
No False Alarms Profile
Multimodal detection outperforms visual-only detection.
High accuracy at copy localization.
Good effectiveness-versus-efficiency tradeoff.
Global descriptors can achieve high performance.
[2] J. M. Barrios, B. Bustos, and X. Anguera. Combining features at search time: PRISMA at video copy
detection task. In Proc. of TRECVID. NIST, 2011.
Balanced Profile
Global descriptors achieve higher performance in the No False Alarms profile than in the Balanced profile.
Many transformations are almost undetectable for global descriptors.
The achieved performance validates that the proposed techniques are relevant.
Tests were run on a desktop computer:
Intel Core i7-2600K
8 GB RAM
Benefits of the proposed approach
Enables the use of complex similarity measures.
e.g., linear combinations and spatio-temporal distances.
Almost no restriction to descriptor format, as long as the
distance function knows how to compare them.
The metric approach may improve efficiency even for complex similarity measures, as long as they satisfy the metric properties.
Compared to coordinate-based techniques:
Metric methods are faster in exact searches.
Metric methods are not restricted to vector descriptors and L_2.
Metric methods may not even need an indexing phase.
Drawbacks of the proposed approach
A complex similarity measure is usually computationally
expensive and produces spaces with high intrinsic
dimensionality.
Limits the size of the dataset that can be processed.
Metric properties restrict the similarity model.
Triangle inequality is usually not compatible with adaptive
weights or partial similarity.
The non-metric approach produced only a slight improvement in
effectiveness.
Compared to coordinate-based techniques:
In the case of vector descriptors, metric methods produce much worse approximation results.
In particular, kd-tree performs very well for approximate search with SIFT descriptors, much better than metric methods.
Future Research
Develop a library for similarity search that follows the
structure of FLANN, providing a common API to use
metric access methods and coordinate-based methods.
Extend the α-normalization to a “universal normalization”, which replaces each distance by its cumulative probability.
Research the use of “distance fusion” in classifiers as an
alternative method to early fusion and late fusion.
Analyze the reordering of query objects as a technique to
enhance snake distributions.
[7] J. M. Barrios, B. Bustos, and T. Skopal. Analyzing and dynamically indexing the query set.
Information Systems, 2013.
Thank You!