All content in this area was uploaded by Juan Manuel Barrios on Oct 29, 2014
Content-Based Video Copy Detection
Juan Manuel Barrios
Department of Computer Science
University of Chile
PhD Thesis Defense
Santiago, November 26th, 2013
2
Outline
1. Introduction
2. Related Work
3. General Overview of our approach
4. Review of main contributions
  Regarding Effectiveness
  Regarding Efficiency
5. Comparison and evaluation
6. Conclusions and future work
3
Introduction
Video Copy Detection (VCD) consists of locating the videos that are copies of some original video.
There are many definitions of “copy”.
In this work, a copy is a derivative work produced by applying one or more “content transformations”.
Content-Based Video Copy Detection (CBVCD) is the approach of locating copies relying exclusively on the audiovisual content.
4
Introduction
Reference collection, “R”: Set of known videos.
5
Introduction
Query collection, “Q”: One or more unknown videos.
E.g., YouTube videos of compilations, selected scenes, etc.
Problem: Given Q and R, determine for each video in Q which reference videos from R are visible.
7
Content Description
Descriptors represent the audiovisual content:
Global descriptors:
Color Histogram
Edge Histogram
Local descriptors:
SIFT
Color SIFT
Acoustic descriptors:
Frequencies energy
MFCC
Assumption: high similarity between descriptors implies
high similarity between content.
8
Multimodal Detection
Multimodal detection is usually implemented by merging the outputs of independent mono-modal systems:
9
Bag-of-Visual-Words
Local descriptors are invariant to many content
transformations.
Most CBVCD systems use local descriptors and follow
the Bag-of-Visual-Words (BOVW) approach.
A visual vocabulary or codebook is used to describe content.
An inverted index resolves searches in “immediate run-time”.
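The BOVW idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular CBVCD system: the toy 2-D descriptors, the three-word codebook, the helper names (`quantize`, `build_inverted_index`, `search`) and the vote-count score are all hypothetical.

```python
from collections import Counter, defaultdict


def quantize(descriptor, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def build_inverted_index(frames, codebook):
    """Map each visual word to the set of frame ids that contain it."""
    index = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for d in descriptors:
            index[quantize(d, codebook)].add(frame_id)
    return index


def search(query_descriptors, index, codebook):
    """Rank frames by the number of distinct query visual words they share."""
    votes = Counter()
    for word in {quantize(d, codebook) for d in query_descriptors}:
        for frame_id in index.get(word, ()):
            votes[frame_id] += 1
    return votes.most_common()


# Hypothetical toy data: 2-D "local descriptors" and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = {
    "ref_a": [(0.1, 0.0), (0.9, 1.1)],
    "ref_b": [(5.2, 4.9), (4.8, 5.1)],
}
index = build_inverted_index(frames, codebook)
ranking = search([(0.0, 0.2), (1.2, 0.9)], index, codebook)
```

Only the postings of the query's visual words are visited, which is why the lookup is fast; the price is that two different descriptors mapped to the same word become indistinguishable.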
10
Issues for BOVW approach
Quantization of local descriptors produces loss of
information.
Many techniques focus on reducing this loss:
Hamming embedding [Jegou et al., 2008].
Spatial pyramids [Lazebnik et al., 2006].
Soft-assignment [Van Gemert et al., 2008].
Many others.
The codebook computation is expensive.
Quantization may produce many false alarms.
11
PhD Motivation
State-of-the-art methods seem to prioritize efficiency.
Quantization dramatically improves efficiency but affects
effectiveness.
Many descriptors focus on computation time and disk space
rather than effectiveness.
Is it possible to successfully process large datasets while
prioritizing effectiveness?
Can the metric space approach be applied in the video domain in order to achieve high effectiveness and high efficiency?
[1] J. M. Barrios. Content-based video copy detection. In Proc. of the int. conf. on Multimedia
(ACMMM), pages 1141-1142. 2009.
13
General Overview
Preprocessing
Remove noisy/plain frames.
Detect and/or revert some transformations like
camcording, picture in picture, flip.
Input: one video.
Output: one or more videos.
14
General Overview
Segmentation and feature extraction
Input: video
Output: list of segments, list of descriptors that
represent the audiovisual content.
15
General Overview
Similarity Search
Compare each segment in q to all segments in R.
Input: Descriptors by segment, distance function.
Output: List of k-NN for each query segment.
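The similarity search step can be sketched as an exact linear scan. The sketch assumes each segment is a `(segment_id, descriptor)` pair and uses an L1 distance over toy histograms; both choices, and the function names, are illustrative only.

```python
import heapq


def l1_distance(a, b):
    """L1 (Manhattan) distance between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_linear_scan(query_segments, reference_segments, distance, k):
    """For each query segment, return its k nearest reference segments.

    Both inputs are lists of (segment_id, descriptor) pairs. The result maps
    each query segment id to a list of (distance, reference_id) pairs sorted
    by increasing distance.
    """
    result = {}
    for q_id, q_desc in query_segments:
        scored = ((distance(q_desc, r_desc), r_id)
                  for r_id, r_desc in reference_segments)
        result[q_id] = heapq.nsmallest(k, scored)
    return result


# Hypothetical toy descriptors (3-bin histograms).
reference = [("r0", (1, 0, 0)), ("r1", (0, 1, 0)), ("r2", (0, 0, 1))]
query = [("q0", (0.9, 0.1, 0.0))]
neighbors = knn_linear_scan(query, reference, l1_distance, k=2)
```

The cost is one distance evaluation per (query segment, reference segment) pair, which is what the efficiency techniques later in the talk aim to reduce.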
16
General Overview
Copy Localization
Locates chains of NN with temporal consistency.
Input: List of k-NN for each query segment
Output: The list of excerpts with high matching
score.
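One simple way to find chains of nearest neighbors with temporal consistency is to vote for (reference video, time offset) pairs. This is an assumed stand-in for illustration, not necessarily the localization algorithm used in the thesis; the input layout and the `min_votes` threshold are hypothetical.

```python
from collections import defaultdict


def localize_copies(knn_lists, min_votes=2):
    """Group nearest neighbors by (reference video, time offset).

    knn_lists maps a query segment start time (seconds) to a list of
    (reference_video, reference_time) candidates. Candidates that share the
    same offset reference_time - query_time are temporally consistent, so
    each group is reported as one excerpt scored by its number of votes.
    """
    groups = defaultdict(list)
    for q_time, candidates in knn_lists.items():
        for ref_video, ref_time in candidates:
            groups[(ref_video, ref_time - q_time)].append(q_time)
    detections = []
    for (ref_video, offset), q_times in groups.items():
        if len(q_times) >= min_votes:
            detections.append({
                "reference": ref_video,
                "query_start": min(q_times),
                "query_end": max(q_times),
                "offset": offset,
                "score": len(q_times),
            })
    return sorted(detections, key=lambda d: d["score"], reverse=True)


# Hypothetical k-NN output: three query segments agree on "ref_a" at offset 100.
knn_lists = {
    0: [("ref_a", 100), ("ref_b", 40)],
    1: [("ref_a", 101), ("ref_b", 7)],
    2: [("ref_a", 102)],
}
detections = localize_copies(knn_lists)
```

Isolated neighbors such as the two `ref_b` candidates do not agree on an offset, so they never reach `min_votes` and are dropped.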
18
Multiple Content Description
For each segment extract different kinds of descriptors.
Global descriptors (edge histogram, color histogram)
Acoustic descriptors
Local descriptors (SIFT, CSIFT)
The same segmentation must be used for all descriptors.
19
Multimodal Detection
Modalities can be fused at search time:
[2] J. M. Barrios, B. Bustos, and X. Anguera. Combining features at search time: PRISMA at video copy
detection task. In Proc. of TRECVID. NIST, 2011.
20
Distance fusion
Spatio-temporal combined distance:
[Equation: spatio-temporal combined distance, defined in terms of γ.]
21
Combined distance
Audiovisual distance:
Each $\tau_i$ intends to scale distances to a common range.
Each $w_i$ intends to favor better descriptors.
Is it possible to automatically determine reasonably good weights without training data?
At least better than $\tau_i = 1/\max_i$ and $w_i = 1/3$.
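The formula of the audiovisual distance is not legible here. Assuming it is a weighted linear combination of three underlying distances, $\delta(x, y) = \sum_i w_i \, \tau_i \, d_i(x, y)$, the baseline setting mentioned above can be sketched as follows. The L1 distances, the maximum values and the toy descriptors are hypothetical.

```python
def combined_distance(x, y, distances, scales, weights):
    """Weighted linear combination: sum_i w_i * tau_i * d_i(x_i, y_i).

    x and y are tuples holding one descriptor per underlying distance.
    """
    return sum(w * tau * d(xi, yi)
               for d, tau, w, xi, yi in zip(distances, scales, weights, x, y))


def l1(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Hypothetical setup: three descriptors per segment, all compared with L1.
distances = [l1, l1, l1]
max_distances = [2.0, 10.0, 4.0]            # assumed maximum of each d_i
scales = [1.0 / m for m in max_distances]   # baseline tau_i = 1 / max_i
weights = [1.0 / 3.0] * 3                   # baseline w_i = 1/3

x = ((1.0, 0.0), (5.0, 5.0), (0.0, 2.0))
y = ((0.0, 1.0), (0.0, 0.0), (0.0, 0.0))
delta = combined_distance(x, y, distances, scales, weights)
```

With this baseline every scaled term lies in [0, 1], so no single descriptor dominates only because its raw distances are numerically larger.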
22
Automatic Weights
Compute the histogram of distances for each underlying function:
Accumulating $d_i(x, y)$ for random $x$, $y$.
23
α-Normalization
Normalize each $d_i$ by a distance value with identical cumulative probability $\alpha$.
Nearest neighbors are close to the query object, therefore use α-normalization with $\alpha \ll 1$.
[3] J. M. Barrios and B. Bustos. Automatic weight selection for multi-metric distances. In Proc. of the
int. workshop on Similarity Search and Applications (SISAP), pages 61-68, 2011.
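A minimal sketch of α-normalization, assuming the histogram of distances is estimated from random pairs and the normalizing value is the empirical α-quantile. The sample size, the fixed seed and the function names are illustrative choices, not values taken from [3].

```python
import random


def alpha_quantile(samples, alpha):
    """Value below which roughly a fraction alpha of the samples fall."""
    ordered = sorted(samples)
    position = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[position]


def alpha_normalize(objects, distance, alpha, n_pairs=2000, seed=0):
    """Return (threshold, normalized distance) for the given alpha.

    The threshold is the alpha-quantile of d(x, y) over random pairs, so
    the normalized distance equals 1 at cumulative probability alpha.
    """
    rng = random.Random(seed)
    samples = [distance(*rng.sample(objects, 2)) for _ in range(n_pairs)]
    threshold = alpha_quantile(samples, alpha)
    if threshold <= 0:
        raise ValueError("alpha-quantile is zero; choose a larger alpha")
    return threshold, (lambda x, y: distance(x, y) / threshold)


def absolute_difference(x, y):
    return abs(x - y)


# Hypothetical 1-D objects; a small alpha focuses on the short distances.
objects = [float(v) for v in range(100)]
threshold, normalized = alpha_normalize(objects, absolute_difference, alpha=0.1)
```

Applying the same α to every underlying distance puts their short-distance ranges on a common scale, which is the range that matters for nearest-neighbor search.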
24
Automatic Weighting
Weighting by max-τ:
Select weights $w_i$ that maximize τ in the combined distance δ (i.e., maximize the value that α-normalizes δ).
Weighting by max-ρ:
Select weights $w_i$ that maximize ρ in δ.
[Figure: $\tau(w_1, w_2, w_3)$, $\rho(w_1, w_2, w_3)$ and $\mathrm{MAP}(w_1, w_2, w_3)$ plotted over the weight triangle with vertices (1,0,0), (0,1,0) and (0,0,1); the points of max τ, max ρ and max MAP are marked.]
[4] J. M. Barrios and B. Bustos. Competitive content-based video copy detection using global
descriptors. Multimedia Tools and Applications, 62(1):75-110, 2013.
26
Metric Access Methods
Metric properties in a distance function:
Reflexivity, Non-negativity, Symmetry, Triangle inequality.
These properties can be used to estimate distances:
In general, the quality of the pivots and lower bounds depends on the intrinsic dimensionality:
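For reference, the standard pivot-based lower bound and the usual definition of intrinsic dimensionality (Chávez et al.) are given below. That these are exactly the formulas shown on the slide is an assumption; the symbols $P$ (pivot set), $\mu$ and $\sigma^2$ (mean and variance of the histogram of distances) are introduced here for illustration.

```latex
\max_{p \in P} \lvert d(q, p) - d(r, p) \rvert \;\le\; d(q, r),
\qquad
\rho = \frac{\mu^{2}}{2\sigma^{2}}
```

The first inequality follows directly from the triangle inequality and symmetry. A high $\rho$ means the distances concentrate around their mean, so the lower bound becomes loose and discards fewer objects.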
27
Approximate Search
Approximate k-NN search using pivots:
Use the lower bound as a fast distance estimator.
Perform a linear scan using the lower-bound distance, where the distances from each object to the pivots are precomputed and stored in memory.
Compute the actual distance only for the T% of objects with the lowest estimates.
The quality of the estimation and the search time depend on the number of pivots.
[5] J. M. Barrios and B. Bustos. P-VCD: A pivot-based approach for content-based video copy
detection. In Proc. of the IEEE int. conf. on Multimedia and Expo (ICME)., pages 1-6. IEEE, 2011.
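The approximate search above can be sketched as follows. This is a simplified illustration rather than the P-VCD implementation from [5]: the Euclidean distance, the grid of toy points, the fixed pivots and the function names are assumptions.

```python
import heapq
import math


def pivot_table(references, pivots, distance):
    """Precompute d(r, p) for every reference object r and pivot p."""
    return [[distance(r, p) for p in pivots] for r in references]


def lower_bound(query_to_pivots, row):
    """Triangle-inequality lower bound: max_p |d(q, p) - d(r, p)| <= d(q, r)."""
    return max(abs(qp - rp) for qp, rp in zip(query_to_pivots, row))


def approximate_knn(query, references, pivots, table, distance, k, t_percent):
    """Evaluate the real distance only on the t_percent of objects with the
    lowest lower-bound estimate; returns (distance, index) pairs."""
    query_to_pivots = [distance(query, p) for p in pivots]
    estimates = [(lower_bound(query_to_pivots, row), i)
                 for i, row in enumerate(table)]
    n_candidates = max(k, len(references) * t_percent // 100)
    candidates = heapq.nsmallest(n_candidates, estimates)
    exact = [(distance(query, references[i]), i) for _, i in candidates]
    return heapq.nsmallest(k, exact)


# Hypothetical toy data: a 10 x 10 grid of 2-D points and two fixed pivots.
references = [(x, y) for x in range(10) for y in range(10)]
pivots = [(0, 0), (9, 0)]
table = pivot_table(references, pivots, math.dist)
query = (3.2, 4.1)
approx = approximate_knn(query, references, pivots, table, math.dist,
                         k=1, t_percent=20)
```

Computing an estimate costs one subtraction per pivot instead of one full distance evaluation, so the saving grows with the cost of the real distance. Raising `t_percent` trades search time for a lower risk of missing the true neighbors.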
28
Approximate Search
A time-expensive distance:
29
Approximate Search
A time-inexpensive distance:
30
Effectiveness-versus-Efficiency
A combined distance δ improves effectiveness compared to a simple distance d.
It improves discrimination between correct and incorrect objects.
δ usually has higher intrinsic dimensionality than d.
Lower bounds lose their effectiveness at estimating the actual distance.
The approximate search using δ may achieve (paradoxically) lower effectiveness than d.
Two-step search:
Locate candidate videos with simpler distances.
Use the combined distance with exact search on the smaller subset of candidates.
[2] J. M. Barrios, B. Bustos, and X. Anguera. Combining features at search time: PRISMA at video copy
detection task. In Proc. of TRECVID. NIST, 2011.
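The two-step search can be sketched as below. For brevity the first step is a plain linear scan with the simple distance, whereas a real system would likely use an approximate search there; the data layout, the parameter `n_candidate_videos` and the toy distances are hypothetical.

```python
import heapq


def two_step_search(query, references, simple_distance, combined_distance,
                    n_candidate_videos, k):
    """references is a list of (video_id, segment_descriptor) pairs."""
    # Step 1: rank videos by their best simple distance to the query.
    best_per_video = {}
    for video_id, desc in references:
        d = simple_distance(query, desc)
        if d < best_per_video.get(video_id, float("inf")):
            best_per_video[video_id] = d
    candidates = set(heapq.nsmallest(n_candidate_videos, best_per_video,
                                     key=best_per_video.get))
    # Step 2: exact k-NN with the combined distance on candidate videos only.
    scored = ((combined_distance(query, desc), video_id, idx)
              for idx, (video_id, desc) in enumerate(references)
              if video_id in candidates)
    return heapq.nsmallest(k, scored)


def simple(q, r):
    """Cheap distance: looks only at the first component."""
    return abs(q[0] - r[0])


def combined(q, r):
    """More discriminative distance: uses both components."""
    return abs(q[0] - r[0]) + abs(q[1] - r[1])


references = [("v1", (0.0, 0.0)), ("v1", (0.1, 0.5)),
              ("v2", (0.2, 0.1)), ("v3", (5.0, 0.0))]
result = two_step_search((0.0, 0.1), references, simple, combined,
                         n_candidate_videos=2, k=2)
```

The expensive combined distance is evaluated only on segments of the surviving videos, so exact search stays affordable even when lower bounds on δ are weak.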
31
Exact Search in Videos
Two consecutive segments in a query video are usually similar.
Snake distribution: the $i$-th and $(i+1)$-th query objects are closer than random pairs in R.
It may occur in video similarity and interactive search.
In some cases, queries can be reordered to enhance this property.
Use $\{q_1, \ldots, q_{i-1}\}$ as dynamic pivots to resolve $q_i$.
[6] J. M. Barrios, B. Bustos, and T. Skopal. Snake table: A dynamic pivot table for streams of k-nn
searches. In Proc. of the int. workshop on Similarity Search and Applications (SISAP), 25-39. 2012.
32
Snake Table
Start with an empty pivot table.
Resolve $q_1$:
  No pivots! Search is just a linear scan in R.
  Add $q_1$ to the pivot table, i.e., store $d(*, q_1)$.
For each $q_i$ ($i > 1$):
  Compute distances from $q_i$ to $\{q_1, \ldots, q_{i-1}\}$.
  Resolve the search using $\{q_1, \ldots, q_{i-1}\}$ as pivots.
  Add $q_i$ to the pivot table following some replacement strategy.
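The steps above can be sketched as a small class. This is a simplified illustration, not the data structure from [6]: it assumes a "keep the most recent queries" replacement strategy, stores only the distances that were actually evaluated (so rows may be sparse), and counts distance evaluations to make the saving visible.

```python
import heapq
import math


class SnakeTable:
    """Dynamic pivot table: previous queries act as pivots for the next one."""

    def __init__(self, references, distance, max_pivots):
        self.references = references
        self.distance = distance
        self.max_pivots = max_pivots
        self.pivots = []  # list of (previous query, {reference index: distance})

    def search(self, query, k):
        """Exact k-NN; returns ([(distance, index), ...], evaluations)."""
        query_to_pivots = [self.distance(query, p) for p, _ in self.pivots]
        heap = []  # max-heap through negated distances: (-distance, index)
        row = {}
        evaluations = 0
        for i, r in enumerate(self.references):
            # Lower bound from every pivot that stored d(r, pivot).
            bound = 0.0
            for qp, (_, stored) in zip(query_to_pivots, self.pivots):
                if i in stored:
                    bound = max(bound, abs(qp - stored[i]))
            if len(heap) == k and bound >= -heap[0][0]:
                continue  # r cannot improve the current k-NN: skip d(q, r)
            d = self.distance(query, r)
            evaluations += 1
            row[i] = d
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        # Assumed replacement strategy: drop the oldest pivot when full.
        self.pivots.append((query, row))
        if len(self.pivots) > self.max_pivots:
            self.pivots.pop(0)
        return sorted((-nd, i) for nd, i in heap), evaluations


# Hypothetical stream of similar queries over a 20 x 20 grid of 2-D points.
references = [(float(x), float(y)) for x in range(20) for y in range(20)]
queries = [(5.3, 5.4), (5.4, 5.4), (5.5, 5.5)]
table = SnakeTable(references, math.dist, max_pivots=2)
results = [table.search(q, k=1) for q in queries]
```

The first query evaluates every reference object because the table is empty. Later queries are close to their predecessors, so the lower bounds are tight and most objects are discarded without computing the real distance.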
33
Snake Table performance
Pro: Improves efficiency in time-expensive distances and with high
intrinsic dimensionality.
Con: In time-inexpensive distances, static pivots may be preferable.
35
Comparing Exact Searches
Comparing performance for different simple descriptors:
Snake table
Static pivots
kd-tree and k-means tree (implemented by FLANN).
36
Comparing Approximate Searches
Precision versus Search Time in approximate searches:
37
MUSCLE-VCD-2007
Medium-size dataset (60 hours, 36 GB).
The best reported result is 97% detection [Poullot et al., 2010].
The combined spatio-temporal distance detects 100% of the copies without false alarms.
38
TRECVID
TRECVID ran a CBVCD evaluation between 2008 and 2011.
The PRISMA team participated in 2010 and 2011.
Larger dataset: 419 hours, 100 GB.
56 transformations: 8 visual × 7 audio.
21 participating teams.
Two scenarios: No false alarms and Balanced profiles.
[Figure: an original and its copies.]
39
No False Alarms Profile
Multimodal detection outperforms visual-only detection.
High accuracy at copy localization.
Good effectiveness-versus-efficiency tradeoff.
Global descriptors can achieve high performance.
[2] J. M. Barrios, B. Bustos, and X. Anguera.
Combining features at search time: PRISMA at video
copy detection task. In Proc. of TRECVID. NIST, 2011.
40
Balanced Profile
Global descriptors achieve higher performance in the No False Alarms profile than in the Balanced profile.
Many transformations are almost undetectable for global descriptors.
The achieved performance validates that the proposed techniques are relevant.
Tests were run on a desktop computer:
Intel Core i7-2600K
8 GB RAM
42
Benefits of the proposed approach
Enables the use of complex similarity measures.
e.g., linear combinations and spatio-temporal distances.
Almost no restriction to descriptor format, as long as the
distance function knows how to compare them.
The metric approach may improve efficiency even for complex similarity measures, as long as they satisfy metric properties.
Compared to coordinate-based techniques:
Metric methods are faster in exact searches.
Metric methods are not restricted to vector descriptors and $L_2$.
Metric methods may not even need an indexing phase.
43
Drawbacks of the proposed approach
A complex similarity measure is usually computationally
expensive and produces spaces with high intrinsic
dimensionality.
Limits the size of the dataset that can be processed.
Metric properties restrict the similarity model.
Triangle inequality is usually not compatible with adaptive
weights or partial similarity.
The non-metric approach produced only a slight improvement in
effectiveness.
Compared to coordinate-based techniques:
In the case of vector descriptors, metric methods produce much worse approximation results.
In particular, kd-tree performs very well for approximate search
and SIFT descriptors, much better than metric methods.
44
Future Research
Develop a library for similarity search that follows the
structure of FLANN, providing a common API to use
metric access methods and coordinate-based methods.
Extend the α-normalization to a “universal normalization” which replaces each distance by its cumulative probability.
Research the use of “distance fusion” in classifiers as an
alternative method to early fusion and late fusion.
Analyze the reordering of query objects as a technique to
enhance snake distributions.
[7] J. M. Barrios, B. Bustos, and T. Skopal. Analyzing and dynamically indexing the query set.
Information Systems, 2013.
45
Thank You!