All content in this area was uploaded by Konstantinos Avgerinakis on Nov 12, 2017
DT classification
• Dynamic Textures (DT) appear in videos of natural scenes as moving particles with non-rigid boundaries and irregular motion patterns, e.g. fire, water
1. Analyze presegmented video sequences that contain one type of DT
2. Represent & encode the objects and other entities that move throughout the video sequences
3. Recognize the DT type in a binary (e.g. fire/non-fire) or multi-class manner
• Application to early warning systems for security and surveillance purposes: http://beaware-project.eu/
DT classification - challenges
• Different classes might coexist in the same scene (e.g. fire & smoke, animals & water, water & people)
• Inter-class appearance similarities (e.g. smoke & clouds)
• Intra-class differences with large appearance and motion variations
• Unpredictable motion patterns of a stochastic and stationary nature
• Multiple occlusions might be present
• Moving entities are non-rigid particles with a dynamically changing shape
• Transparency is often present, e.g. in the cases of smoke, fire, water
• Camera motion, viewpoint variations
DT classification – related work
• Linear Dynamic System (LDS):
o DTs are usually highly non-linear and highly complex
o Precise modeling is very challenging
o LDS do not achieve state-of-the-art (SoA) performance
• Stabilized higher order LDS:
o Novel descriptor with points on a Grassmannian manifold (HoGP [1])
o High accuracy, high computational cost
• Local spatio-temporal (ST) features: VLBP [4], LBP-TOP [5], LBP-Fourier [3], ST energy features [2]
o Accuracy vs computational complexity
Our Methodology
[Figure: processing pipeline with four stages, ANALYZE, REPRESENT, ENCODE, RECOGNISE. Source image + AA and optical flow + AA; compute LBP for each point; concatenate LBPxy, LBPxt, LBPyt into LBP-flow; GMM visual vocabulary; Fisher encoding of LBP-flow1 … LBP-flowN into Fisher1 … FisherN; dimension reduction; binary/multi classification]
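The ENCODE stage (GMM visual vocabulary followed by Fisher encoding) can be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes a diagonal-covariance GMM whose parameters are already fitted, uses the commonly used "improved" Fisher vector (gradients with respect to means and variances, then power and L2 normalization), and the function name `fisher_vector` is hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode N local descriptors X (N, D) with a K-component diagonal GMM.

    weights: (K,), means: (K, D), variances: (K, D). Returns a (2*K*D,) vector.
    All GMM parameters are assumed to be fitted beforehand.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigma = np.sqrt(np.asarray(variances, dtype=float))
    n = X.shape[0]

    # Whitened differences between every descriptor and every component: (N, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # Soft assignments (posteriors), computed in the log domain for stability.
    log_p = np.log(weights) - np.log(sigma).sum(axis=1) - 0.5 * (diff ** 2).sum(axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients with respect to the means and the standard deviations.
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0) / (
        n * np.sqrt(2.0 * weights)[:, None]
    )

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv    # L2 normalization
```

With K components and D-dimensional descriptors the encoding has 2·K·D entries, independent of how many descriptors a video sample produces.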
Activity Areas
• We assume that background-induced noise is Gaussian
• Motion estimates are caused by true motion or by noise:
o H0: $u_k^0(r) = z_k(r)$
o H1: $u_k^1(r) = z_k(r) + u_k(r)$
• Gaussian random variables have zero (excess) kurtosis
• Activity Areas have zero values at pixels where the kurtosis tends to zero
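The kurtosis test above can be sketched per pixel as follows. This is an illustrative sketch under assumptions not stated on the slide: the input is a stack of optical-flow magnitudes, the normalized excess kurtosis is used, and the function name `activity_area_mask` and the `threshold` value are hypothetical.

```python
import numpy as np

def activity_area_mask(flow_mag, threshold=1.0):
    """Binary activity-area mask from per-pixel kurtosis of flow estimates.

    flow_mag: array of shape (frames, height, width) with flow magnitudes.
    Returns a boolean (height, width) mask that is False (zero) where the
    excess kurtosis is close to zero, i.e. where H0 (noise only) is plausible.
    """
    flow_mag = np.asarray(flow_mag, dtype=float)
    centered = flow_mag - flow_mag.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)

    # Excess kurtosis m4 / var^2 - 3 is zero for Gaussian data.
    # Pixels with no variation at all are treated as inactive.
    kurt = np.zeros_like(var)
    valid = var > 1e-12
    kurt[valid] = m4[valid] / var[valid] ** 2 - 3.0

    return np.abs(kurt) > threshold
```

Pixels whose flow values behave like Gaussian noise over the frame stack fall below the threshold and are masked out; the threshold trades missed motion against false activity and would need tuning on real data.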
[Figure: optical flow; sampling]
LBP-Flow computation
• LBP-Flow builds upon LBP:
o Difference of intensity values between each pixel and its neighboring pixels within radius R
o Extended to include optical flow values around pixel r, to take motion information into account
• RGB (X-Y) axis differences for appearance information
• OF (X-T), OF (Y-T) differences for motion information
[Figure: LBPxy from the source image (xy plane) and LBPxt, LBPyt from the optical flow image (xt and yt planes); sampling, neighbor distances and space-time aggregation; concatenation over W frames]
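A minimal sketch of the three-plane idea: LBP codes on the X-Y plane of the source image and on the X-T and Y-T planes of the optical flow, concatenated into one vector. Several choices here are assumptions for illustration only: 8 neighbors at radius 1, a grayscale frame instead of RGB, a flow-magnitude volume, and per-plane histograms as the aggregation. The names `lbp_codes`, `lbp_histogram` and `lbp_flow` are hypothetical.

```python
import numpy as np

# 8 neighbours at radius 1, clockwise from top-left (assumed sampling pattern).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(plane):
    """8-bit LBP codes for the interior pixels of a 2-D array (at least 3x3)."""
    plane = np.asarray(plane, dtype=float)
    h, w = plane.shape
    center = plane[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as large as the centre.
        codes += (neighbour >= center).astype(np.int64) * (1 << bit)
    return codes

def lbp_histogram(plane):
    """Normalized 256-bin histogram of the LBP codes of one plane."""
    codes = lbp_codes(plane)
    return np.bincount(codes.ravel(), minlength=256) / codes.size

def lbp_flow(gray_frame, flow_volume, y, x):
    """Concatenate LBPxy (appearance) with LBPxt and LBPyt (motion).

    gray_frame: (height, width) image; flow_volume: (W, height, width)
    flow magnitudes over W frames; (y, x) selects the X-T and Y-T slices.
    """
    lbp_xy = lbp_histogram(gray_frame)            # X-Y plane of the image
    lbp_xt = lbp_histogram(flow_volume[:, y, :])  # X-T plane of the flow
    lbp_yt = lbp_histogram(flow_volume[:, :, x])  # Y-T plane of the flow
    return np.concatenate([lbp_xy, lbp_xt, lbp_yt])
```

Each call yields a 768-dimensional vector (3 × 256 bins); a set of such vectors per video sample is what an aggregation step would then consume.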
LBP-Flow descriptor
• Fisher encoding aggregates the resulting LBP-Flow vectors of each video sample
• A Neural Network (NN) with three layers is trained:
o Dimensionality reduction
o Hidden layer
o Classification layer
• Only two Fully Connected (FC) layers are needed, because the Fisher Vector is highly discriminative
• Recognition tasks:
o Binary DT classification (nodes: 6, 6, sigmoid)
o Multi-class DT classification (nodes: 128, 64, softmax)
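The layer layout above can be sketched as a forward pass. This is a hedged illustration, not the trained network: it reads "nodes: a, b" as the widths of the dimensionality-reduction and hidden layers, assumes ReLU activations and random initial weights, and assumes the output size (1 sigmoid unit for binary, one softmax unit per class for multi-class). `init_mlp` and `forward` are hypothetical names, and training is omitted.

```python
import numpy as np

def init_mlp(in_dim, reduce_dim, hidden_dim, out_dim, seed=0):
    """Random weights for: reduction layer -> hidden layer -> classifier."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, reduce_dim, hidden_dim, out_dim]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x, multiclass):
    """Map Fisher vectors x (batch, in_dim) to class probabilities."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU (assumed activation)
    W, b = params[-1]
    logits = h @ W + b
    if multiclass:
        # Softmax over classes, shifted for numerical stability.
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid for the binary task

# Binary setup (6, 6, sigmoid) and multi-class setup (128, 64, softmax);
# the input size 768 and the 8 output classes are placeholders.
binary_net = init_mlp(768, 6, 6, 1)
multi_net = init_mlp(768, 128, 64, 8)
```

Calling `forward(multi_net, fisher_vectors, multiclass=True)` returns one probability row per sample; the predicted class is the index of the largest entry.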
Experiments – binary classification
Method            Dyntex [6]   Video-Water DB [3]
LBP-Flow          92.7%        98.3%
LBP-Flow+NN       94.3%        98.8%
LBP-Fourier [3]   95.8%        98.4%
VLBP [4]          90.0%        93.8%
LBP-TOP [5]       87.5%        93.3%
St-TCoF [7]       90.0%        97.2%
• Water / non-water videos
Experiments – multi-classification
Experiments – Dyntex
Dyntex class   HoGP [1]   LBP-Flow   LBP-Flow+NN
Fire           -          82.1%      78.6%
Smoke          83.0%      91.7%      91.7%
Vegetation     81.0%      78.6%      92.9%
Flags          56.0%      66.7%      75.0%
Fountain       88.0%      55.6%      77.8%
CalmWater      81.0%      85.0%      100.0%
Sea            81.0%      95.8%      100.0%
HomeWater      -          88.0%      84.0%
All            -          75.2%      88.8%
Experiments – Moving Vistas
ST-EF [2]   Mov.Vistas [8]   LBP-Flow   LBP-Flow+NN
41.0%       52.0%            62.3%      63.8%
Conclusions
• Our hybrid LBP-Flow+NN framework outperforms both global and local DT approaches in binary and multi-class recognition tasks
o Low computational cost due to the shallow descriptor
o Highly accurate recognition rates due to the deep learning
• High accuracy rates in water and fire recognition indicate that our algorithm can be applied in surveillance and security systems
• Future work
o Deploy spatio-temporal localization by using superpixels and LBP-Flow+NN
References
[1] K. Dimitropoulos, P. Barmpoutis, A. Kitsikidis, N. Grammalidis, “Classification of multidimensional time-evolving data using histograms of Grassmannian points”, IEEE T-CSVT PP (99) (2017) 1-1.
[2] K. G. Derpanis, M. Lecce, K. Daniilidis, R. P. Wildes, “Dynamic scene understanding: The role of orientation features in space and time in scene classification”, in: IEEE CVPR 2012, pp. 1306-1313.
[3] P. Mettes, R. T. Tan, R. C. Veltkamp, “Water detection through spatio-temporal invariant descriptors”, CVIU 154 (2017) 182-191.
[4] G. Zhao, M. Pietikäinen, “Local binary pattern descriptors for dynamic texture recognition”, in: 18th ICPR, Vol. 2, 2006, pp. 211-214.
[5] G. Zhao, T. Ahonen, J. Matas, M. Pietikäinen, “Rotation-invariant image and video description with local binary pattern features”, IEEE T-IP 21 (4) (2012) 1465-1477.
[6] R. Péteri, S. Fazekas, M. J. Huiskes, “Dyntex: A comprehensive database of dynamic textures”, Pattern Recognition Letters 31 (12) (2010) 1627-1632, Pattern Recognition of Non-Speech Audio.
[7] X. Qi, C.-G. Li, G. Zhao, X. Hong, M. Pietikäinen, “Dynamic texture and scene classification by transferring deep image features”, Neurocomputing 171 (2016) 1230-1241.