Digital Image Stabilization Technique for Fixed
Camera on Small Size Drone
Ekkaphon Mingkhwan
Control and Communication Division
Defence Technology Institute (Public Organization)
Nontaburi, Thailand
ekkaphon.m@dti.or.th
Weerawat Khawsuk
Control and Communication Division
Defence Technology Institute (Public Organization)
Nontaburi, Thailand
weerawat.k@dti.or.th
Abstract—This paper explores a digital image algorithm to stabilize videos recorded from a fixed camera (without mechanical stabilization tools) on a small size drone. In particular, this paper focuses on the implementation of the Speed-Up Robust Feature (SURF) method. The fundamental concept is to match two images, one obtained from the current image frame and another from the previous (or reference) frame. The matching process is achieved by locating common keypoints between the current and reference frames and associating them together. A transformation is then applied to translate and rotate the current image frame so that its keypoints remain in the same positions, or as close as possible, to those of the reference frame. Various video samples are used to validate the SURF method's efficiency. The scenarios include videos recorded under a normal light condition and with partial shadows on the image. Movements due to the drone's engine and environmental winds are also considered in this study. The results indicate that the SURF method can be used to stabilize image frames so that the processed video becomes smoother and more suitable for viewing.
Keywords—SURF, Video Stabilization, Matching Estimation, Warping
I. INTRODUCTION
Nowadays, there have been widespread uses of unmanned
aerial vehicles (UAVs) or drones for landscape survey and
aerial images. These UAVs come with various sizes depending
on applications. A larger sized UAV can carry more payloads
such as high quality cameras and stabilized devices (or gim-
bals). This gimbal has capability to compensate the UAV’s
attitude into an opposite direction such that the view of camera
remains unchanged (with respect to the Earth’s axis). Thus,
the resulting video images appear smoother for viewing. For
a smaller sized UAV, on the other hand, there is not enough
available space to install both a video camera and a gimbal.
Typically, the camera is fixed to the UAV's body. When the UAV is in operation, its engine causes vibration, and uncontrolled environmental disturbances cause shaky maneuvers. These movements directly contribute to unstable video images.
It is obvious that video images from a fixed camera on a small UAV are not smooth enough for human eyes. Small objects and their movements can be difficult to notice in such footage. Moreover, when continually monitoring these unstable video images, a viewer can develop discomfort such as motion sickness or nausea. These symptoms are caused by a lack of coordination between the viewing position and body balance. Hence, if the problems caused by unstable motion images from a fixed camera can be solved or reduced without adding extra payloads, the video quality from a small UAV for landscape survey applications will be efficiently improved.
The image stabilization can be achieved via mechanical
devices, optical sensors, or digital methods. For a digital sta-
bilization, there are various well-developed algorithms such as
Scale-Invariant Feature Transform (SIFT) method and Speed-
Up Robust Feature (SURF) method. In particular, this work
focuses on implementation of the SURF method.
The organization of this paper is as follows. The literature
review in Section II provides summarized topics on motion in
mobile videos, motion model, image stabilization techniques,
feature detection method, SURF algorithm, and related re-
searches. Section III describes implementation consisting of
experimental setups, stabilization process, and performance
measures. Results and discussion are presented in Section IV.
Finally, the conclusion remarks are given in Section V.
II. LITERATURE REVIEW
A. Motion in Mobile Videos
When an untrained person records videos, various types of motion act on the hand-held camera. In general, object motion or camera movement causes video images to shake or move. There are 7 typical motions in video images, described as follows [1, 2].
• Track is a left or right translation in the horizontal or X-direction.
• Boom is an up or down translation in the vertical or Y-direction.
• Dolly is a forward or backward translation along the camera axis direction.
• Pan is a left or right turn around the vertical axis.
• Tilt is an up or down turn around the horizontal axis.
• Roll is a rotation around the camera axis direction.
• Zoom is not a camera position movement, but a change of camera focal length causing changes of image size.
Camera movements along each axis cause motion in recorded videos. Therefore, to generate stabilized video images, we should estimate these camera movements and then compensate for them in the opposite direction.
B. Motion Model
A 2D or 3D motion model is used to represent camera movement. A 2D model describes various transformations of images occurring in a 2D plane or (x, y) coordinate. Denote the original location X, the new location X′, and the translation distance t in each direction as

$$X = \begin{bmatrix} x \\ y \end{bmatrix}, \quad X' = \begin{bmatrix} x' \\ y' \end{bmatrix}, \quad t = \begin{bmatrix} t_x \\ t_y \end{bmatrix} \quad (1)$$
The transformations for movement are modeled as follows [1].
1) Translation can be written as $X' = X + t$.
2) Rotation and translation, also known as a rigid body motion, can be written as $X' = RX + t$, where

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad (2)$$

is a rotation matrix at angle $\theta$ with respect to the X axis.
3) Scaled rotation, also known as a similarity transform, preserves the angle between an original image and a new (scaled) version. This transformation can be described as $X' = SRX + t$, where

$$S = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \quad (3)$$

Here, it is not necessary that $a^2 + b^2 = 1$.
4) Affine transformation preserves parallel lines between these images. The relation is given as $X' = AX$, where

$$A = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix} \quad (4)$$

5) Projective transformation operates on homogeneous coordinates between $X'$ and $X$ such that

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \quad (5)$$

To obtain a homogeneous coordinate, we normalize the new location $X'$ by enforcing the condition $h_{20}x + h_{21}y + h_{22} = 1$, as shown in the above equation.
The projective transformation combines the affine transformation with projective warps and uses 9 parameters for estimation. On the other hand, the affine transformation uses 6 parameters and preserves parallel lines and image ratios. Hence, the affine transformation, using fewer parameters, is more suitable for motion estimation.
Fig. 1: Illustration of similarity (scaled rotation), affine, and
projective transformations [4].
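To make the 2D motion models concrete, the following sketch applies a similarity (scaled rotation) transform, $X' = SRX + t$, to an image. It is written in Python with NumPy and OpenCV as an illustration and is not part of the original paper; the file name, angle, scale, and translation values are arbitrary example assumptions.

```python
import numpy as np
import cv2

def similarity_matrix(angle_deg, scale, tx, ty):
    """Build a 2x3 similarity (scaled rotation) matrix: X' = S R X + t."""
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    # [a -b tx; b a ty] combines rotation, uniform scale, and translation.
    return np.array([[a, -b, tx],
                     [b,  a, ty]], dtype=np.float32)

# Example: rotate 3 degrees, scale by 1.02, shift by (5, -4) pixels.
frame = cv2.imread("frame.png")            # hypothetical input frame
M = similarity_matrix(3.0, 1.02, 5.0, -4.0)
h, w = frame.shape[:2]
warped = cv2.warpAffine(frame, M, (w, h))  # apply the 2D motion model
```

A full affine model would simply use all six entries of the 2×3 matrix instead of the constrained (a, b) form.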
C. Image Stabilization Techniques
In general, video quality will be improved if one can reduce the image motion. One practical reduction technique is to re-order the frames of the video. This technique consists of two stages: 1) motion estimation of image frames and 2) removal of undesired inter-frame motion as well as any distortion due to movement [5, 6]. The process at each stage can be achieved either in hardware or in software. The hardware implementation is performed via mechanical image stabilization or optical image stabilization. On the other hand, a software implementation stabilizes images digitally. We can classify image stabilization techniques as follows [1].
Fig. 2: (a) Gyroscopic sensor, (b) 3-axis gimbal camera
1) Mechanical image stabilization: It uses a gimbal, comprising gyroscopic sensors and a mechanical system, as illustrated in Figure 2, to stabilize video images. The gyroscopic sensors provide tilt angles of the platform in each of the three directions. The mechanical system on the gimbal compensates for these angles by tilting the camera in the opposite direction in order to preserve the same image view (relative to the Earth's axis) at all times.
Fig. 3: Sensor-shift (left) and Lens-shift (right) method [7].
2) Optical image stabilization: When a ray of light passes
through an aperture and camera lens onto a charge-coupled
device (CCD) sensor, electrical signals are generated and
then transformed into images. The distance between the lens and the CCD sensor is critical to image quality. There are two implementation methods for optical image stabilization: a lens-shift method and a sensor-shift method [7], as illustrated in Figure 3. Using information from motion sensors installed within the camera, the first method adjusts the lens position so that the image formed on the CCD sensor remains at the same position while the camera is in motion. The second method instead adjusts the position of the CCD sensor so that the same image is formed as the camera moves.
3) Digital video stabilization: It is a post-processing technique with various implementation methods. For example, a path of camera movement is determined to compensate for its motion, or an additional frame is introduced so that image features (keypoints) move more smoothly. Note that a video image appears to move because the keypoints change their positions in each frame. Therefore, this technique tries to maintain keypoints of the current frame at the exact positions, or as close as possible to those, from the previous frame. The image feature determination depends on processing speed, size, and image orientation. Its efficiency remains a major challenge. The digital video stabilization technique consists of three stages: 1) motion estimation, 2) motion compensation to obtain a smooth image, and 3) image warping to obtain the same viewing angle [1, 5], as demonstrated by Figure 4.
Fig. 4: Demonstration of digital video stabilization: (a) original, (b) unstabilized, (c) motion compensated, (d) warped image
D. Feature Detection Method
Feature detection is a crucial part of the video stabilization process. Various methods have been extensively studied to detect image features with higher efficiency. Examples of feature detection methods are as follows.
1) Scale-invariant feature transform (SIFT): This method
transforms an image into a large collection of local feature vec-
tors, each of which is invariant to image translation, scaling,
rotation, affine, and projection [8]. The SIFT parameters for an
image feature are partially related to those parameters obtained
from the affine transform. There are 4 major computational
stages to generate sets of these image features [9]:
(1) Scale-space extrema detection searches over all image
locations and scales for its extrema. The efficient imple-
mentation uses a Difference-of-Gaussian (DoG) function
to identify keypoints invariant to scale and orientation.
(2) Keypoint localization and filtering refines important keypoints and discards insignificant ones.
(3) Orientation assignment identifies each keypoint location
based on the gradient direction of a local image. The
effects of rotation and scale transformations are further
removed. Hence, resulting images are invariant to these
transformations.
(4) Keypoint descriptor creation: a descriptor is generated from a histogram of local orientations.
The SIFT method for feature detection uses DoG filters
in various levels as shown in Figure 5. It creates important
characteristics invariant to image transformations. Therefore,
this method is suitable for images with different viewpoints.
Fig. 5: DoG filters at various levels in the SIFT method [9]
2) Affine scale-invariant feature transform (ASIFT): This
method extends the SIFT method’s capability so that it is
invariant to the camera axis orientations such as longitude and
latitude angles [6]. Figure 6 illustrates the keypoint association
of the magazine’s cover page determined by the SIFT and
ASIFT methods. It is evident that when the reference image
changes size and orientation, the ASIFT method can associate
more keypoints than does the SIFT method.
Fig. 6: Comparison of keypoint association obtained by SIFT
and ASIFT methods [10]
3) Principal Component Analysis SIFT (PCA-SIFT): This method adopts a standard PCA technique to reduce data dimensionality. It uses a 36-dimensional descriptor rather than the 128-dimensional descriptor of the SIFT method. This data reduction allows the PCA-SIFT method to detect keypoints faster [11].
4) Speed-Up Robust Feature (SURF): This method builds upon the SIFT method to respond to changes in image size and orientation. Instead of a DoG function, box filters evaluated on integral images are used to approximate the Laplacian of Gaussian (LoG), resulting in more accurate detection [12].
Each of the feature detection methods described above aims to improve detection efficiency. It is evident that the SURF method is faster and more accurate than the SIFT method. It gives better responses to images that change in size, movement and speed, illumination, and orientation [14]. Therefore, the SURF method is more suitable for detecting keypoints of moving and unstable images obtained from a video camera on a small UAV.
E. SURF Algorithm
The SURF detection method is faster than the SIFT method
because it can determine keypoints from integral images rather
than the original images. An integral image I(X) at location X = (x, y) stores the sum of all pixel intensities in the rectangular area formed by point X and the origin:

$$I(X) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j) \quad (6)$$
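As a rough illustration of Eq. (6), an integral image can be built with cumulative sums, and any rectangular sum is then obtained from four lookups. The sketch below uses Python/NumPy as an assumption for illustration; it is not the paper's implementation.

```python
import numpy as np

def integral_image(img):
    """Integral image per Eq. (6): I(x, y) = sum of all pixels above and to the left."""
    return img.astype(np.float64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of intensities in the inclusive rectangle (r0, c0)-(r1, c1),
    computed with at most four lookups into the integral image."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```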
There are 4 stages for the SURF algorithm as follows.
1) Keypoint detection: This stage computes the determinant of the Hessian matrix H(X, σ) at location X and scale σ to represent the local change around an interest area, given by

$$H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{yx}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix}, \quad (7)$$

$$L_{xy}(X, \sigma) = I(X) \star \frac{\partial^2}{\partial x\, \partial y}\, g(\sigma) \quad (8)$$

where $L_{xy}(X, \sigma)$ is the convolution ($\star$) of the integral image $I(X)$ with the second-order partial derivative of the Gaussian $g(\sigma)$ with respect to the x and y directions.
The SURF algorithm approximates these derivatives with rectangular boxes or box filters. These derivative approximations can be evaluated very quickly using integral images, independently of the filter size [13]. The 9×9 box filters with σ = 1.2 represent the lowest scale (or the highest spatial resolution). Denote the quantities $D_{xx}$, $D_{yy}$, and $D_{xy}$ as discretized approximations of $L_{xx}(X, \sigma)$, $L_{yy}(X, \sigma)$, and $L_{xy}(X, \sigma)$, respectively. Thus, the determinant of the discretized Hessian matrix becomes

$$\det(H_{\text{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2 \quad (9)$$

The weight $w$ applied to the rectangular regions is kept simple for computational efficiency. To keep the grey region around keypoints close to zero, the relative weight is adjusted to $w = 0.9$ to further balance the Hessian determinant [14].
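Once the box-filter responses are available, the weighted determinant of Eq. (9) is simple to evaluate. The sketch below (Python/NumPy, an assumption for illustration) takes Dxx, Dyy, and Dxy as already computed, for example as box sums over an integral image; the exact filter layouts follow Bay et al. [13] and are omitted here.

```python
import numpy as np

def hessian_det_approx(Dxx, Dyy, Dxy, w=0.9):
    """Approximate Hessian determinant of Eq. (9): Dxx*Dyy - (w*Dxy)^2.
    Inputs may be scalars or NumPy arrays of box-filter responses."""
    return Dxx * Dyy - (w * Dxy) ** 2
```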
2) Keypoint localization: This stage locates local extrema (maxima or minima) by comparing each candidate keypoint to its nearest neighbors in scale space. It builds a pyramid of LoG response maps, with several levels within each octave. An octave represents a
series of filter response maps obtained by convolving the same
input image with a filter of increasing size. In total, an octave
encompasses a scaling factor of 2. Each octave is subdivided
into a constant number of scale levels. The output of the 9×9
Fig. 7: Increasing the filter size and keeping the Gaussian
derivatives with corrected scales [14].
filter is considered as the initial scale level. The following
levels are obtained by filtering the image with gradually
bigger masks, taking into account a discrete nature of integral
images and filter structure. Figure 7 shows the increase of filter size from 9 to 15 for the 1st octave, where the top and bottom rows represent discretized approximations of Dyy and Dxy, respectively. The filter sizes are {9, 15, 21, 27} for the 1st octave, {15, 27, 39, 51} for the 2nd octave, {27, 51, 75, 99} for the 3rd octave, and {51, 99, 147, 195} for the 4th octave.
Keypoints are localized at the two middle scales of each octave, using the adjacent scale-space neighborhood. A 3×3×3 scale-space neighborhood is used to determine whether the interest pixel is a local maximum within a search region. A pictorial
representation of the adjacent pixels in space and scale space
is provided in Figure 8. The center pixel (red) is considered
a local maximum among the surrounding points (grey area)
when it has the highest intensity in the search area. If its value
exceeds a pre-defined threshold, then that pixel is regarded as
a keypoint.
Fig. 8: The non-maximum suppression to detect a keypoint
from 3×3×3scale-space search [14].
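A minimal sketch of the 3×3×3 non-maximum suppression described above, using Python with NumPy and SciPy (assumed libraries, not the paper's code). The response stack is indexed as (scale, row, column), and the threshold value is left as an arbitrary example.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima_3d(responses, threshold):
    """Keypoint candidates: pixels that are maxima of their 3x3x3 scale-space
    neighborhood and exceed a pre-defined threshold.
    `responses` has shape (num_scales, height, width)."""
    neighborhood_max = maximum_filter(responses, size=3, mode="constant")
    mask = (responses == neighborhood_max) & (responses > threshold)
    # Return (scale, row, col) indices of the detected keypoints.
    return np.argwhere(mask)
```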
3) Orientation assignment: This stage calculates the Haar-
wavelet responses in X and Y directions, with a wavelet
size of length 4s, where s is the scale at which the keypoint was detected. Once the wavelet responses are calculated and weighted with a Gaussian (σ = 2.5s) centered at the interest point, the responses are represented as vectors whose coordinates are the horizontal and vertical response strengths,
respectively. The dominant orientation is estimated by calcu-
lating the sum of all responses within a sliding orientation
window covering an angle of π/3. The horizontal and vertical
responses within the window are summed. The two summed
responses then yield a new vector. The longest such vector
lends its orientation to the interest point [13].
4) Descriptor generation: The descriptor describes a dis-
tribution of Haar wavelet responses within the interest point
neighborhood. This stage partitions the interest region into
smaller 4×4(or 16) square sub-regions, where each sub-
region is further divided into 5×5(or 25) spaced sample
points as shown in Figure 9.
Fig. 9: An interest area is divided into 4×4sub-regions,
which are sampled into 5×5points [14].
Denote dx and dy as the Haar wavelet responses in the horizontal and vertical directions, relative to the selected keypoint orientation. The wavelet responses dx and dy are summed over each sub-region and form a first set of entries of the feature vector. In order to bring in information about the polarity of the intensity changes, the sums of the absolute values of the responses, |dx| and |dy|, are also included. These values form a descriptor vector v for each sub-region:

$$v = \left\{ \sum d_x, \; \sum |d_x|, \; \sum d_y, \; \sum |d_y| \right\} \quad (10)$$

Over all 4×4 (or 16) sub-regions, the length of the descriptor vector becomes 64. The wavelet responses are invariant to a bias in illumination. Invariance to contrast is achieved by normalizing the descriptor to a unit vector.
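To illustrate Eq. (10), the sketch below assembles the 64-dimensional descriptor from per-sample Haar responses dx and dy arranged on a 20×20 grid (4×4 sub-regions of 5×5 samples each). It is written in Python/NumPy as an assumption for illustration; the Gaussian weighting and the sampling of the responses themselves are taken as already done.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """Build the 64-D SURF descriptor of Eq. (10) from Haar wavelet responses.
    dx, dy: arrays of shape (20, 20) = 4x4 sub-regions of 5x5 samples each."""
    v = []
    for i in range(4):
        for j in range(4):
            sx = dx[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            sy = dy[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            # Per sub-region entries: sum dx, sum |dx|, sum dy, sum |dy|.
            v.extend([sx.sum(), np.abs(sx).sum(), sy.sum(), np.abs(sy).sum()])
    v = np.asarray(v)
    # Contrast invariance: normalize the descriptor to a unit vector.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```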
F. Related Research
In general, motion images from a video camera can be mechanically stabilized with a gimbal. However, for a small UAV there are limitations on installing both a video camera and a gimbal because of limited space and payload capacity. A typical approach is to use a post-processing technique for digital video stabilization. Three major stages of video stabilization are motion estimation, motion compensation, and image composition [12, 16, 17]. At the motion estimation stage, a video is divided into image frames. The features of the current frame are extracted and then matched with those from the previous frame. This estimation stage yields an orientation and a distance for the video image. During the motion compensation stage, these features are adjusted such that their positions are located as close as possible to the same features in the previous frame. Lastly, image composition is performed via a projective transform so that all features are in the correct positions. These three stages are repeated for each newly rendered frame.
A vital process during the motion estimation is a detection
of image features. Studies suggest that a feature detection
using the SURF method is more efficient than the SIFT
method. Not only does the SURF method yield more accurate results, but it also leads to higher precision for keypoint matching with less processing computation [12, 16, 17].
III. IMPLEMENTATION
A. Experiment Setups
This experiment uses a multi-rotor equipped with a digital
video camera to capture motion images as shown in Figure 10.
The video is recorded in front of the Defence Technology Institute (DTI) building around mid-day under normal light conditions. The multi-rotor is operated at a height of about 20 meters while the camera is set to full HD (1080p) resolution at a rate of 30 frames/sec.
Fig. 10: (a) Multi-rotor and (b) video camera
B. Stabilization Process
For a small multi-rotor (UAV), a video camera is usually
installed without any stabilization device. Unstable movement due to the UAV's flight dynamics often appears in the video images. To reduce this uneasy viewing effect, the recorded images are digitally processed as follows.
1) Frame extraction: It splits the recorded video into image frames, where each frame is further processed.
2) Motion detection: It estimates the image motion using the SURF method. Figure 11 illustrates the feature detection, extraction, and matching processes. Here, keypoint examples are indicated by green circles. Hundreds of keypoints are detected in this frame; however, only 20 important keypoints are shown for the extraction and matching processes.
3) Motion compensation: An image frame is compensated
in an opposite direction to offset camera motion. This
compensation includes scale and rotation transforma-
tions with a proper adjustment.
Fig. 11: Keypoint examples of image frame (top), and feature
extraction and matching between adjacent frames (bottom).
4) Image composition: A compensated image is adjusted
such that all detected keypoints between adjacent frames
are aligned. Illustration of motion compensation and
image composition is shown in Figure 12.
Fig. 12: Motion compensation and image composition.
5) Image stabilization: The final image of the current frame has all detected keypoints located as close as possible to those from the previous frame.
Figure 13 summarizes the video stabilization process. As a result, the final image of each frame has less movement and is easier to view than the original version. All stabilized image frames are reconstructed in order to produce a smoother video.
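The following sketch outlines a stabilization pipeline of this kind in Python with OpenCV. It is an illustrative assumption, not the authors' implementation: cv2.xfeatures2d.SURF_create requires an opencv-contrib build with the non-free modules enabled, a similarity (partial affine) model is used for the compensation step, and the file paths and Hessian threshold are arbitrary examples.

```python
import cv2
import numpy as np

def stabilize(in_path, out_path):
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    ok, ref = cap.read()                       # first frame becomes the initial reference
    h, w = ref.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    kp_ref, des_ref = surf.detectAndCompute(cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY), None)
    writer.write(ref)
    while True:
        ok, cur = cap.read()
        if not ok:
            break
        kp_cur, des_cur = surf.detectAndCompute(cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY), None)
        M = None
        if des_cur is not None and des_ref is not None:
            matches = matcher.match(des_cur, des_ref)      # keypoint matching
            if len(matches) >= 3:
                src = np.float32([kp_cur[m.queryIdx].pt for m in matches])
                dst = np.float32([kp_ref[m.trainIdx].pt for m in matches])
                # Motion estimation: similarity transform from current to reference frame.
                M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
        if M is None:
            M = np.eye(2, 3, dtype=np.float32)             # no reliable match: keep frame as-is
        stabilized = cv2.warpAffine(cur, M, (w, h))        # motion compensation and warping
        writer.write(stabilized)
        # The stabilized frame becomes the reference for the next iteration.
        kp_ref, des_ref = surf.detectAndCompute(
            cv2.cvtColor(stabilized, cv2.COLOR_BGR2GRAY), None)
    cap.release()
    writer.release()
```

If the non-free SURF module is unavailable, a free detector such as ORB could be substituted in the same structure.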
C. Performance Measures
Since the visual quality of digitally stabilized video images can be rather subjective, it is necessary to establish quantitative measures to compare the effects of various video enhancement algorithms on the quality of each image frame. Two commonly
used measures are the Mean-Squared Error (MSE) and Peak
Signal-to-Noise Ratio (PSNR). The MSE value represents a
cumulative squared error between the original image frames
and the stabilized version, whereas the PSNR value represents
a measure of the peak error. A lower error of the stabilized
algorithm results in a lower MSE value. A higher PSNR value
between any two stabilized adjacent frames indicates a good
quality of stabilized video.
Fig. 13: Video stabilization process.
The MSE and PSNR values are computed by:

$$\mathrm{MSE}(n) = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ I_n(x, y) - I_{n+1}(x, y) \right]^2, \quad (11)$$

$$\mathrm{PSNR}(n) = 10 \log_{10} \left( \frac{I_{\max}}{\mathrm{MSE}(n)} \right), \quad (12)$$

where M and N are the numbers of column and row pixels of an image (the frame dimensions). The intensity values at pixel location (x, y) of the current frame n and the next frame n+1 are denoted $I_n(x, y)$ and $I_{n+1}(x, y)$. Typically, the intensity has a value between 0 and 255. $I_{\max}$ is the maximum pixel intensity of the current frame. The PSNR is measured in decibels (dB).
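A small sketch of Eqs. (11) and (12) in Python/NumPy, given as an assumption for illustration since the paper does not specify its implementation. It follows the formulas exactly as written above, including the use of $I_{\max}$ rather than $I_{\max}^2$ in the PSNR.

```python
import numpy as np

def mse(frame_n, frame_n1):
    """Mean-squared error between two grayscale frames, Eq. (11)."""
    diff = frame_n.astype(np.float64) - frame_n1.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(frame_n, frame_n1):
    """Peak signal-to-noise ratio in dB, Eq. (12) as given in the paper."""
    e = mse(frame_n, frame_n1)
    i_max = float(frame_n.max())   # maximum pixel intensity of the current frame
    return 10.0 * np.log10(i_max / e) if e > 0 else float("inf")
```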
Fig. 14: Quality of stabilized video and PSNR value.
This experiment compares a compensated frame, from which undesired motion has been removed, against the reference image from the previous frame. Here, a high PSNR value indicates that the compensated and reference image frames have similar quality. In Figure 14, two adjacent frames having all keypoint locations in close vicinity can be combined into an easy-to-view video.
IV. RESULTS AND DISCUSSION
A. Experimental Results
There are 4 video scenarios in this analysis and each video
has a total of 700 image frames. The 1st video is recorded by a
multi-rotor’s digital camera flying in front of the DTI building.
This video has significant motion due to flying in a high wind
environment. However, it yields a good video sample since
all keypoint locations are well-spread throughout an image
frame. Similarly, the 2nd video is also recorded under a high wind condition but at a later time; almost 50% of the image is covered by shadow. The 3rd video is an internet sample recorded from another multi-rotor camera. There is more vibration in this video because the camera is set at a low frame rate and no stabilizing device is used. This video is a good reference for a low-quality camera. The 4th video is recorded before the multi-rotor takes off; therefore, its image does not have significant vibration effects.
Fig. 15: PSNR value at each frame of the 1st video sample.
In Figure 15, the PSNR values of both the original and stabilized versions are high because of the higher frame rate. The keypoints in each frame are located closely to one another with minimal differences. Therefore, the image frames of the stabilized version have similar quality to those of the original video sample. However, the higher PSNR values of the stabilized samples indicate that the SURF algorithm yields a smoother video. The PSNR values of both the original and stabilized samples of the 2nd video, shown in Figure 16, are still high. Though there are shadows on the image, the SURF algorithm reduces their effects and gives a cleaner video, as evidenced by PSNR values higher than those of the original version.
Fig. 16: PSNR value at each frame of the 2nd video sample.
In Figure 17, there is more variation in the PSNR values due to the lower camera quality and less precise control of the multi-rotor in maintaining a steady position. However, the stabilized version still yields a higher PSNR value at each frame, resulting
in an easier viewing video. The PSNR values of both original
and stabilized samples as shown in Figure 18 are relatively
equal because the multi-rotor has not yet flown. Their visual
qualities are thus similar.
Fig. 17: PSNR value at each frame of the 3rd video sample.
Fig. 18: PSNR value at each frame of the 4th video sample.
A comparison of average PSNR values between the original
and stabilized video samples is summarized in Table I. The
1st video yields a higher improvement percentage than the
2nd sample because the former does not have shadows on
the image. The stabilization process is able to match more
keypoints with the reference frame. However, the stabilization performance of the 2nd scenario remains at about the same level as that of the 1st video. The 1st, 2nd, and 3rd videos quantitatively
show small improvement percentages, but their visual qualities
appear smoother for viewing. The 4th video yields an insignif-
icant improvement percentage since the multi-rotor has not yet
flown and the recorded video is affected by little movement.
TABLE I: Average PSNR Value of Each Video Sample

Scenario   Original (dB)   Stabilized (dB)   Improvement
1st        70.7489         74.6266           5.48%
2nd        72.7996         76.1505           4.60%
3rd        72.3716         75.4981           4.32%
4th        86.8969         87.5343           0.73%
B. Discussion
All graphs in Figures 15 to 18 show that PSNR values of the stabilized videos are higher at the beginning. Most keypoints have not drastically changed their locations since the multi-rotor has not yet taken off, so there is little vibration to affect the recorded video. However, during flight, the video samples shake due to the engine and movement of the multi-rotor as well as environmental winds. These disturbances directly contribute to the different PSNR values of the stabilized and original videos. All stabilized versions have
closer keypoint locations between adjacent frames than those
keypoint locations from the original video. Therefore, the
stabilized videos using the SURF algorithm appear smoother
and easier for viewing.
In certain situations where there are abrupt changes in the attitude, height, and movement of the multi-rotor, matching keypoints between image frames is more difficult than under normal conditions. The processed video appears more stabilized, but it can still have some jerky portions. Nonetheless, PSNR
values of the stabilized version are still higher than those of
the original sample. It is evident that this SURF algorithm can
be used to reduce video motions in order to improve visual
quality of a recorded video from a UAV camera.
V. CONCLUSION
The SURF algorithm detects keypoints in each image frame by determining locations where the intensity response has a maximum value among all points in the surrounding neighborhood. The experiments in this paper utilize 4 video samples
recorded from a fixed camera on a small multi-rotor to verify
the method’s efficiency. These videos include conditions under
normal mid-day light and having partial shadows on recorded
images. The SURF algorithm detects keypoints in each image frame and compares their locations between the current and previous frames. The matched keypoints are then used to compensate for the movement of the current frame such that these keypoints are located as close as possible to those of the previous frame. The compensated image frames are combined into a more stabilized video.
Several methods such as projective, similarity, and affine
transformations can be applied to achieve this compensation.
Each transformation is suitable to compensate keypoints with
different characteristics. A quantitative criterion for selecting an appropriate transformation can be explored in a future study.
A combination among these methods can be implemented
instead of a single transformation so that the processed video
becomes smoother and more stabilized for viewing.
REFERENCES
[1] P. Rawat and J. Singhai, “Review of Motion Estima-
tion and Video Stabilization Techniques for Hand Held
Mobile Video,” Signal and Image Process.: An Int. J.,
Vol. 2(2), pp. 159-168, Jun. 2011.
[2] S. Navayot and N. Homsup, “Real-Time Video Stabiliza-
tion for Aerial Mobile Multimedia Communication,” Int.
J. of Innovation, Management and Technology, Vol. 4(1),
pp. 26-30, Feb. 2013.
[3] M. Liebling. (2010). PoorMan3DReg [Online]. Available:
http://sybil.ece.ucsb.edu
[4] A. Neumann, H. Freimark and A. Wehrle. (2010). Geo-
data and Spatial Relation [Online]. Available: https://
geodata.ethz.ch
[5] P. Rawat and J. Singhai, “Efficient Video Stabilization
Technique for Hand Held Mobile Videos,” Int. J. of Sig-
nal Process., Image Process. and Pattern Recognition,
Vol. 6(3), pp. 17-32, Jun. 2013.
[6] M. Niskanen, O. Silven and M. Tico, “Video Stabilization
Performance Assessment,” Proc. of IEEE Int. Conf. on
Multimedia and Expo, Canada, Jul. 2006, pp. 405-408.
[7] C. Macmanus. (2009). The Technology Behind Sony Al-
pha DSLR’s Steady Shot Inside [Online]. Available:
http://www.sonyinsider.com
[8] D. G. Lowe, “Object Recognition from Local Scale-
Invariant Features,” Proc. of IEEE Int. Conf. on Computer
Vision, Kerkyra, Greece, Sep. 1999, pp. 1150-1157.
[9] D. G. Lowe, “Distinctive Image Features from Scale-
Invariant Keypoints,” Int. J. of Computer Vision, Vol. 2,
pp. 91-110, Nov. 2004.
[10] G. Yu and J. M. Morel, “ASIFT: An Algorithm for Fully
Affine Invariant Comparison,” Image Process. On Line,
Feb. 2011, pp. 1-28.
[11] Y. Ke and R. Sukthankar, “PCA-SIFT: A More Distinc-
tive Representation for Local Image Descriptors,” Proc.
of IEEE Comput. Soc. Conf. on Computer Vision and
Pattern Recognition, USA, Jun. 2004(2), pp. 506-513.
[12] X. Zheng, C. Shaohui, W. Gang and L. Jinlun, “Video
Stabilization System Based on Speeded-up Robust Fea-
tures,” Proc. of Int. Ind. Informatics and Computer Eng.
Conf., Xian, China, Jan. 2015, pp. 1995-1998.
[13] H. Bay, T. Tuytelaars, and L. V. Gool, “SURF: Speeded
Up Robust Features”, J. of Computer Vision and Image
Understanding, Vol. 110, Issue 3, pp. 346-359, Jun. 2008.
[14] J. T. Pederson. SURF: Feature Detection & Description,
Dept. of Computer Sci., Aarhus Univ., Denmark, 2011.
[15] S. M. Jurgensen, "The Rotated Speeded-Up Robust Features Algorithm (R-SURF)," M.S. thesis, Dept. Elec. and Comp. Eng., Naval Postgraduate School, CA, 2014.
[16] Dhara Patel, Dixesh Patel, D. Bhatt and K. R. Jadav,
“Motion Compensation for Hand Held Camera Device,”
Int. J. of Research in Engineering and Technology, Vol. 4,
pp. 771-775, Feb. 2015.
[17] J. Y. Kim and C. H. Caldas, “Exploring Local Feature
Descriptors for Construction Site Video Stabilization,”
Proc. of 31st Int. Symp. on Automation and Robotics in
Construction and Mining, Sydney, Australia, Jul. 2014,
pp. 654-660.
[18] H. M. Sergieh, E. E. Zsigmond, M. Doller, D. Coquil,
J.M. Pinon and H. Kosch, “Improving SURF Image
Matching Using Supervised Learning,” Proc. of 8th Int.
Conf. on Signal Image Technology and Internet Based
Systems, Sorrento, Italy, Nov. 2012, pp. 230-237.
[19] S. Namarateruangsuk, “Image Filtering using Raised
Cosine-Blur,” SDU Research J. of Sci. and Technology,
Vol. 7(2), pp. 23-32, May 2014.
[20] A. Walhaa, A. Walia, and A. M. Alimia. “Video Stabi-
lization for Aerial Video Surveillance,” AASRI Conf. on
Intell. Syst. and Control, Vol. 4, pp. 72-77, 2013.