Hilbert-Huang Transform-based Local Regions Descriptors.
Dongfeng Han, Wenhui Li, Wu Guo
Computer Science and Technology,
Key Laboratory of Symbol Computation and
Knowledge Engineering of the Ministry of Education,
Jilin University, Changchun, P. R. China
School of Engineering Technology,
Shandong University of Technology, Zibo, P. R. China
This paper presents a new descriptor for interest local regions based on the Hilbert-Huang transform. The neighborhood of an interest local region is adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs). The Hilbert transform is then applied to each component to obtain phase and amplitude information. The proposed descriptor samples the phase angles and aggregates them into 8-bin orientation histograms over 10 overlapping squares. Experiments show that the proposed descriptor performs better than SIFT and other standard descriptors. The Hilbert-Huang transform based descriptor essentially belongs to the class of phase-based descriptors, so it copes better with illumination changes. Additionally, the Hilbert-Huang transform is a new tool for signal analysis, and the proposed descriptor is a new attempt to apply it to local region description.
1 Introduction

Efficient local region descriptors are very useful in computer vision applications such as matching, indexing, retrieval, and recognition. The procedure for solving the correspondence problem is usually not complex and consists of several stages, as shown in Fig. 1: (1) detect the interest local regions; (2) normalize the size and main orientation of the local regions; (3) compute their descriptors; (4) match the local regions using a similarity measure.
The goal of a local region detection algorithm is to determine the locations and scales of the features. The detectors should be invariant to translation, rotation, scale, and affine transformation. Many scale- and affine-invariant local region detectors have been proposed in the past few years [1, 2, 3, 4, 5, 6, 7]. Given invariant region detectors, the remaining question is which descriptors are most appropriate for characterizing the local regions. Many different methods for describing local interest regions have been developed [9, 10, 11, 12, 13, 14, 15, 3, 16, 8].

Figure 1: The main steps of the correspondence problem.

BMVC 2007 doi:10.5244/C.21.16
In this paper, a new signal processing tool, the Hilbert-Huang transform, is used to describe local regions. The local regions are decomposed by bidimensional empirical mode decomposition (BEMD) into several intrinsic mode functions (IMFs) and a residual part. Hilbert spectral analysis is then conducted to describe the local regions. The Hilbert-Huang transform is locally adaptive and suitable for analyzing nonlinear or non-stationary signals. Because the local regions we process usually lie in non-stationary areas of the image, the Hilbert-Huang transform is well suited to describing them. It also shares the good properties of wavelet and Fourier analysis, and previous studies have shown that it can provide much better temporal and frequency resolution than either.
Experiments on a standard data set show that the proposed descriptor gives better results under illumination changes and geometric transformations. Additionally, the proposed algorithm is a new attempt to apply the Hilbert-Huang transform to local region description.
The rest of this paper is organized as follows. Section 2 discusses the Hilbert-Huang transform. Section 3 describes the Hilbert-Huang transform based local region descriptor in detail. Section 4 presents experiments on real images and an empirical comparison between our approach and five other standard methods. Conclusions and future work are discussed in Section 5.
2 Hilbert-Huang Transform
Recently, Huang et al. introduced the empirical mode decomposition (EMD) method for data analysis. EMD is a general method for processing nonlinear, non-stationary signals, which makes it well suited to describing local structure and local texture. EMD has been used in many fields, e.g. [18, 19, 20]. Its major advantage is that the basis functions are derived from the signal itself, so the analysis is adaptive, in contrast to wavelet analysis, where the basis functions are fixed. The central idea of EMD is to decompose a time series into a finite and often small number of intrinsic mode functions (IMFs). As discussed by Huang et al., an IMF must satisfy two conditions: (1) over the whole data set, the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) at any point, the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero. Huang emphasizes that the second condition is essential to EMD, especially for non-stationary signals such as interest local regions. Further details on EMD can be found in the literature.
Given a signal $x(t)$, the EMD can be represented as

$$x(t) = \sum_{i=1}^{N} \mathrm{IMF}_i(t) + r(t), \tag{1}$$

where $N$ is the number of IMFs and $r(t)$ is the residue of the signal.
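Equation (1) can be illustrated with a minimal 1D sifting sketch. It is an assumption-laden illustration, not the authors' implementation: extrema are found from sign changes of the first difference, the envelopes are cubic splines (SciPy's `CubicSpline`) that also pass through the two end samples (a crude end-effect heuristic), and sifting stops after a fixed number of iterations or when the envelope mean becomes small relative to the candidate IMF. The names `emd`, `max_imfs`, `max_sift`, and `tol` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(h):
    """Indices of interior local maxima and minima of a 1D array."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(h, t):
    """Mean of the upper and lower spline envelopes, or None if h has too few extrema."""
    maxima, minima = local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    last = len(h) - 1
    # Anchor both envelopes at the end samples (simple end-effect heuristic).
    imax = np.concatenate(([0], maxima, [last]))
    imin = np.concatenate(([0], minima, [last]))
    upper = CubicSpline(t[imax], h[imax])(t)
    lower = CubicSpline(t[imin], h[imin])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=6, max_sift=50, tol=0.05):
    """Return (imfs, residue) such that x = sum(imfs) + residue, as in equation (1)."""
    t = np.arange(len(x), dtype=float)
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue, t) is None:
            break  # residue is (nearly) monotonic: stop decomposing
        h = residue.copy()
        for _ in range(max_sift):
            m = envelope_mean(h, t)
            if m is None:
                break
            h = h - m
            if np.linalg.norm(m) < tol * np.linalg.norm(h):
                break  # envelope mean is close enough to zero
        imfs.append(h)
        residue = residue - h
    return imfs, residue


t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
imfs, r = emd(x)
```

Because each IMF is subtracted from the running residue, the reconstruction in equation (1) holds exactly (up to rounding) regardless of how well the sifting heuristics separate the two tones.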
The Hilbert-Huang transform is well suited to analyzing local structure. It consists of two parts: 1) EMD and 2) Hilbert spectral analysis (HSA). EMD first decomposes the signal into a set of IMFs, which are then analyzed using HSA. The main idea of HSA is to construct a complex signal from the real signal under analysis: the positive-frequency part of the spectrum of the real signal $x(t)$ is multiplied by two and the negative-frequency part is set to zero. This spectrum corresponds to a complex signal $z(t)$ whose real part equals $x(t)$ and whose imaginary part is the Hilbert transform of $x(t)$, as given by equations (2) and (3):
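$$z(t) = x(t) + i\,y(t), \tag{2}$$

$$y(t) = \mathcal{H}[x(t)] = \frac{1}{\pi}\,\mathrm{p.v.}\int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau}\,d\tau, \tag{3}$$

where p.v. denotes the Cauchy principal value.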
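The spectrum-doubling construction behind equations (2) and (3) can be sketched with NumPy's FFT. This is a discrete approximation that treats the sampled signal as periodic; the function name `analytic_signal` is hypothetical (SciPy's `scipy.signal.hilbert` performs the same construction).

```python
import numpy as np


def analytic_signal(x):
    """Discrete analytic signal z = x + i*H[x] (equations (2) and (3))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)            # negative frequencies keep weight 0
    weights[0] = 1.0                 # keep the DC term
    if n % 2 == 0:
        weights[1:n // 2] = 2.0      # double the positive frequencies
        weights[n // 2] = 1.0        # keep the Nyquist term once
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


t = np.arange(256)
x = np.cos(2 * np.pi * 8 * t / 256)
z = analytic_signal(x)
amplitude = np.abs(z)                # instantaneous amplitude
phase = np.unwrap(np.angle(z))       # instantaneous phase
```

For this cosine with an integer number of periods, the real part reproduces $x(t)$, the imaginary part is the matching sine, and the amplitude is 1 everywhere.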
Although these definitions apply to any function $x(t)$ for which the integral (3) exists, the phase and amplitude have a clear physical meaning only if $x(t)$ is a monocomponent function, i.e. its numbers of extrema and zero crossings differ by at most one, and the mean of its upper and lower envelopes is zero. In practice, most image signals are not monocomponent. By the definition of EMD given above, however, every IMF satisfies these conditions, which makes EMD well suited as a first step for Hilbert (or, in the multidimensional case, Riesz) analysis.
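A quick numerical check of the first monocomponent condition makes the criterion concrete. The sketch counts extrema from sign changes of the first difference and zero crossings from sign changes of the samples; it deliberately does not test the second (zero-mean envelope) condition, which needs the envelope construction of EMD. All function names are hypothetical.

```python
import numpy as np


def count_extrema(x):
    """Local maxima plus minima, found where the first difference changes sign."""
    d = np.diff(x)
    return int(np.sum(d[:-1] * d[1:] < 0))


def count_zero_crossings(x):
    """Sign changes between consecutive samples."""
    return int(np.sum(x[:-1] * x[1:] < 0))


def satisfies_imf_condition_1(x):
    """Condition (1): #extrema and #zero crossings are equal or differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


t = np.linspace(0.0, 1.0, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t + 0.3)
print(satisfies_imf_condition_1(tone))        # True: zero-mean oscillation
print(satisfies_imf_condition_1(tone + 2.0))  # False: extrema but no zero crossings
```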
For two-dimensional signals, a similar algorithm called bidimensional EMD (BEMD) can be used to analyze images. Its principle is the same as that of EMD; the BEMD procedure is given in Table 1. Fig. 2 shows a two-level BEMD of a face image. As Fig. 2 shows, $\mathrm{IMF}_1$ and $\mathrm{IMF}_2$ carry much useful information, and the BEMD indeed provides a multi-scale representation of the local region, containing pattern structures from the finest to the coarsest. It has been shown that the extracted local features have a direct semantic interpretation. The descriptors can therefore be extracted by Hilbert analysis of $\mathrm{IMF}_1$, $\mathrm{IMF}_2$, and the residue.
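The BEMD procedure of Table 1 can be sketched as follows. One substitution is made deliberately: instead of spline interpolation through scattered 2D extrema, the upper and lower envelopes are approximated by a maximum/minimum filter followed by smoothing, a simplification used by fast BEMD variants. The window sizes, the stopping threshold, and the names `bemd` and `count_extrema_2d` are assumptions for illustration; `n_imfs=2` mirrors the two-level decomposition used in this paper.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, uniform_filter


def count_extrema_2d(img, size=3):
    """Number of strict local maxima plus minima in a size x size neighborhood."""
    mx = maximum_filter(img, size=size)
    mn = minimum_filter(img, size=size)
    is_max = (img == mx) & (img > mn)   # exclude flat regions
    is_min = (img == mn) & (img < mx)
    return int(is_max.sum() + is_min.sum())


def bemd(image, n_imfs=2, window=5, max_sift=10, tol=1e-3):
    """Return (imfs, residue) with image = sum(imfs) + residue (final result of Table 1)."""
    residue = np.asarray(image, dtype=float).copy()   # step (1): Ir_0 = I
    imfs = []
    for _ in range(n_imfs):
        if count_extrema_2d(residue) <= 2:            # step (4): too few extrema left
            break
        h = residue.copy()                            # step (2)1): h_0 = Ir_{i-1}
        for _ in range(max_sift):
            # steps 2)-3): filter-based stand-in for spline envelopes
            upper = uniform_filter(maximum_filter(h, size=window), size=window)
            lower = uniform_filter(minimum_filter(h, size=window), size=window)
            mean = 0.5 * (upper + lower)              # step 4)
            h = h - mean                              # step 5)
            if np.mean(mean ** 2) < tol * np.mean(h ** 2):
                break                                 # step 6): mean close enough to zero
        imfs.append(h)                                # IMF_i = h_k
        residue = residue - h                         # step (3)
        window = 2 * window + 1                       # coarser envelopes next (heuristic)
    return imfs, residue


rng = np.random.default_rng(0)
img = rng.random((41, 41))
imfs, res = bemd(img)
```

Growing the window after each IMF is one simple way to make successive IMFs capture coarser structure, in line with the finest-to-coarsest ordering described above.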
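An image is a two-dimensional signal, and the Riesz transform is a multidimensional generalization of the Hilbert transform. Since $I(x,y) = \sum_{t=1}^{n} \mathrm{IMF}_t(x,y) + Ir_n(x,y)$, the two-dimensional complex signal associated with each IMF can be expressed as

$$\mathrm{IMF}^A_t(x,y) = \mathrm{IMF}_t(x,y) + i\,\mathrm{IMF}^H_t(x,y), \quad t = 1, 2, \dots, n, \tag{4}$$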
Table 1: The BEMD algorithm
(1) Initialization: $Ir_0(x,y) = I(x,y)$, $i = 1$;
(2) Extract the $i$th IMF:
1) Let $h_0 = Ir_{i-1}(x,y)$, $k = 1$;
2) For $h_{k-1}(x,y)$, extract the local maxima $m_{\max,k-1}$ and minima $m_{\min,k-1}$;
3) Create the upper and lower envelopes by spline interpolation of the local maxima $m_{\max,k-1}$ and minima $m_{\min,k-1}$;
4) Compute the mean of the two envelopes, $m_{k-1}(x,y)$;
5) Compute $h_k(x,y) = h_{k-1}(x,y) - m_{k-1}(x,y)$;
6) Check whether the mean signal is close enough to zero. If yes, set $\mathrm{IMF}_i(x,y) = h_k(x,y)$; otherwise set $k = k+1$ and go to 2);
(3) $Ir_i(x,y) = Ir_{i-1}(x,y) - \mathrm{IMF}_i(x,y)$;
(4) If $Ir_i(x,y)$ has more than two extrema, set $i = i+1$ and go to (2); otherwise finish. The final result is

$$I(x,y) = \sum_{i=1}^{n} \mathrm{IMF}_i(x,y) + Ir_n(x,y).$$
Figure 2: An example of BEMD. (a) The original image. (b) $\mathrm{IMF}_1$ after BEMD. (c) $\mathrm{IMF}_2$ after BEMD. (d) The residual image after BEMD.
Figure 3: Illustration of the overlapping division of a normalized region. The local region is divided into 10 overlapping squares, and a descriptor is computed for each square. Unlike SIFT and other non-overlapping schemes, the squares overlap.
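where $\mathrm{IMF}_t(x,y)$ and $\mathrm{IMF}^H_t(x,y)$ are the real and imaginary parts of $\mathrm{IMF}^A_t(x,y)$. The amplitude and the phase are then given by equations (5) and (6):

$$A(x,y) = \sqrt{\mathrm{IMF}_t^2(x,y) + \left(\mathrm{IMF}^H_t(x,y)\right)^2}, \tag{5}$$

$$\theta(x,y) = \arctan\frac{\mathrm{IMF}^H_t(x,y)}{\mathrm{IMF}_t(x,y)}. \tag{6}$$

They can be written compactly as

$$G(x,y) = A(x,y)\exp\left(i\,\theta(x,y)\right). \tag{7}$$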
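Equations (4) to (7) leave open how the imaginary part $\mathrm{IMF}^H_t$ is computed in 2D. One possible reading, assumed here rather than taken from the paper, is to compute the two Riesz components of each IMF with the FFT and use the magnitude of the Riesz vector as $\mathrm{IMF}^H_t$; amplitude and phase then follow from equations (5) and (6), with `arctan2` used so the phase is defined for all signs. The function name `riesz_amplitude_phase` is hypothetical.

```python
import numpy as np


def riesz_amplitude_phase(imf):
    """Amplitude (eq. 5) and phase (eq. 6) of a 2D component, taking IMF^H as the
    magnitude of its Riesz vector (an assumption; the paper does not fix this choice)."""
    rows, cols = imf.shape
    v = np.fft.fftfreq(rows)[:, None]        # vertical frequencies
    u = np.fft.fftfreq(cols)[None, :]        # horizontal frequencies
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                       # avoid 0/0 at DC (u = v = 0 there anyway)
    spectrum = np.fft.fft2(imf)
    r_x = np.real(np.fft.ifft2(-1j * u / radius * spectrum))   # Riesz component along x
    r_y = np.real(np.fft.ifft2(-1j * v / radius * spectrum))   # Riesz component along y
    imf_h = np.hypot(r_x, r_y)               # scalar stand-in for IMF^H
    amplitude = np.hypot(imf, imf_h)         # A = sqrt(IMF^2 + (IMF^H)^2)
    phase = np.arctan2(imf_h, imf)           # theta = arctan(IMF^H / IMF), in [0, pi]
    return amplitude, phase


x = np.arange(64)
plane_wave = np.tile(np.cos(2 * np.pi * 4 * x / 64), (32, 1))
A, theta = riesz_amplitude_phase(plane_wave)
print(np.allclose(A, 1.0))   # True: the Riesz transform of this cosine is the matching sine
```

The same function can be applied unchanged to the residue $Ir_n$ to obtain the quantities of equation (8).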
To avoid losing information, the residual part is analyzed in the same way, as in equation (8):

$$Ir^A_n(x,y) = Ir_n(x,y) + i\,Ir^H_n(x,y). \tag{8}$$
Natural images commonly have complex texture structure, and the local regions we are interested in usually occur in such areas. As with 1D signals, image signals are therefore well suited to analysis by the Hilbert-Huang transform.
3 Hilbert-Huang Transform Based Local Region Descriptors
Although the phase contains more information than the amplitude, it is sensitive to image transformations. To overcome this disadvantage, we project the phase onto eight orientations at each pixel, as in SIFT.
A DoG detector is used to detect local regions over scales. The characteristic scale determines the size of each local region, which is then normalized to 41×41 pixels. A two-level BEMD is applied to the local region, which is sufficient to describe its detailed local structure. Together with the residual image, this yields three "images": $\mathrm{IMF}_1$, $\mathrm{IMF}_2$, and $Ir_n$.
Figure 4: Some test images. (a) graf image. (b) boat image. (c) car image. (d) motorcycle image.

To reduce the feature dimension while maintaining spatial information, the local region is spatially divided into 10 overlapping square regions. This division method gives better results than non-overlapping division, and it provides a high overlap ratio with a sufficient descriptor dimension. A division example is illustrated in Fig. 3. In each square region, the phase angle $\theta(x,y)$ at each position is projected onto eight orientations, weighted by the amplitude $A(x,y)$. To reduce sensitivity to illumination changes, the amplitude is saturated using equation (9):

$$\tilde{A}(x,y) = 1 - \exp\left(-A^2(x,y)\right). \tag{9}$$

In this way, the saturated amplitude is roughly constant for large amplitudes. With eight orientations, 10 squares, and 3 images ($\mathrm{IMF}_1$, $\mathrm{IMF}_2$, and $Ir_n$), the total dimension of the descriptor is $8 \times 10 \times 3 = 240$.
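The descriptor assembly can be sketched as follows, given amplitude and phase maps for the three components of a normalized 41×41 region. Several details are assumptions rather than the paper's specification: the exact 10-square layout is only shown in Fig. 3, so a hypothetical layout (a 3×3 grid of 21×21 squares with stride 10, plus the whole region) is used; each phase angle is hard-assigned to one of eight bins and weighted by the saturated amplitude of equation (9); and the final vector is L2-normalized as in SIFT. Function names are hypothetical.

```python
import numpy as np

PATCH = 41   # normalized region size used in the paper
N_BINS = 8   # eight orientations


def square_layout(patch=PATCH, size=21, stride=10):
    """Hypothetical 10-square layout: a 3x3 grid of overlapping size x size squares
    plus the whole patch (the paper's actual layout is shown in Fig. 3)."""
    starts = range(0, patch - size + 1, stride)          # 0, 10, 20
    squares = [(r, c, size) for r in starts for c in starts]
    squares.append((0, 0, patch))
    return squares


def saturate(amplitude):
    """Equation (9): roughly constant for large amplitudes."""
    return 1.0 - np.exp(-amplitude ** 2)


def orientation_histogram(theta, weight):
    """Hard-assign each phase angle to one of N_BINS bins, weighted by amplitude."""
    angles = np.mod(theta, 2 * np.pi)
    bins = np.floor(angles / (2 * np.pi / N_BINS)).astype(int) % N_BINS
    return np.bincount(bins.ravel(), weights=weight.ravel(), minlength=N_BINS)


def hht_descriptor(components):
    """components: three (A, theta) pairs for IMF1, IMF2 and the residue."""
    parts = []
    for amplitude, theta in components:
        weight = saturate(amplitude)
        for r, c, s in square_layout():
            parts.append(orientation_histogram(theta[r:r + s, c:c + s],
                                               weight[r:r + s, c:c + s]))
    descriptor = np.concatenate(parts)                     # 3 x 10 x 8 = 240 values
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor   # L2 normalization (assumption)


rng = np.random.default_rng(0)
components = [(rng.random((PATCH, PATCH)), rng.uniform(-np.pi, np.pi, (PATCH, PATCH)))
              for _ in range(3)]
print(hht_descriptor(components).shape)   # (240,)
```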
In practice, BEMD need not be run on every local region separately; it can be performed once on the whole image at the start of the algorithm, which reduces the running time.
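This speed-up amounts to decomposing the image once and then resampling each component around every keypoint. A minimal sketch, assuming SciPy's `map_coordinates` for bilinear resampling and ignoring orientation normalization; `sample_patch` and its `radius` parameter (which would be derived from the DoG characteristic scale) are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def sample_patch(component, x, y, radius, size=41):
    """Bilinearly resample the square [x-radius, x+radius] x [y-radius, y+radius]
    of a precomputed component (IMF1, IMF2 or residue) onto a size x size grid."""
    offsets = np.linspace(-radius, radius, size)
    rows, cols = np.meshgrid(y + offsets, x + offsets, indexing="ij")
    return map_coordinates(component, [rows, cols], order=1, mode="reflect")


# Usage: decompose the whole image once, then sample each component per keypoint.
image = np.tile(np.arange(100.0), (100, 1))   # toy component: value equals column index
patch = sample_patch(image, x=50, y=50, radius=20)
```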
4 Experiments and Comparisons
4.1 Matching Strategy and Comparison Criterion
The matching method is important to the final performance, and different problems call for different matching methods. Two commonly used strategies are nearest neighbor matching and ratio matching:
(1) Nearest neighbor matching: regions A and B are matched if descriptor $D_B$ is the nearest neighbor of $D_A$ and the distance between them is below a threshold. With this approach, a descriptor has at most one match.
(2) Ratio matching: this method is similar to nearest neighbor matching, except that the threshold is applied to the ratio of the distances to the first and second nearest neighbors. The regions are matched if $\|D_A - D_B\| / \|D_A - D_C\| < t$, where $D_B$ is the first and $D_C$ the second nearest neighbor of $D_A$.
In this paper, the ratio matching method is used, and recall and precision serve as the performance evaluation metrics. Two points $v_i$ and $v_j$ form a correct match if the error in relative location is less than 3 pixels, i.e. $\|v_i - H v_j\| < 3$, where $H$ is the homography between the two images.
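The correctness criterion and the two metrics can be sketched as below. The criterion follows the text, with $H$ assumed to map second-image points into the first image; the recall and precision formulas are assumptions based on the definitions commonly used with this evaluation framework (recall = correct matches / ground-truth correspondences, precision = correct matches / all returned matches). Function names are hypothetical.

```python
import numpy as np


def is_correct_match(v_i, v_j, H, tol=3.0):
    """True if ||v_i - H v_j|| < tol pixels.

    v_i: (x, y) in the first image; v_j: (x, y) in the second image;
    H: 3x3 homography mapping second-image points into the first image.
    """
    p = H @ np.array([v_j[0], v_j[1], 1.0])   # homogeneous coordinates
    p = p[:2] / p[2]                          # back to pixel coordinates
    return bool(np.linalg.norm(np.asarray(v_i, dtype=float) - p) < tol)


def recall_precision(n_correct, n_matches, n_correspondences):
    """Assumed definitions: recall = correct / correspondences, precision = correct / matches."""
    return n_correct / n_correspondences, n_correct / n_matches


H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # pure 5-pixel shift
print(is_correct_match((5.0, 0.0), (0.0, 0.0), H))   # True
print(is_correct_match((9.0, 0.0), (0.0, 0.0), H))   # False: 4 pixels off
print(recall_precision(30, 40, 60))                   # (0.5, 0.75)
```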
Figure 5: Performance evaluation under several image changes. (a) ROC curve under affine changes for the graf image. (b) ROC curve under scale + rotation changes for the boat image. (c) ROC curve under illumination changes for the car image. (d) ROC curve under blur changes for the motorcycle image.
The data set is available from the VGG lab. Some of the images are shown in Fig. 4. We compare six different descriptors: 1) the proposed algorithm; 2) SIFT; 3) cross correlation; 4) PCA-SIFT; 5) steerable filters; 6) moment invariants. We generate the recall versus precision curves by varying the matching threshold for each of the six descriptors.
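The threshold sweep can be sketched as follows, assuming each region in the first image contributes one nearest-neighbor candidate with a precomputed distance ratio and a correctness flag (for example from the 3-pixel homography test). The function name and the precision convention for an empty match set are assumptions.

```python
import numpy as np


def recall_precision_curve(ratios, correct, n_correspondences, thresholds):
    """Sweep the ratio-test threshold t and return (t, recall, precision) triples.

    ratios[k]: ||D_A - D_B|| / ||D_A - D_C|| for candidate k;
    correct[k]: whether that nearest-neighbor candidate is a correct match.
    """
    ratios = np.asarray(ratios, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    curve = []
    for t in thresholds:
        accepted = ratios < t
        n_matches = int(accepted.sum())
        n_correct = int((accepted & correct).sum())
        recall = n_correct / n_correspondences
        # Convention (assumption): precision is 1 when no match is accepted.
        precision = n_correct / n_matches if n_matches else 1.0
        curve.append((t, recall, precision))
    return curve


print(recall_precision_curve([0.2, 0.5, 0.7, 0.9], [True, True, False, True], 4, [0.6, 1.0]))
# [(0.6, 0.5, 1.0), (1.0, 0.75, 0.75)]
```

Raising the threshold accepts more candidates, so recall grows while precision typically drops, which traces out one curve per descriptor.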
Fig. 5 shows the ROC curves for different image changes: affine changes (graf image), scale + rotation (boat image), illumination changes (car image), and blur (motorcycle image). In our experiments, the proposed descriptor performs best under illumination changes and affine transformation. We attribute this to the fact that the proposed descriptor is essentially phase-based: under illumination changes, the phase information (which corresponds to orientation) is more robust than the features used by the other methods.
Fig. 6 gives a matching example under a viewpoint change, comparing the proposed descriptor with SIFT. In this example, the proposed algorithm matches more accurately than SIFT.
5 Conclusions

In this paper, a new local region descriptor is presented. The main contribution of this paper is the introduction of the Hilbert-Huang transform to local descriptors. We design an efficient, compact local descriptor based on BEMD and the Hilbert transform; its main idea is to analyze the IMFs by Hilbert spectral analysis. The corresponding experiments are promising: the new descriptor shows its advantages especially under illumination changes and common geometric transformations.

Figure 6: Matching comparison of the proposed method with SIFT. Mismatches are drawn as white lines with red crosses. (a) The mismatches obtained by standard SIFT. SIFT: 13/50 mismatches. (b) The mismatches obtained by the proposed algorithm. Proposed method:
References

[1] T. Lindeberg: Feature detection with automatic scale selection. International Journal of Computer Vision, Vol. 30, 1998, No. 2, pp. 79–116.
[2] K. Mikolajczyk and C. Schmid: Indexing based on scale invariant interest points. 8th International Conference on Computer Vision, IEEE, 2001, pp. 525–531.
[3] D. G. Lowe: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, Vol. 60, 2004, No. 2, pp. 91–110.
[4] T. Tuytelaars and L. Van Gool: Wide baseline stereo matching based on local, affinely invariant regions. The Eleventh British Machine Vision Conference, 2000.
[5] T. Tuytelaars and L. Van Gool: Matching widely separated views based on affine invariant regions. International Journal of Computer Vision, Vol. 59, 2004, No. 1.
[6] J. Matas, O. Chum, M. Urban and T. Pajdla: Robust wide baseline stereo from maximally stable extremal regions. 13th British Machine Vision Conference, 2002.
[7] K. Mikolajczyk and C. Schmid: An affine invariant interest point detector. Computer Vision - ECCV 2002, 2002, pp. 128–142.
[8] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool: A comparison of affine region detectors. International Journal of Computer Vision, Vol. 65, 2005, No. 1-2, pp. 43–72.
[9] L. J. Van Gool, T. Moons and D. Ungureanu: Affine/photometric invariants for planar intensity patterns. 4th European Conference on Computer Vision, Vol. 1, Springer-Verlag, 1996, pp. 642–651.
[10] F. Mindru, T. Tuytelaars, L. Van Gool and T. Moons: Moment invariants for recognition under changing viewpoint and illumination. Computer Vision and Image Understanding, 2004, No. 1-3, pp. 3–27.
[11] W. T. Freeman and E. H. Adelson: The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, 1991, No. 9.
[12] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink and M. A. Viergever: General intensity transformations and differential invariants. Journal of Mathematical Imaging and Vision, Vol. 4, 1994, No. 2, pp. 171–187.
[13] A. Baumberg: Reliable feature matching across widely separated views. CVPR 2000: IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Los Alamitos, CA, USA, 2000, pp. 774–781.
[14] F. Schaffalitzky and A. Zisserman: Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?" Computer Vision - ECCV 2002, 2002.
[15] G. Carneiro and A. D. Jepson: Multi-scale phase-based local features. 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2003, pp. I/736–
[16] Y. Ke and R. Sukthankar: PCA-SIFT: A more distinctive representation for local image descriptors. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, IEEE Computer Society, Piscataway, NJ, USA, 2004, pp. II-506–II-513.
[17] N. E. Huang, Z. Shen, S. R. Long, M. L. C. Wu, H. H. Shih, Q. N. Zheng, N. C. Yen, C. C. Tung and H. H. Liu: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, Vol. 454, 1998, No. 1971, pp. 903–995.
[18] Z. Liu, H. Wang and S. Peng: Texture segmentation using directional empirical mode decomposition. 2004 International Conference on Image Processing, ICIP 2004, IEEE Computer Society, Piscataway, NJ, USA, 2004, pp. 279–282.
[19] P. Flandrin, G. Rilling and P. Goncalves: Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters, Vol. 11, 2004, No. 2, pp. 112–114.
[20] G. Rilling, P. Flandrin and P. Goncalves: Empirical mode decomposition, fractional Gaussian noise and Hurst exponent estimation. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, IEEE, Piscataway, NJ, USA, 2005.
[21] J. C. Nunes, Y. Bouaoune and E. Delechelle: Texture analysis based on local analysis of the Bidimensional Empirical Mode Decomposition. Machine Vision and Applications, 2005, No. 16, pp. 177–188.
[22] G. Carneiro and A. Jepson: Phase-based local features. European Conference on Computer Vision, Copenhagen, Denmark, May 2002.