Article

Abstract

This paper proposes a method for binary image retrieval, where the black-and-white image is represented by a novel feature named the adaptive hierarchical density histogram, which exploits the distribution of the image points on a two-dimensional area. This adaptive hierarchical decomposition technique employs the estimation of point density histograms of image regions, which are determined by a pyramidal grid that is recursively updated through the calculation of image geometric centroids. The extracted descriptor combines global and local properties and can be used on various types of binary image databases. The validity of the introduced method, which demonstrates high accuracy, low computational cost and scalability, is shown both theoretically and experimentally, while comparisons with several prevailing approaches demonstrate its performance.
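To make the construction concrete, here is a minimal Python sketch of the core idea described above: each region is split into four sub-regions at the geometric centroid of its foreground pixels, and per-region point densities are accumulated into the feature vector. The function names, depth parameter and normalisation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def geometric_centroid(region):
    """Centroid of the foreground (black) pixels of a binary region."""
    ys, xs = np.nonzero(region)
    if len(xs) == 0:                      # empty region: split in the middle
        return region.shape[0] // 2, region.shape[1] // 2
    return int(ys.mean()), int(xs.mean())

def ahdh_features(img, levels=3):
    """Sketch of an adaptive hierarchical density histogram: at each level,
    split every region into four quadrants at its geometric centroid and
    record the fraction of all foreground pixels falling in each quadrant."""
    total = max(int(img.sum()), 1)
    regions, features = [img], []
    for _ in range(levels):
        next_regions = []
        for r in regions:
            cy, cx = geometric_centroid(r)
            quads = [r[:cy, :cx], r[:cy, cx:], r[cy:, :cx], r[cy:, cx:]]
            features.extend(q.sum() / total for q in quads)
            next_regions.extend(quads)
        regions = next_regions
    return np.array(features)

# toy usage: a 64x64 binary drawing containing a filled rectangle
img = np.zeros((64, 64), dtype=np.uint8)
img[10:40, 20:50] = 1
print(ahdh_features(img, levels=2).shape)   # 4 + 16 = 20 features
```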


... Bull's eye score (%): AHDH [46] 49.94; PHTs [66] 64.13; RCF [10] 67.40; GFD [68] 81.20; Zernike Moments [24] 80.20; Our Method 68.35. ... the samples with the highest similarity degree are returned. Then, we count the number of the 42 samples that belong to the same group as the query sample, which is denoted by r. ...
... The value of r/21 is calculated as the retrieval rate of the query sample, and the average retrieval rate of the 651 retrieval samples is taken as the bull's eye score of the whole dataset. We compare the proposed method with other region-based shape description methods, including the adaptive hierarchical density histogram (AHDH) [46], generic Fourier descriptors (GFD) [68], Zernike moments [24], Radon composite features (RCF) [10], and polar harmonic transforms (PHTs) [66]. Table 8 shows the comparison results of these methods on the MPEG-7 CE-2 dataset. ...
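For reference, the bull's eye protocol the excerpt describes (21 shapes per class, top 2×21 = 42 retrieved, r same-class hits, score averaged over all queries) can be sketched as follows, assuming a precomputed pairwise distance matrix; the helper name and the toy data are illustrative.

```python
import numpy as np

def bulls_eye_score(dist, labels, per_class=21):
    """Bull's eye score: for each query, take the top 2*per_class most
    similar samples (the query itself included, as is conventional) and
    count how many share the query's class; score = mean of r / per_class."""
    n = len(labels)
    rates = []
    for q in range(n):
        top = np.argsort(dist[q])[:2 * per_class]    # 42 nearest samples
        r = np.sum(labels[top] == labels[q])         # same-class hits
        rates.append(r / per_class)
    return 100.0 * np.mean(rates)                    # percentage

# toy usage with a random symmetric distance matrix over 3 classes of 21
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 21)
d = rng.random((63, 63)); d = (d + d.T) / 2; np.fill_diagonal(d, 0)
print(bulls_eye_score(d, labels))
```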
... It can be seen from the results that our method achieves high retrieval accuracy, which indicates that it can represent the internal information of a shape. The retrieval rate of our method is higher than that of the AHDH [46], PHTs [66] and RCF [10] methods. These three methods were designed specifically for logo or symbol recognition. ...
Article
Shape is an important visual characteristic in representing an object, and it is also an important part of human visual information. Shape recognition is an important research direction in pattern recognition and image understanding. However, extracting discriminative and robust shape descriptors is a difficult problem, because shapes undergo very large deformations, such as geometric changes, intra-class variations, and nonlinear deformations. These factors directly influence the accuracy of shape recognition. To deal with their influence on the performance of shape recognition and to enhance recognition accuracy, we present a novel shape description method called the invariant multiscale triangle feature (IMTF) for robust shape recognition. This method uses two types of invariant triangle features to obtain the shape features of an object; it effectively combines the boundary and interior characteristics of a shape, and hence increases its discriminative power. We conducted an extensive experimental analysis on several shape benchmarks. The results indicate that our method achieves high recognition accuracy. The superiority of our method is further demonstrated in comparison to state-of-the-art shape descriptors.
... The feature vector based on shifted Legendre moment invariants InvLdnm is used to classify the two image databases, and the recognition accuracy is compared with Krawtchouk moment invariants (KMI). The results of the classification using all features are presented in Table 2. Our image retrieval system proceeds in the following steps: the orthogonal invariant moment InvLd of order p is computed as described in Eq. (27) to obtain the descriptor vector. Similarly, the query image Apple-1 from MPEG7-CE Shape is considered and is shown in Fig. 3 (the first image). ...
... The retrieved top 10 results are presented in Fig. 6. Now, we compare our image retrieval method, which is based on the orthogonal invariant moments InvLdnm, with other well-known methods that are also based on shape descriptors, such as the AHDH [27] and HOG [23] features. To assess the individual power of each of these characteristics, we performed image retrieval tests on the above-mentioned databases; the experimental results are presented in Table 3. The results illustrated in Tables 2 and 3 and Figures 5, 6 and 7 show the effectiveness of our image retrieval and classification systems based on the proposed orthogonal invariant moments and Radial Basis Function Neural Networks (RBF). ...
... presents a vector descriptor of size 10 × 6 of the image A calculated by relation (27) ...
Article
Full-text available
Since shape is one of the important low-level features of any image retrieval or image classification system, we present in this paper a new set of orthogonal polynomials called shifted Legendre polynomials. These help to build a set of orthogonal moments which are invariant to translation, rotation and scale. We apply new image retrieval and classification systems based on the proposed invariant moments and Radial Basis Function Neural Networks. To show the effectiveness of our approaches, we present some experimental results.
... At this point, the definition, computation method and distance metric of the HCMD descriptor have been given; its performance is analyzed further below. Among related work, the adaptive hierarchical density histogram (AHDH) method proposed in [39] is the closest to ours, so HCMD is compared with AHDH [39] in the performance analysis. Rotation, scaling and translation of an object do not change its shape, and from the shape signature generation, invariance processing and distance metric described above, HCMD already satisfies invariance to these geometric transformations. ...
... Rotation, scaling and translation of an object do not change its shape, and from the shape signature generation, invariance processing and distance metric described above, HCMD already satisfies invariance to these geometric transformations. The AHDH [39] method, by contrast, depends on the image coordinate system and is not rotation invariant. HCMD is also more compact than AHDH. ...
... More recently, the PatMedia image search engine was developed during the PATExpert project [19]. PatMedia is capable of retrieving patent images based on visual similarity using the Adaptive Hierarchical Density Histograms (AHDH) [20] and constitutes the retrieval engine of an integrated patent image extraction and retrieval framework [5]. ...
... Given the fact that general-case image representation features are based on colour and texture, which are absent in patent images, we need to apply an algorithm which takes into account the geometry and the pixel distribution of these images. In this work we propose the application of the Adaptive Hierarchical Density Histograms (AHDH) as visual feature vectors, which have shown discriminative power between binary complex drawings [20]. The AHDH are devised specifically to deal with such binary and complex images. ...
... around 100 features). Based on experiments conducted with patent datasets, the AHDH outperformed the other state-of-the-art methods [20]. ...
Chapter
Nowadays most patent search systems still rely upon text to provide retrieval functionalities. Recently, the intellectual property and information retrieval communities have shown great interest in patent image retrieval, which could augment the current practices of patent search. In this chapter, we present a patent image extraction and retrieval framework, which deals with patent image extraction and multimodal (textual and visual) metadata generation from patent images with a view to providing content-based search and concept-based retrieval functionalities. Patent image extraction builds upon page orientation detection and segmentation, while metadata extraction from images is based on the generation of low-level visual and textual features. The content-based retrieval functionality is based on visual low-level features, which have been devised to deal with complex black-and-white drawings. Extraction of concepts builds upon a supervised machine learning framework realised with Support Vector Machines and a combination of visual and textual features. We evaluate the different retrieval parts of the framework by using a dataset from the footwear and the lithography domains.
... The Adaptive Hierarchical Density Histograms (AHDH) [15] were devised specifically to deal with binary and complex images. In AHDH, the binary image is considered as a two-dimensional plane, and the hierarchical decomposition of the image is obtained by calculating the hierarchical geometric centroids. ...
... Density features and quantized relative density features are considered for the construction of the feature vector. For levels l less than an experimentally defined level ld (ld = 3), density features are estimated, while for l ≥ ld quantized relative density features are computed [15]. The features are estimated accordingly for all regions at each level, and the overall feature vector is updated at each iteration of the algorithm. ...
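The excerpt does not define the quantized relative density features precisely, so the sketch below encodes one plausible reading, stated as an assumption rather than the paper's exact formula: below level ld the raw region densities are kept, while at deeper levels each region's density relative to the level mean is quantized into a few discrete codes.

```python
import numpy as np

def level_features(densities, level, l_d=3, n_bins=4):
    """Illustrative level-dependent features (an assumed reading of [15]):
    raw densities below level l_d, otherwise densities relative to the
    level mean, quantized into n_bins discrete values."""
    densities = np.asarray(densities, dtype=float)
    if level < l_d:
        return densities                           # plain density features
    rel = densities - densities.mean()             # relative density
    edges = np.linspace(rel.min(), rel.max(), n_bins + 1)[1:-1]
    return np.digitize(rel, edges).astype(float)   # quantized codes

print(level_features([0.1, 0.4, 0.3, 0.2], level=3))
```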
Conference Paper
Full-text available
Automatic classification of digital patent images is significant for improving the efficiency of patent examination and management. In this paper, we propose a new patent image classification method based on an enhanced deep feature representation. Convolutional neural networks (CNNs) are applied to patent image classification for the first time, and the synergy between deep learning and traditional handcrafted features is explored. Specifically, the deep feature is first learned from massive patent image samples by AlexNet. This deep learning feature is then enhanced by fusing it with two typical handcrafted features: local binary patterns (LBP) and the adaptive hierarchical density histogram (AHDH). In order to obtain a more compact representation, the dimension of the fused feature is subsequently reduced by PCA. Finally, patent image classification is conducted by a series of SVM classifiers. Statistical test results on a large-scale image set show that state-of-the-art performance is achieved by our proposed patent image classification method.
... Converting the text to meta-information. 3) Combine the spatial knowledge using a quad-tree spatial structure [1] for pooling. ...
... Table II shows the performance of our method compared to other methods in the same paradigm. In Table II, the Quad Tree method is an adaptation of [1]. The best results in each category are highlighted. ...
Conference Paper
Digital libraries store images which can be highly degraded, and to index this kind of image we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties of complex layout analysis, large variations in writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple, innovative learning-free method for word spotting from large-scale historical documents combining Local Binary Patterns (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in a completely learning-free paradigm, which is very different from unsupervised learning methods; secondly, the computational time is significantly low because of the LBP features, which are very fast to compute; and thirdly, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with other methods in the literature and obtain the best results in the learning-free paradigm.
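For readers unfamiliar with the LBP feature this abstract builds on, below is a minimal numpy sketch of the basic 8-neighbour operator (the standard definition; the paper's exact LBP configuration and its spatial sampling step are not reproduced here).

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour local binary pattern codes for a 2D gray image.
    Each interior pixel is compared with its eight neighbours; neighbours
    that are >= the centre contribute one bit of an 8-bit code."""
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]
    # eight neighbours in a fixed clockwise order
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code

# usage: a 256-bin LBP histogram as a simple word-image descriptor
word = np.random.default_rng(1).integers(0, 255, (32, 96))
hist = np.bincount(lbp8(word).ravel(), minlength=256)
```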
... Converting the text to meta-information. 3) Combine the spatial knowledge using a quad-tree spatial structure [1] for pooling. ...
... Table II shows the performance of our method compared to other methods in the same paradigm. In Table II, the Quad Tree method is an adaptation of [1]. The best results in each category are highlighted. ...
Article
Full-text available
Digital libraries store images which can be highly degraded, and to index this kind of image we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties of complex layout analysis, large variations in writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple, innovative learning-free method for word spotting from large-scale historical documents combining Local Binary Patterns (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in a completely learning-free paradigm, which is very different from unsupervised learning methods; secondly, the computational time is significantly low because of the LBP features, which are very fast to compute; and thirdly, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with the other methods in the literature.
... distribution (distance and orientation) of landmark points around a feature point, which is tolerant to deformations and can be used for hand-written symbols. The Adaptive Hierarchical Density Histogram (AHDH) [3] method exploits the distribution of the image points in a two-dimensional area by employing the estimation of point density histograms in image regions. The Blurred Shape Model [4] descriptor defines a set of spatial regions by means of a grid and has been designed for image recognition. ...
... We use the algorithm in Section 4 for similarity measurement. Then we compare it with three recent methods: AHDH [3], SC [2] and TAR [5]. In this trademark database, we randomly select 10 images from each class (except class 15) as queries and calculate the similarity values between each query and every image in the database. ...
... Vrochidis et al. [25] created the adaptive hierarchical density histogram (AHDH) to exploit the content of patent images. This method has been extended by Sidiropoulos et al. [65], who introduced quantized relative density information. Both studies calculated the distribution of the binary pixels in a patent image by performing image segmentation. ...
Article
Full-text available
The patent database is often used by designers to search for inspirational stimuli for innovative design opportunities because of the large size, extensive variety and massive quantity of design information contained in patent documents. Growing work on design-by-analogy has adopted various vectorization approaches for associating design documents. However, these approaches have focused only on text analysis and ignored visual information. Research in engineering design and cognitive psychology has shown that visual stimuli may benefit design-by-analogy. In this study, we focus on visual design stimuli and automatically derive the vector space and the design feature vectors representing design images. The automatic vectorization approach uses a novel convolutional neural network architecture named Dual-VGG aiming to accomplish two tasks: visual material type prediction and international patent classification (IPC) section-label prediction. The derived feature vectors, which embed both visual characteristics and technology-related knowledge, can be utilized to guide the retrieval and use of near-field and far-field design stimuli according to their vector distances. We report the accuracy of the training tasks and also use a case study to demonstrate the advantages of design image retrieval based on our model.
... While this approach is invariant to rigid-body transformations, the size of the CDM depends on image resolution and the resulting processes are inefficient both computationally and memory-wise. The Adaptive Hierarchical Density Histogram (AHDH) method [22], along with the retrieval framework PATMEDIA [30], exploits both local and global content. It uses both content-based (i.e. image-based) and concept-based (text-based) retrieval, and claims that joint retrieval using both text and image gives better retrieval performance. ...
... features from image contours. The adaptive hierarchical density histogram (AHDH) method of [3] consists of an adaptive multi-resolution, multi-feature representation with features derived from black and white pixel counts across the multi-resolution blocks. [4] fused a set of local handcrafted features, including local binary patterns and edge histograms, to train a linear SVM. ...
Conference Paper
Full-text available
Binary image based classification and retrieval of documents of an intellectual nature is a very challenging problem. Variations in the binary image generation mechanisms, which depend on the document's artisan designer and include drawing style, viewpoint, and the inclusion of multiple image components, are plausible causes of the increased complexity of the problem. In this work, we propose a method suitable for binary images which bridges some of the successes of deep learning (DL) to alleviate the problems introduced by the aforementioned variations. The method consists of extracting the shape of interest from the binary image and applying a non-Euclidean geometric neural-net architecture to learn the local and global spatial relationships of the shape. Empirical results show that our method is in some sense invariant to the image generation mechanism variations and achieves results outperforming existing methods on a patent image dataset benchmark.
... While this approach is invariant to rigid-body transformations, the size of the CDM depends on image resolution and the resulting processes are inefficient both computationally and memory-wise. The Adaptive Hierarchical Density Histogram (AHDH) method [22], along with the retrieval framework PATMEDIA [30], exploits both local and global content. It uses both content-based (i.e. image-based) and concept-based (text-based) retrieval, and claims that joint retrieval using both text and image gives better retrieval performance. ...
Preprint
Full-text available
Resolution of the complex problem of image retrieval for diagram images has yet to be reached. Deep learning methods continue to excel in the fields of object detection and image classification applied to natural imagery. However, the application of such methodologies to binary imagery remains limited due to the lack of crucial features such as texture, color and intensity information. This paper presents a deep learning based method for image-based search of binary patent images by taking advantage of existing large natural image repositories for image search and sketch-based methods (sketches are not identical to diagrams, but they do share some characteristics; for example, both imagery types are grayscale (binary), composed of contours, and lacking in texture). We begin by using deep learning to generate sketches from natural images for image retrieval and then train a second deep learning model on the sketches. We then use our small set of manually labeled patent diagram images via transfer learning to adapt the image search from sketches of natural images to diagrams. Our experiment results show the effectiveness of deep learning with transfer learning for detecting near-identical copies in patent images and querying similar images based on content.
... features from image contours. The adaptive hierarchical density histogram (AHDH) method of [3] consists of an adaptive multi-resolution, multi-feature representation with features derived from black and white pixel counts across the multi-resolution blocks. [4] fused a set of local handcrafted features, including local binary patterns and edge histograms, to train a linear SVM. ...
Preprint
Full-text available
Binary image based classification and retrieval of documents of an intellectual nature is a very challenging problem. Variations in the binary image generation mechanisms, which depend on the document's artisan designer and include drawing style, viewpoint, and the inclusion of multiple image components, are plausible causes of the increased complexity of the problem. In this work, we propose a method suitable for binary images which bridges some of the successes of deep learning (DL) to alleviate the problems introduced by the aforementioned variations. The method consists of extracting the shape of interest from the binary image and applying a non-Euclidean geometric neural-net architecture to learn the local and global spatial relationships of the shape. Empirical results show that our method is in some sense invariant to the image generation mechanism variations and achieves results outperforming existing methods on a patent image dataset benchmark.
... Once again, this experiment shows the superiority of the orthogonal invariant moments OIM (3). Now, we present a comparative study of our image retrieval method, which is based on the proposed orthogonal invariant moments OIM (i), i = 1, 2, 3, with other well-known methods that are also based on shape descriptors, such as the AHDH [16] and HOG [3] features; the color descriptors computed in the RGB, HSV, YCrCb and HMMD color spaces [5] and the histogram color descriptors (Hist) [6]; and the texture descriptors, namely the first-order statistics descriptors (FOS) computed from the histogram of gray-scale images [7] and the spatial gray-level dependency descriptors (SGLD) [15]. ...
Article
Full-text available
Due to their invariance to translation, rotation and scaling, the seven invariant moments presented by Hu (Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, vol. 8, February 1962, pp. 179–187) are widely used in the field of pattern recognition. The set of these moments is finite; therefore, they do not comprise a complete set of image descriptors. To solve this problem, we introduce in this paper a new set of invariant moments of infinite order. Their non-orthogonality causes redundancy of information. For this reason, we propose a new set of orthogonal polynomials in two variables, and we present a set of orthogonal moments which are invariant to rotation, scale and translation. The presented approaches are tested on the invariability of the moments, image retrieval and object classification. In this framework, using the proposed orthogonal moments, we present two classification systems: the first based on the Fuzzy C-Means clustering algorithm (FCM) and the second based on the Radial Basis Function Neural Network (RBF). The performance of our invariant moments is compared with Legendre invariant moments, Tchebichef-Krawtchouk (TKIM), Tchebichef-Hahn (THIM), Krawtchouk-Hahn (KHIM) and Hu invariant moments, the histogram of oriented gradients descriptor (HOG), the adaptive hierarchical density histogram features (AHDH), and the color and texture descriptors Hist, HSV, FOS and SGLD. The experimental tests are performed on seven image databases: the Columbia Object Image Library (COIL-20), MPEG7-CE shape, MNIST handwritten digit, MNIST fashion image, ImageNet, COIL-100 and ORL databases. The obtained results show the efficiency and superiority of our orthogonal invariant moments.
... The methods of generating the DI [18][19][20] mainly include the difference, ratio, log-ratio and mean-ratio methods, which are all based on similarity measures. The black-and-white binary image [21][22] is then generated by analyzing the DI with thresholding, clustering, graph-cut, level-set and other methods. When the speckle noise [23] in the DI is serious and the edge and local information are not clear, the obtained black-and-white binary image has poor stability and robustness, which seriously affects the final result of the change detection. Therefore, effectively suppressing speckle noise is very important in the detection of high-resolution SAR images. ...
Article
Full-text available
As resolution increases, structural information becomes more and more abundant in Synthetic Aperture Radar (SAR) images. The speckle noise generated by the coherent imaging mechanism accordingly has a great influence on the detection accuracy and detection difficulty in high-resolution SAR change detection. In this paper, a multivariate change detection framework based on the nonsubsampled contourlet transform (NSCT), deep belief networks (DBN), fuzzy c-means (FCM) clustering, and a global-local spatial pyramid pooling (SPP) net is proposed. NSCT decomposes the image into multiple scales, and a DBN is used for extracting features of the decomposed coefficient matrix. FCM converges the similarity matrix of the initial DBN features into two classes as pseudo-labels for the global-local SPP net training data. The global-local SPP net consists of a large-scale region of interest (ROI) SPP net and a small-scale change detection SPP net. The combination of the ROI and the SPP net, as well as the fusion between different scales, weakens the interference of unchanged information and effectively eliminates a large amount of redundant information. The experimental results show that our proposed method can effectively remove speckle noise and improve the robustness of high-resolution SAR change detection.
... Density, compactness, rectangularity, and eccentricity are computed for each partition. The binary image retrieval methods developed by Sidiropoulos et al. [8] and Yang et al. [9] differ from Liu et al.'s work because they consider a single direction (the most descriptive one). Also, in each iteration they split the image region recursively into four subregions for the next partition instead of only one. ...
Preprint
Full-text available
In this paper, we present the Hierarchy-of-Visual-Words (HoVW), a novel trademark image retrieval (TIR) method that decomposes images into simpler geometric shapes and defines a descriptor for binary trademark image representation by encoding the hierarchical arrangement of component shapes. The proposed hierarchical organization of visual data stores each component shape as a visual word. It is capable of representing the geometry of individual elements and the topology of the trademark image, making the descriptor robust against linear transformations as well as some level of nonlinear transformation. Experiments show that HoVW outperforms previous TIR methods on the MPEG-7 CE-1 and MPEG-7 CE-2 image databases.
... Density, compactness, rectangularity, and eccentricity are computed for each partition. The binary image retrieval methods developed by Sidiropoulos et al. [8] and Yang et al. [9] differ from Liu et al.'s work because they consider a single direction (the most descriptive one). Also, in each iteration they split the image region recursively into four subregions for the next partition instead of only one. ...
Conference Paper
Full-text available
In this paper, we present the Hierarchy-of-Visual-Words (HoVW), a novel trademark image retrieval (TIR) method that decomposes images into simpler geometric shapes and defines a descriptor for binary trademark image representation by encoding the hierarchical arrangement of component shapes. The proposed hierarchical organization of visual data stores each component shape as a visual word. It is capable of representing the geometry of individual elements and the topology of the trademark image, making the descriptor robust against linear transformations as well as some level of nonlinear transformation. Experiments show that HoVW outperforms previous TIR methods on the MPEG-7 CE-1 and MPEG-7 CE-2 image databases.
... This method relies on an adaptive feature extraction technique [38] based on recursive subdivisions of the word images so that the resulting subimages at each iteration have balanced (approximately equal) numbers of foreground pixels, for two levels. This adaptive hierarchical decomposition technique determines the pyramidal grid which is recursively updated through the calculation of image geometric centroids [33]. ...
Article
Full-text available
Word spotting is an important recognition task in large-scale retrieval from document collections. In most cases, methods are developed and evaluated assuming perfect word segmentation. In this paper, we propose an experimental framework to quantify the effect that the goodness of word segmentation has on the performance achieved by word spotting methods under identical, unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We have tested our framework on several established and state-of-the-art methods using the George Washington and Barcelona Marriage datasets. The experiments allow for an estimate of the end-to-end performance of word spotting methods.
... 4) Method based on Quad Tree: This method relies on an adaptive feature extraction technique [18] based on recursive subdivisions of the word images so that the resulting sub-images at each iteration have balanced (approximately equal) numbers of foreground pixels, for two levels. This adaptive hierarchical decomposition technique determines the pyramidal grid, which is recursively updated through the calculation of image geometric centroids. 5) Method based on Local Binary Pattern: In this method the adaptive hierarchical decomposition technique employs the pooling of Local Binary Patterns in the adaptive regions, which are determined by a pyramidal grid that is recursively updated through the calculation of image geometric centroids, to calculate the feature vector for matching. ...
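The two quoted baselines, centroid-driven recursive subdivision and LBP pooling over the resulting regions, can be combined in a short sketch like the one below. The handling of empty quadrants and the two-level depth are simplifying assumptions, so the descriptor length can vary between images.

```python
import numpy as np

def centroid_boxes(mass, box, depth):
    """Recursively split box=(y0, y1, x0, x1) at the geometric centroid of
    the foreground pixels, returning the leaf boxes after `depth` levels.
    Regions without foreground pixels stop recursing early."""
    y0, y1, x0, x1 = box
    ys, xs = np.nonzero(mass[y0:y1, x0:x1])
    if depth == 0 or len(xs) == 0:
        return [box]
    cy, cx = y0 + int(ys.mean()) + 1, x0 + int(xs.mean()) + 1
    out = []
    for b in ((y0, cy, x0, cx), (y0, cy, cx, x1),
              (cy, y1, x0, cx), (cy, y1, cx, x1)):
        out += centroid_boxes(mass, b, depth - 1)
    return out

def pooled_lbp_histogram(codes, mass, depth=2, n_codes=256):
    """Concatenated per-region histograms of a precomputed code map
    (e.g. LBP codes) over the adaptive regions -- the pooling step."""
    h, w = mass.shape
    feats = []
    for y0, y1, x0, x1 in centroid_boxes(mass, (0, h, 0, w), depth):
        region = codes[y0:y1, x0:x1].ravel()
        feats.append(np.bincount(region, minlength=n_codes))
    return np.concatenate(feats)

# usage with toy data; codes is assumed aligned with the binary mask
rng = np.random.default_rng(0)
mass = (rng.random((64, 64)) < 0.2).astype(np.uint8)
codes = rng.integers(0, 256, (64, 64))
print(pooled_lbp_histogram(codes, mass).shape)
```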
Article
Full-text available
Word spotting is an important recognition task in historical document analysis. In most cases, methods are developed and evaluated assuming perfect word segmentation. In this paper, we propose an experimental framework to quantify the effect that the goodness of word segmentation has on the performance achieved by word spotting methods under identical, unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We apply the framework to the George Washington and Barcelona Marriage datasets and to several established and state-of-the-art methods. The experiments allow for an estimate of the end-to-end performance of word spotting methods.
... This technique uses the estimation of point density histograms of image regions, which are determined by a pyramidal grid that is recursively updated through the calculation of image geometric centroids. The extracted descriptor includes both global and local properties and can be used on different types of binary image databases for the retrieval of images [6]. Improving the ranking quality of medical image retrieval using a genetic feature selection method: in this paper, the authors proposed a method for improving the ranking quality of medical image retrieval using a genetic feature selection method. ...
Article
Full-text available
The problem of retrieving and managing medical images has become more important due to its scalability and the rich information contained in them. In day-to-day activities, more and more medical images are converted into digital form. Due to its nature, the need for efficient and simple access to this data is essential. This paper proposes a novel approach, namely "content-based medical image retrieval using a multilevel hybrid approach", to manage and retrieve this data. This work has been implemented in four levels. At each level, the retrieval performance of the work is improved. The results of this work have been compared with some of the existing works reviewed in the literature survey.
... For instance, Héroux et al. proposed in [24] a document descriptor that encodes in a hierarchical fashion pixel densities within a grid partition. Sidiropoulos et al. [38] proposed a similar descriptor that encodes the average of gray intensity over an adaptive grid. In [19], Gordo et al. proposed a document description based on multi-scale runlength histograms. ...
Article
Full-text available
In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
... An image retrieval system provides the most similar images from a given database when a query image is submitted. Many researchers deal with shape retrieval from binary image databases containing, for example, botanical collections [4,7,14,20,21], medical images [2,3], road signs [8], trademarks [12], or patent images [18]. This paper presents a novel method for constructing affine invariant descriptors based on the 2D Fourier transform. ...
Article
Full-text available
This paper presents a method for affine invariant recognition of two-dimensional binary objects based on the 2D Fourier power spectrum. Such a function is translation invariant, and its second-order moments enable the construction of an affine invariant spectrum up to rotation. Harmonic analysis of samples on circular paths generates Fourier coefficients whose absolute values are affine invariant descriptors. Affine invariance is also approximately preserved for large digital binary images, as demonstrated in the experimental part. The proposed method is tested first on an artificial data set and subsequently on a large set of 2D binary digital images of tree leaves. The high dimensionality of the feature vectors is reduced via the kernel PCA technique with a Gaussian kernel, and the k-NN classifier is used for image classification. The results are summarized as k-NN classifier sensitivity after dimensionality reduction. The resulting descriptors after dimensionality reduction are able to distinguish real contours of tree leaves with acceptable classification error. The general methodology is directly applicable to any set of large binary images. All calculations were performed in the MATLAB environment.
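A rough numpy rendering of the pipeline this abstract describes (power spectrum, circular sampling, harmonic analysis, absolute coefficients) is given below; the radii, sample counts and nearest-neighbour interpolation are assumptions for illustration, and the paper's moment-based affine normalisation step is omitted.

```python
import numpy as np

def spectrum_descriptors(img, radii=(4, 8, 16), n_samples=64, n_coeffs=8):
    """Translation-invariant descriptors from the 2D Fourier power spectrum.
    The spectrum is sampled on circular paths around its centre and each
    circle is Fourier-analysed; keeping absolute coefficients also discards
    the starting angle of the sampling path."""
    p = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2    # power spectrum
    cy, cx = p.shape[0] // 2, p.shape[1] // 2
    theta = np.linspace(0, 2 * np.pi, n_samples, endpoint=False)
    feats = []
    for r in radii:
        ys = np.clip(np.round(cy + r * np.sin(theta)).astype(int),
                     0, p.shape[0] - 1)
        xs = np.clip(np.round(cx + r * np.cos(theta)).astype(int),
                     0, p.shape[1] - 1)
        circle = p[ys, xs]                    # nearest-neighbour samples
        feats.extend(np.abs(np.fft.fft(circle))[:n_coeffs])
    return np.array(feats)

# usage on a toy binary blob
img = np.zeros((64, 64)); img[20:44, 24:40] = 1
print(spectrum_descriptors(img).shape)        # (len(radii) * n_coeffs,)
```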
... Given the fact that general-case image representation features consider colour and texture, which are absent in most patent images, it is evident that we need to apply an algorithm that takes into account the geometry and the pixel distribution of these images. To this end, we employ the Adaptive Hierarchical Density Histograms (AHDH) as visual feature vectors, due to the fact that they have shown discriminative power between binary complex drawings (Sidiropoulos et al., 2011). ...
Conference Paper
Full-text available
Patent images are very important for patent examiners to understand the contents of an invention. Therefore there is a need for automatic labelling of patent images in order to support patent search tasks. Towards this goal, recent research works propose classification-based approaches for patent image annotation. However, one of the main drawbacks of these methods is that they rely upon large annotated patent image datasets, which require substantial manual effort to be obtained. In this context, the proposed work performs extraction of concepts from patent images building upon a supervised machine learning framework, which is trained with limited annotated data and automatically generated synthetic data. The classification is realised with Random Forests (RF) and a combination of visual and textual features. First, we make use of RF’s implicit ability to detect outliers to rid our data of unnecessary noise. Then, we generate new synthetic data cases by means of Synthetic Minority Over-sampling Technique (SMOTE). We evaluate the different retrieval parts of the framework by using a dataset from the footwear domain. The results of the experiments indicate the benefits of using the proposed methodology.
Chapter
Identity documents, such as personal identity cards, driving licenses, passports, and residence permits, are indispensable and necessary in human life. Various industries, like hotel accommodation, banking services, traffic services and car rental, need identity documents to extract and verify personal information. Automatic identity document classification can help people quickly acquire valid information from the documents, saving labor cost and improving efficiency. However, due to the inconsistency and diversity of identity documents between different countries, automatic classification of these documents is still a challenging problem. We propose an identity document classification method in this paper. GAT (Graph Attention Network) and its edge-based variant are exploited to generate the graph embedding, which fuses visual features and textual features together. Moreover, the key information is learnt to guide a better representation of the document feature. Extensive experiments on the identity document dataset have been conducted to prove the effectiveness of the proposed method.
Conference Paper
An essential issue in the recognition of handwritten mathematical formulas is the identification of the structural relationships between each pair of adjacent symbols that compose the entire mathematical formula. The classification of the structural relationship is a key problem, as this classification often determines the semantic interpretation of an expression. In this work, we propose a system for the identification of spatial relationships based on geometric features and a new descriptor named the spatial histogram. After combining the extracted features, we classify the relationship into six different classes using four different classifiers in order to determine the most efficient one. In our proposed system, a support vector machine (SVM) classifier, Random Forest, AdaBoost and KNN are employed. Experimental results show that our features give promising results.
Chapter
Automatic recognition of handwritten mathematical expressions in Arabic is a difficult problem, even if all the symbols that compose the expression are recognized correctly. The classification of spatial relations between pairs of adjacent symbols is a key problem, as this classification often determines the semantic interpretation of an expression. In this work, we propose a system for the identification of spatial relationships based on geometric features and a new descriptor named the spatial histogram. After feature extraction, two types of fusion are compared for the final classification: feature-level fusion and decision-level fusion. In our proposed system, a support vector machine (SVM) classifier is employed. Experimental results show that our features give promising results. Moreover, using decision-level fusion improves the classification accuracy.
Chapter
Alzheimer's disease is a prominent cause of death, ranked sixth in the list. Timely diagnosis of such abnormalities may help to temporarily slow its worsening. Computer-aided techniques are applied to brain MR images for the diagnosis and retrieval of Alzheimer's disease images. An immense amount of work has been carried out on generic image retrieval systems using content-based information. These image retrieval schemes have their own merits and demerits in their retrieval performance. It is therefore necessary to develop an efficient content-based image retrieval (CBIR) system for the medical field, which is still a challenging task. A hybrid feature extraction technique has thus been proposed for CBIR, wherein contrast features, texture-based features and morphologically operated features of brain MRI images are extracted, and these features are hybridized by applying a fusion technique for better Alzheimer's disease detection. The proposed feature extraction approach is evaluated using support vector machine (SVM) and decision tree (DT) classification schemes for pattern learning and classification. Based on the results of the feature extraction techniques, SVM and DT achieve overall accuracies of 91.25 and 86.66%, respectively, with better precision, recall, sensitivity and specificity.
Chapter
This chapter discusses both unsupervised and semi-supervised methods to facilitate video understanding tasks. It considers two general research problems, video retrieval and video annotation/summarization, two approaches to fighting the consequences of big video data. The chapter presents the state-of-the-art research in each field. Content-based video hashing is a promising direction for content-based video retrieval, but it is more challenging than image hashing. Graph-based learning is an efficient approach for modeling data in various machine learning schemes, that is, unsupervised, supervised, and semi-supervised learning. The chapter describes two different memory units in the context memory model (CMM), short-term and long-term context memory units, and proposes a framework based on them. The chapter also considers applying the video understanding methods to more real-world applications, for example, surveillance, video advertising, and short video recommendations.
Article
Full-text available
Local features and descriptors that perform well for photographic images are often unable to capture the content of binary technical drawings due to their different characteristics. Motivated by this, a new local feature representation, the contextual local primitives, is proposed in this paper. It is based on the detection of junction and end points, the classification of the local primitives into local primitive words, and the establishment of geodesic connections between the local primitives. We exploit the granulometric information of the binary patent images to set all the necessary parameters of the involved mathematical morphology operators and the window size for local primitive extraction, which makes the whole framework parameter free. The contextual local primitives, with their spatial areas as a histogram weighting factor, are evaluated by performing binary patent image retrieval experiments. The proposed contextual local primitives are found to perform better than the local primitives only, the SIFT description of the contextual Hessian points, the SIFT description of local primitives, and state-of-the-art local content capturing methods. Moreover, an analysis of the approach from the perspective of a general patent image retrieval system reveals it to be efficient in multiple aspects.
Chapter
In this chapter, we will analyse the current technologies available that deal with graphical information in patent retrieval applications and, in particular, with the problem of recognising and understanding the information carried by flowcharts. We will review some of the state-of-the-art techniques that have arisen from the graphics recognition community and their application in the intellectual property domain. We will present an overview of the different steps that compose a flowchart recognition system, looking also at the achievements and remaining challenges in this domain.
Thesis
Full-text available
ABSTRACT Intishor, Ahsanul 2015. Face Recognition Using the Centroid Method and Geometric Mean. Thesis. Department of Informatics. Faculty of Science and Technology. State Islamic University of Maulana Malik Ibrahim Malang. Advisors: (I) Dr. Cahyo Crysdian (II) Irwan Budi Santoso, M.Kom. Keywords: Face Recognition, Geometry, Reconstruction, Centroid, Geometric Mean. Facial recognition technology is increasingly used as one aspect of human biometrics, in addition to fingerprints, DNA, voice and retina. Face recognition can be used in a security system that is more difficult to fool, because the identification process involves a unique method, i.e. the identification of facial geometry. This study seeks to provide groundbreaking face recognition using the centroid method and geometric mean for the frontal face category. The advantage of the centroid method is the ease of calculating the value of the middle point generated in an object, while the geometric mean calculates the average value of the overall object geometry. The results of the study showed that face recognition using the centroid method and geometric mean achieves 96% accuracy with a computation time of around 19.6 s.
Article
In this paper, we present a novel mathematical tool, the Structure Integral Transform (SIT), for invariant shape description and recognition. Different from the Radon Transform, which integrates the shape image function over a 1D line in the image plane, the proposed SIT builds upon two orthogonal integrals over a 2D K-cross dissecting structure spanning all rotation angles, by which the shape regions are bisected in each integral. The Structure Integral Transform brings the following advantages over the Radon Transform: (1) it has the extra capability of describing the interior structural relationships within the shape, which provides more powerful discriminative ability for shape recognition; (2) the shape regions are dissected by the K-cross in a coarse-to-fine hierarchical order that can characterize the shape in a better spatial organization, scanning from the center to the periphery; and (3) it is easier to build a completely invariant shape descriptor. The experimental results of applying SIT to shape recognition demonstrate its superior performance over the well-known Radon Transform, shape contexts, and Polar Harmonic Transforms.
Conference Paper
In this paper, we propose a method which uses locality-constrained linear coding (LLC) and spatial pyramid matching (SPM) for patent image classification. Patent images usually have no texture and color information, which makes them hard to recognize. Many methods based on the contour, shape or edges of an image have been proposed; however, as far as we are aware, our method is the first attempt to use coding features for patent image classification. First, we extract dense Scale-Invariant Feature Transform (SIFT) features and use k-means clustering to train a codebook for LLC. Second, we divide the image into increasingly fine sub-regions and generate the feature for each sub-region as SPM does. Finally, we use a linear SVM classifier for patent image classification. The experiment on a public database of patent images demonstrates that our method achieves a state-of-the-art accuracy rate of 94.2%. This shows that models based on SPM and LLC have a bright future in patent image recognition.
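Assuming dense SIFT extraction and LLC coding have already produced a code per descriptor, the SPM pooling step the abstract describes can be sketched as follows; the grid sizes, max pooling and L2 normalisation follow common LLC+SPM practice, not necessarily the paper's exact settings.

```python
import numpy as np

def spm_max_pool(codes, xy, img_size, levels=(1, 2, 4)):
    """Spatial-pyramid max pooling of per-descriptor LLC codes.

    codes: (n, k) LLC codes for n local descriptors
    xy:    (n, 2) descriptor locations (x, y) in pixels
    Pools codes by max over each cell of a 1x1, 2x2 and 4x4 grid and
    concatenates the results, as in SPM."""
    w, h = img_size
    pooled = []
    for g in levels:
        cell = np.minimum(xy // np.array([w / g, h / g]), g - 1).astype(int)
        idx = cell[:, 1] * g + cell[:, 0]      # cell index per descriptor
        for c in range(g * g):
            members = codes[idx == c]
            pooled.append(members.max(axis=0) if len(members)
                          else np.zeros(codes.shape[1]))
    f = np.concatenate(pooled)
    return f / (np.linalg.norm(f) + 1e-12)     # L2 normalisation

# usage: 200 random descriptors with 1024-dim LLC codes on a 256x256 image
rng = np.random.default_rng(0)
f = spm_max_pool(rng.random((200, 1024)),
                 rng.integers(0, 256, (200, 2)), (256, 256))
print(f.shape)   # ((1 + 4 + 16) * 1024,)
```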
Article
Image segmentation is one of the fundamental steps in image analysis for object identification. The main goal of image segmentation is to recognize homogeneous regions within an image as distinct and belonging to different objects. Inspired by the idea of the packing problem, in this paper we propose a fast O(Nα(N))-time algorithm for image segmentation using the Non-symmetry and Anti-packing Model and Extended Shading representation, called the NAMES-based algorithm, where N is the number of homogeneous blocks and α(N) is the inverse of the Ackermann function, a very slowly growing function. We first put forward four extended lemmas and two extended theorems. Then, we present a new scanning method used to process each NAMES block. Finally, we propose a novel NAMES-based data structure used to merge two regions. Under the same experimental conditions and with the same time complexity, our proposed NAMES-based algorithm, which extends the popular hierarchical representation model to a new non-hierarchical representation model, achieves about 86.75% and 89.47% average execution time improvement over the Binary Partition Tree (BPT)-based algorithm and the Quadtree Shading (QS)-based algorithm, respectively; the QS-based algorithm itself achieves about 55.4% execution time improvement over the previous fastest region segmentation algorithm by Fiorio and Gustedt, whose O(N²)-time algorithm is run on the original N×N gray image. Further, NAMES can improve the memory saving by 28.85% (5.04%) and simultaneously reduce the number of homogeneous blocks by 49.05% (36.04%) compared with QS (BPT) while maintaining satisfactory image quality. Therefore, comparing our NAMES-based algorithm with the QS-based and BPT-based algorithms, the experimental results presented in this paper show that the former not only has a higher compression ratio and fewer homogeneous blocks while maintaining satisfactory image quality, but can also significantly improve the execution speed of image segmentation, and is therefore a much more effective algorithm for image segmentation.
Article
Kernel density estimators (KDE) used in many medical image applications consider only the intensity information of each pixel or its neighbors, without the ability to express the structure and shape of tissues and organs, and they suffer from the boundary bias problem. In this paper, we propose a new first-order kernel density estimation (FOKDE) method for the 1D intensity information and 2D spatial information of medical images, in two steps. First, the FOKDE of the intensity information is estimated and applied to medical image segmentation with a multi-thresholding algorithm. Second, we estimate the FOKDE of the spatial information on the initial segmentation, which can express the structure and shape of organs and tissues. In order to evaluate the FOKDE and KDE of the 2D spatial information, we apply them to medical image segmentation with a hill-climbing strategy. Density estimation experiments and segmentation results on a simulated dataset and real abdomen CT images show that the FOKDE has smaller boundary bias than the KDE, and that it can effectively estimate the structure and shape of tissues and organs with spatial information.
Article
Intellectual property and the patent system in particular have been extremely present in research and discussion, even in the public media, in the last few years. Without going into any controversial issues regarding the patent system, we approach a very real and growing problem: searching for innovation. The target collection for this task does not consist of patent documents only, but it is in these documents that the main difference is found compared to web or news information retrieval. In addition, the issue of patent search implies a particular user model and search process model. This review is concerned with how research and technology in the field of Information Retrieval assists or even changes the processes of patent search. It is a survey of work done on patent data in relation to Information Retrieval in the last 20-25 years. It explains the sources of difficulty and the existing document processing and retrieval methods of the domain, and provides a motivation for further research in the area.
Article
In this paper, we study an information theoretic approach to image similarity measurement for content-based image retrieval. In this novel scheme, similarities are measured by the amount of information the images contain about one another, their mutual information (MI). The given approach is based on the premise that two similar images should have high mutual information, or equivalently, the query image should convey high information about those similar to it. The method first generates a set of statistically representative visual patterns and uses the distributions of these patterns as image content descriptors. To measure the similarity of two images, we develop a method to compute the mutual information between their content descriptors. Two images with larger descriptor mutual information are regarded as more similar. We present experimental results which demonstrate that mutual information is a more effective image similarity measure than those previously used in the literature, such as the Kullback-Leibler divergence and L2 norms.
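As an illustration of the MI computation (a generic plug-in estimator from a joint histogram, not necessarily the descriptor-level estimator the paper develops), consider two aligned maps of per-pixel pattern labels:

```python
import numpy as np

def mutual_information(labels_a, labels_b, n_patterns):
    """MI between two aligned pattern-label maps, estimated from their
    joint histogram: I(A;B) = sum p(a,b) * log(p(a,b) / (p(a) * p(b)))."""
    joint = np.zeros((n_patterns, n_patterns))
    np.add.at(joint, (labels_a.ravel(), labels_b.ravel()), 1)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)          # marginal of A
    pb = p.sum(axis=0, keepdims=True)          # marginal of B
    nz = p > 0                                 # avoid log(0)
    return float(np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz])))

# usage: identical maps give maximal MI, independent maps give ~0
rng = np.random.default_rng(0)
a = rng.integers(0, 16, (64, 64))
b = rng.integers(0, 16, (64, 64))
print(mutual_information(a, a, 16), mutual_information(a, b, 16))
```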
Article
Relatively little research has been done on the topic of patent image retrieval, and in most approaches retrieval is performed in terms of a similarity measure between the query image and the images in the corpus. However, systems aimed at overcoming the semantic gap between the visual description of patent images and their conveyed concepts would be very helpful for patent professionals. In this paper we present a flowchart recognition method aimed at achieving a structured representation of flowchart images that can be further queried semantically. The proposed method was submitted to the CLEF-IP 2012 flowchart recognition task. We report the results obtained on this dataset.
Conference Paper
For a traditional content-based image retrieval system, the number of irrelevant images for a given query image is significantly larger than the number of relevant images in an image repository. The numbers of negative and positive samples are therefore highly unbalanced, which makes traditional binary classifiers ineffective. In this paper, our proposed modified AdaBoost-based one-class support vector machine (OCSVM) ensemble is utilized to deal with this problem. In our proposed method, the weight update formula of the training data for AdaBoost is modified to make AdaBoost fit for combining the results of OCSVMs, even though the OCSVM is regarded as a strong classifier. Compared with three other related methods, our proposed approach exhibits better performance on three benchmark image databases.
Conference Paper
The structure of document images plays a significant role in document analysis, and thus considerable efforts have been made towards extracting and understanding document structure, usually in the form of layout analysis approaches. In this paper, we first employ Distance Transform based MSER (DTMSER) to efficiently extract stable document structural elements as a dendrogram of key-regions. Then a fast structural matching method is proposed to query the structure of a document (dendrogram) based on a spatial database, which facilitates the formulation of advanced spatial queries. The experiments demonstrate a significant improvement in a document retrieval scenario compared to the use of typical Bag of Words (BoW) and pyramidal BoW descriptors.
Conference Paper
Environmental data are considered of utmost importance for human life, since weather conditions, air quality and pollen are strongly related to health issues and affect everyday activities. This paper addresses the problem of discovering air quality and pollen forecast Web resources, which are usually presented in the form of heatmaps (i.e. graphical representations of matrix data with colors). Towards the solution of this problem, we propose a discovery methodology which builds upon a general purpose search engine and a novel post-processing heatmap recognition layer. The first step involves the generation of domain-specific queries, which are submitted to the search engine, while the second involves an image classification step based on low-level visual features to identify Web sites that include heatmaps. Experimental results comparing various visual feature combinations show that relevant environmental sites can be efficiently recognized and retrieved.
Conference Paper
Focussed crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic based on evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach that estimates the relevance of a hyperlink to an unvisited Web resource based on the combination of textual evidence representing its local context, namely the textual content appearing in its vicinity in the parent page, with visual evidence associated with its global context, namely the presence of images relevant to the topic within the parent page. The proposed focussed crawling approach is applied towards the discovery of environmental Web resources that provide air quality measurements and forecasts, since such measurements (and particularly the forecasts) are not only provided in textual form, but are also commonly encoded as multimedia, mainly in the form of heatmaps. Our evaluation experiments indicate the effectiveness of incorporating visual evidence in the link selection process applied by the focussed crawler over the use of textual features alone, particularly in conjunction with hyperlink exploration strategies that allow for the discovery of highly relevant pages that lie behind apparently irrelevant ones.
Conference Paper
Full-text available
This paper explores the use of tree-based data structures in shape analysis. We consider a structure which combines several properties of traditional tree models and obtain an efficiently compressed yet faithful representation of shapes. Constructed in a top-down fashion, the resulting trees are unbalanced but resolution-adaptive. While the interior of a shape is represented by just a few nodes, the structure automatically accounts for more detail at wiggly parts of a shape's boundary. Since its construction involves only simple operations, the structure provides easy access to salient features such as concave cusps or maxima of curvature. Moreover, tree serialization leads to a representation of shapes by sequences of salient points. Experiments with a standard shape database reveal that correspondingly trained HMMs allow for robust classification. Finally, using spectral clustering, tree-based models also enable the extraction of larger, semantically meaningful, salient parts of shapes.
Conference Paper
Full-text available
We address the problems of (1) assessing the confidence of the standard point estimates precision, recall and F-score, and (2) comparing results, in terms of precision, recall and F-score, obtained using two different methods. To do so, we use a probabilistic setting which allows us to obtain posterior distributions on these performance indicators, rather than point estimates. This framework is applied to the case where different methods are run on different datasets from the same source, as well as the standard situation where competing results are obtained on the same data.
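As a minimal sketch of such a probabilistic setting, one common choice is to place uniform Beta priors on precision and recall and sample the induced F-score posterior; the exact model in the paper may differ, and the counts below are illustrative.

```python
# Hedged sketch: Beta posteriors for precision/recall and a Monte Carlo
# posterior for the F-score, under uniform Beta(1, 1) priors.
import numpy as np

def performance_posteriors(tp, fp, fn, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # precision | data ~ Beta(tp+1, fp+1);  recall | data ~ Beta(tp+1, fn+1)
    prec = rng.beta(tp + 1, fp + 1, n_samples)
    rec = rng.beta(tp + 1, fn + 1, n_samples)
    f1 = 2 * prec * rec / (prec + rec)      # F-score sample-by-sample
    return prec, rec, f1

prec, rec, f1 = performance_posteriors(tp=80, fp=20, fn=10)
lo, hi = np.percentile(f1, [2.5, 97.5])
print(f"F-score 95% credible interval: [{lo:.3f}, {hi:.3f}]")
```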
Conference Paper
Full-text available
The real-time requirement is an additional constraint on many intelligent applications in robotics, such as shape recognition and retrieval on a mobile robot platform. In this paper, we present a scalable approach for efficiently retrieving closed contour shapes. The contour of an object is represented by piecewise linear segments. A skip Tri-Gram is obtained by selecting three segments in clockwise order while allowing a constant number of segments to be "skipped" in between. The main idea is to use skip Tri-Grams of the segments to implicitly encode the distant dependencies of the shape. All skip Tri-Grams are used to efficiently retrieve closed contour shapes without pairwise matching of feature points from two shapes. The retrieval is at least an order of magnitude faster than other state-of-the-art algorithms. We score 80% in the Bullseye retrieval test on the whole MPEG-7 shape dataset. We further test the algorithm on a mobile robot platform in an indoor environment; eight objects are tested from different viewing directions, and we achieve 82% accuracy.
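A minimal sketch of skip Tri-Gram enumeration over a segment-approximated closed contour; the per-segment features and the maximum skip value are illustrative assumptions.

```python
# Hedged sketch: enumerating skip Tri-Grams from a closed contour that has
# been approximated by piecewise linear segments.
from itertools import product

def skip_trigrams(segments, max_skip=2):
    """segments: list of per-segment features in clockwise order.
    Yields feature triples where 0..max_skip segments may be skipped
    between consecutive picks, wrapping around the closed contour."""
    n = len(segments)
    for i in range(n):
        # a gap of g means g-1 segments are skipped in between
        for g1, g2 in product(range(1, max_skip + 2), repeat=2):
            j = (i + g1) % n
            k = (j + g2) % n
            if len({i, j, k}) == 3:        # three distinct segments
                yield segments[i], segments[j], segments[k]

# e.g. with segment orientations in degrees as the features:
tris = list(skip_trigrams([10, 80, 150, 220, 290], max_skip=1))
```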
Conference Paper
Full-text available
Object classification often operates by making decisions based on the values of several shape properties measured from the image. The paper describes and tests several algorithms for calculating ellipticity, rectangularity, and triangularity shape descriptors
Conference Paper
Full-text available
This paper reports on tree-based shape encoding and classification. We present an approach that combines characteristics of R-trees, known from database indexing, and regression trees, known from pattern recognition. The resulting shape representations are highly storage-efficient. As they immediately transform into scale-invariant signatures, we apply the Earth Mover's Distance for computing shape similarities. Experimental results underline the efficacy of this approach. The required computations are simple and fast, yet allow for robust shape classification and clustering.
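As a small illustration of the similarity measure named above, scale-invariant signatures can be compared with the (one-dimensional) Earth Mover's Distance; the signatures below are purely illustrative.

```python
# Hedged sketch: comparing two normalised shape signatures with the 1-D
# Earth Mover's Distance (Wasserstein distance over signature bins).
import numpy as np
from scipy.stats import wasserstein_distance

sig_a = np.array([0.1, 0.3, 0.4, 0.2])   # illustrative signatures
sig_b = np.array([0.2, 0.2, 0.5, 0.1])
bins = np.arange(len(sig_a))
d = wasserstein_distance(bins, bins, u_weights=sig_a, v_weights=sig_b)
print(f"EMD between signatures: {d:.3f}")
```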
Conference Paper
Full-text available
In this paper, our video retrieval system is presented. The system acts as a decision support system, helping users find what they want through the many analysis and visualization tools it provides. It consists of three basic retrieval models, which search shots in text, image, and concept space, respectively. The results from the different modalities are fused to achieve better performance. The relevant shots are shown to users in different threads and expanded in different ways, helping users make correct decisions during the retrieval procedure.
Conference Paper
Full-text available
In this paper, the MKLab interactive video retrieval system is described.
Conference Paper
Full-text available
A patent always contains images along with text. Many text-based systems have been developed to search the patent database. In this paper, we describe PATSEEK, an image-based search system for the US patent database. The objective is to let users check the similarity of a query image to the images that exist in US patents. The user can specify a set of keywords that must exist in the text of the patents whose images will be searched for similarity. PATSEEK automatically grabs images from the US patent database at the user's request and represents them through an edge orientation autocorrelogram. L1 and L2 distance measures are used to compute the distance between images. A recall rate of 100% for 61% of the query images, and an average recall rate of 32% for the rest, has been observed.
Conference Paper
Full-text available
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and without segmentation. We learn a concise, structurally indexed shape lexicon from training data by clustering and partitioning feature types through graph cuts. We demonstrate our approach on two challenging document image content recognition problems: (1) the classification of 4,500 Web images crawled from Google Image Search into three content categories (pure image, image with text, and document image), and (2) language identification of 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) on a database of 1,512 complex document images composed of mixed machine-printed text and handwriting. Our approach can handle high intra-class variability and shows results that exceed other state-of-the-art approaches, allowing it to be used as a content recognizer in image indexing and retrieval systems.
Conference Paper
Full-text available
This paper presents a method for detecting categories of objects in real-world images. Given training images of an object category, our goal is to recognize and localize instances of those objects in a candidate image. The main contribution of this work is a novel structure of the shape codebook for object detection. A shape codebook entry consists of two components: a shape codeword and a group of associated vectors that specify the object centroids. Like their counterparts in language, the shape codewords are simple and generic, so that they can be easily extracted from most object categories. The associated vectors store the geometrical relationships between the shape codewords, which specify the characteristics of a particular object category; they can thus be considered the "grammar" of the shape codebook. In this paper, we use Triple-Adjacent-Segments (TAS) extracted from image edges as the shape codewords. Object detection is performed in a probabilistic voting framework. Experimental results on public datasets show performance similar to the state of the art, yet our method has significantly lower complexity and requires considerably less supervision in training (we only need bounding boxes for a few training samples, and need neither figure/ground segmentation nor a validation dataset).
Conference Paper
Full-text available
In this paper, we present a new feature extraction method that simultaneously captures the global and local characteristics of an image by adaptively computing hierarchical geometric centroids of the image. We show that these hierarchical centroids have some very interesting properties, such as invariance to illumination and insensitivity to scaling. We have applied the method to near-duplicate image recognition and to content-based image retrieval, and we present experimental results showing that it works effectively in both applications.
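A minimal sketch of the recursive idea described above: splitting the image at its geometric centroid and recording normalised centroid coordinates per level. The exact descriptor layout in the paper may differ.

```python
# Hedged sketch: hierarchical geometric centroids of a binary image.
import numpy as np

def hierarchical_centroids(img, levels=3):
    """img: 2-D array with 1 at black pixels.  Returns a fixed-length
    vector of centroid coordinates, each normalised by its sub-region."""
    feats = []

    def recurse(y0, y1, x0, x1, level):
        ys, xs = np.nonzero(img[y0:y1, x0:x1])
        if len(ys):                      # geometric centroid of the points
            cy, cx = int(ys.mean()) + y0, int(xs.mean()) + x0
        else:                            # empty region: use its centre
            cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
        feats.append((cy - y0) / max(y1 - y0, 1))
        feats.append((cx - x0) / max(x1 - x0, 1))
        if level == 0:
            return
        # split into four sub-regions at the centroid and recurse
        for sy0, sy1, sx0, sx1 in ((y0, cy, x0, cx), (y0, cy, cx, x1),
                                   (cy, y1, x0, cx), (cy, y1, cx, x1)):
            recurse(sy0, sy1, sx0, sx1, level - 1)

    recurse(0, img.shape[0], 0, img.shape[1], levels)
    return np.array(feats)
```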
Conference Paper
Full-text available
Apart from the computer vision community, an ever-increasing number of scientific domains show great interest in image analysis techniques. This interest is often guided by practical needs; examples include medical imaging systems, satellite image processing, and botanical databases. A common point of these applications is the large image collections they generate, which therefore require automatic tools to help scientists. These tools should allow clear structuring of the visual information and provide a fast and accurate retrieval process. In the framework of a plant gene expression study, we designed a content-based image retrieval (CBIR) system to assist botanists in their work. We propose a new contour-based shape descriptor, called the Directional Fragment Histogram (DFH), that satisfies the constraints of this application (accuracy and real-time search). This new descriptor has been evaluated and compared to several shape descriptors.
Article
Full-text available
The paper presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.
Article
Full-text available
Nowadays, demand for advanced multimedia search engines is growing rapidly, as huge amounts of digital visual content become available. The contribution of this paper is a hybrid multimedia retrieval model, accompanied by a search engine capable of retrieving visual content from cultural heritage multimedia libraries in three modes: (i) based on semantic annotation with the help of an ontology; (ii) based on visual features, with a view to finding similar content; and (iii) based on the combination of these two strategies, in order to produce recommendations. To achieve this, the retrieval model is composed of two parts: low-level visual feature analysis and retrieval, and a high-level ontology infrastructure. The main novelty is the way in which these two cooperate transparently during the evaluation of a single query in a hybrid fashion, making recommendations to the user and retrieving content that is both visually and semantically similar. A search engine implementing this model has been developed, capable of searching through digital libraries of cultural heritage collections; indicative examples are discussed, along with insights into its performance.
Article
Full-text available
Content-Based Image Retrieval (CBIR) has been a topic of research interest for nearly a decade. Approaches to date use image features for describing content. A survey of the literature shows that progress has been limited to prototype systems that make gross assumptions and approximations. Additionally, research attention has been largely focused on stock image collections. Advances in medical imaging have led to growth in large image collections. At the Lister Hill National Center for Biomedical Communication, an R&D division of the National Library of Medicine, we are conducting research on CBIR for biomedical images. We maintain an archive of over 17,000 digitized x-rays of the cervical and lumbar spine from the second National Health and Nutrition Examination Survey (NHANES II). In addition, we are developing an archive of a large number of digitized 35 mm color slides of the uterine cervix. Our research focuses on developing techniques for hybrid text/image query-retrieval from the survey text and image data. In this paper we present the challenges in developing CBIR of biomedical images and results from our research efforts.
Conference Paper
Full-text available
The study of 2D shapes and their similarities is a central problem in the field of vision. It arises in particular from the task of classifying and recognizing objects from their observed silhouette. Defining natural distances between 2D shapes creates a metric space of shapes, whose mathematical structure is inherently relevant to the classification task. One intriguing metric space comes from using conformal mappings of 2D shapes into each other, via the theory of Teichmuller spaces. In this space, every simple closed curve in the plane (a "shape") is represented by a "fingerprint", which is a diffeomorphism of the unit circle to itself (a differentiable and invertible, periodic function). More precisely, every shape defines a unique equivalence class of such diffeomorphisms up to right multiplication by a Möbius map. The fingerprint does not change if the shape is varied by translations and scaling, and any such equivalence class comes from some shape. This coset space, equipped with the infinitesimal Weil-Petersson (WP) Riemannian norm, is a metric space. In this space, it appears very likely that the shortest path between any two shapes is unique and is given by a geodesic connecting them; their distance from each other is given by integrating the WP-norm along that geodesic. In this paper we concentrate on solving the "welding" problem of "sewing" together conformally the interior and exterior of the unit circle, glued on the unit circle by a given diffeomorphism, to obtain the unique 2D shape associated with this diffeomorphism. This allows us to go back and forth between 2D shapes and their representing diffeomorphisms in this "space of shapes".
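The central correspondence in this abstract can be written compactly; the symbols below paraphrase the abstract's statement and are not the paper's own notation.

```latex
% Shapes, modulo translation and scaling, correspond bijectively to
% circle diffeomorphisms modulo right multiplication by Mobius maps:
\[
  \{\text{2D shapes}\} \;\longleftrightarrow\; \mathrm{Diff}(S^{1}) \,/\, \mathrm{M\"ob}(S^{1}).
\]
% The Weil--Petersson norm makes this coset space a metric space, with
% the distance between shapes A and B obtained by integrating the norm
% along the geodesic connecting their fingerprints:
\[
  d_{\mathrm{WP}}(A, B) \;=\; \int_{\gamma_{A \to B}} \lVert \dot{\gamma}(t) \rVert_{\mathrm{WP}} \, \mathrm{d}t .
\]
```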
Article
Full-text available
We present the status of ongoing work toward the development of a biomedical information system at the Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM). For any class of biomedical images, the problems confronting the researcher in image indexing are (a) developing robust algorithms for localizing and identifying anatomy relevant for that image class and relevant to the indexing goals, (b) developing algorithms for labeling the segmented anatomy based on its pathology, (c) developing a suitable indexing and similarity matching method for visual data, and (d) associating the text information on the imaged person, indexed separately, for query and retrieval along with the visual information. We are in the process of building a content-based image retrieval system which supports hybrid image and text queries and includes a biomedical image database. The paper describes this prototype CBIR2 system and the algorithms used in it. Image shape-based retrieval is done by image example and by user sketch of the vertebrae on spine x-ray images from the National Health and Nutrition Examination Survey (NHANES) data.
Article
Full-text available
Object classification often operates by making decisions based on the values of several shape properties measured from the image. This paper describes and tests several algorithms for calculating ellipticity, rectangularity, and triangularity shape descriptors.
Article
Full-text available
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by 1) solving for correspondences between points on the two shapes, 2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits, and the COIL data set.
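A minimal sketch of the log-polar histogram at the heart of the shape context descriptor; the bin counts (5 radial, 12 angular) are common choices and are assumptions here, and the alignment and matching stages are omitted.

```python
# Hedged sketch: the shape context of one reference point, i.e. a
# log-polar histogram of the positions of the remaining points.
import numpy as np

def shape_context(points, ref_idx, r_bins=5, theta_bins=12):
    """points: (n, 2) array of boundary points; returns a (5, 12) histogram."""
    pts = np.delete(points, ref_idx, axis=0) - points[ref_idx]
    r = np.hypot(pts[:, 0], pts[:, 1])
    theta = np.arctan2(pts[:, 1], pts[:, 0]) % (2 * np.pi)
    r = r / r.mean()                               # scale normalisation
    # logarithmically spaced radial bins, evenly spaced angular bins
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), r_bins + 1)
    r_idx = np.clip(np.digitize(r, r_edges) - 1, 0, r_bins - 1)
    t_idx = (theta / (2 * np.pi) * theta_bins).astype(int) % theta_bins
    hist = np.zeros((r_bins, theta_bins))
    np.add.at(hist, (r_idx, t_idx), 1)
    return hist / hist.sum()
```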
Article
We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.
Article
We introduce in this chapter some fundamental theories for content-based image retrieval. Section 1.1 looks at the development of content-based image retrieval techniques. Then, as the emphasis of this chapter, we introduce in detail in Section 1.2 some widely used methods for visual content descriptions. After that, we briefly address similarity/distance measures between visual features, the indexing schemes, query formation, relevance feedback, and system performance evaluation in Sections 1.3, 1.4 and 1.5. Details of these techniques are discussed in subsequent chapters. Finally, we draw a conclusion in Section 1.6.
Book
MPEG-7 is the first international standard which contains a number of key techniques from Computer Vision and Image Processing. The Curvature Scale Space technique was selected as a contour shape descriptor for MPEG-7 after substantial and comprehensive testing, which demonstrated the superior performance of the CSS-based descriptor. Curvature Scale Space Representation: Theory, Applications, and MPEG-7 Standardization is based on key publications on the CSS technique, as well as its multiple applications and generalizations. The goal was to ensure that the reader will have access to the most fundamental results concerning the CSS method in one volume. These results have been categorized into a number of chapters to reflect their focus as well as content. The book also includes a chapter on the development of the CSS technique within MPEG standardization, including details of the MPEG-7 testing and evaluation processes which led to the selection of the CSS shape descriptor for the standard. The book can be used as a supplementary textbook by any university or institution offering courses in computer and information science.
Article
Improving safety is a key goal for autonomous road vehicles. Driver support systems that help drivers react to changing road conditions can potentially improve safety. As with any vehicle, an autonomous vehicle driving on public roads must obey the rules of the road. Many of these rules are conveyed through road signs, so an autonomous vehicle must be able to detect and recognize road signs and change its behavior accordingly. This implies that the system must be able to detect a real-world road sign and match its image to images already present in its underlying database. To be effective, it is critical that the system performs this matching accurately. The ability to match a picture of a real-world image with images already in the database, based on visual characteristics, is called Content-Based Image Retrieval (CBIR). This paper proposes a method for improving the accuracy of CBIR systems by augmenting their underlying databases.
Article
A method is described which permits the encoding of arbitrary geometric configurations so as to facilitate their analysis and manipulation by means of a digital computer. It is shown that one can determine through the use of relatively simple numerical techniques whether a given arbitrary plane curve is open or closed, whether it is singly or multiply connected, and what area it encloses. Further, one can cause a given figure to be expanded, contracted, elongated, or rotated by an arbitrary amount. It is shown that there are a number of ways of encoding arbitrary geometric curves to facilitate such manipulations, each having its own particular advantages and disadvantages. One method, the so-called rectangular-array type of encoding, is discussed in detail. In this method the slope function is quantized into a set of eight standard slopes. This particular representation is one of the simplest and one that is most readily utilized with present-day computing and display equipment.
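A minimal sketch of the rectangular-array style of encoding described above, quantising each step of a pixel path into one of eight standard slopes; the direction indexing is a conventional choice, not necessarily the paper's.

```python
# Hedged sketch: 8-direction chain coding of a pixel path.
# Eight standard directions, indexed 0-7 by step vector (dx, dy).
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(path):
    """path: list of (x, y) pixel coordinates of a curve, where
    consecutive points are 8-connected neighbours."""
    return [DIRS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(path, path[1:])]

# a small closed unit square:
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))   # -> [0, 2, 4, 6]
```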
Article
This paper introduces a new feature vector for shape-based image indexing and retrieval. The feature classifies image edges based on two factors: their orientations and the correlation between neighboring edges. It hence captures information about the continuous edges and lines of an image and describes its major shape properties. The scheme is effective and robustly tolerates variations in translation, scaling, color, illumination, and viewing position. Experimental results show the superiority of the proposed scheme over several other indexing methods: its average precision and recall rates are 1.99 and 1.59 times those of the traditional color histogram, respectively, and 1.26 and 1.04 times those of the edge direction histogram.
Article
Curvature scale space (CSS) image is a multi-scale organisation of the inflection points of a closed planar curve as it is smoothed. It consists of several arch-shaped contours, each related to a concavity or convexity of the curve. The maxima of these contours have already been used as shape descriptors to find similar shapes in large image databases. In this article, we address the problem of shallow concavities, which may give rise to large contours in the CSS image; these contours may then match those corresponding to deep and wide concavities during the matching process. The phenomenon can be explained by recalling that Gaussian smoothing approximates the geometric heat equation deformation. We introduce a method to enrich the CSS image and create different contours for different types of concavities. We tested the proposed method on a database of 1100 images of marine creatures, and a significant improvement was observed in the performance of the system on shapes with shallow segments.
Article
In this paper, we present a shape retrieval method using triangle-area representation for nonrigid shapes with closed contours. The representation utilizes the areas of the triangles formed by the boundary points to measure the convexity/concavity of each point at different scales (or triangle side lengths). This representation is effective in capturing both local and global characteristics of a shape, invariant to translation, rotation, and scaling, and robust against noise and moderate amounts of occlusion. In the matching stage, a dynamic space warping (DSW) algorithm is employed to search efficiently for the optimal (least cost) correspondence between the points of two shapes. Then, a distance is derived based on the optimal correspondence. The performance of our method is demonstrated using four standard tests on two well-known shape databases. The results show the superiority of our method over other recent methods in the literature.
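A minimal sketch of the triangle-area representation described above: for each boundary point and scale, the signed area of the triangle formed with its neighbours at that scale. The scale set and normalisation are illustrative choices, and the DSW matching stage is omitted.

```python
# Hedged sketch: triangle-area representation (TAR) of a closed contour.
import numpy as np

def triangle_area_representation(contour, scales=(1, 2, 4)):
    """contour: (n, 2) array of boundary points of a closed curve.
    Returns an (n, len(scales)) matrix of signed triangle areas, whose
    sign reflects convexity/concavity at each point and scale."""
    n = len(contour)
    tar = np.zeros((n, len(scales)))
    for j, t in enumerate(scales):
        a = np.roll(contour, t, axis=0)     # p_{i-t}
        b = contour                         # p_i
        c = np.roll(contour, -t, axis=0)    # p_{i+t}
        # signed area via the 2-D cross product
        tar[:, j] = 0.5 * ((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                           - (c[:, 0] - a[:, 0]) * (b[:, 1] - a[:, 1]))
    # normalising by the largest magnitude gives scale invariance
    return tar / np.abs(tar).max()
```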
Article
In this article, we discuss the potential benefits, the requirements and the challenges involved in patent image retrieval and subsequently, we propose a framework that encompasses advanced image analysis and indexing techniques to address the need for content-based patent image search and retrieval. The proposed framework involves the application of document image pre-processing, image feature and textual metadata extraction in order to support effectively content-based image retrieval in the patent domain. To evaluate the capabilities of our proposal, we implemented a patent image search engine. Results based on a series of interaction modes, comparison with existing systems and a quantitative evaluation of our engine provide evidence that image processing and indexing technologies are currently sufficiently mature to be integrated in real-world patent retrieval applications.
Article
This paper presents a novel shape representation algorithm based on mathematical morphology. It consists of two steps. Firstly, an input shape is decomposed into a union of meaningful convex subparts by a recursive scheme. Each subpart is obtained by repeatedly applying condition expansion to a seed, which is selected by utilizing the skeleton information. Secondly, the shape of each subpart is approximated by a morphological dilation of basic structuring elements. The location and direction of the subpart are represented respectively by two parameters. Thus the given shape is represented by a union set of a number of three-dimensional vectors. Experiments show that the new algorithm is immune to noise and occlusion, and invariant under rotation, translation and scaling. Compared to other algorithms, it achieves more natural looking shape components and more concise representation at lower computation costs and coding costs.
Conference Paper
Shape retrieval from image databases is a complex problem. This paper reports an investigation on the comparative effectiveness of a number of different shape features (including those included in the recent MPEG-7 standard) and matching techniques in the retrieval of multi-component trademark images. Experiments were conducted within the framework of the ARTISAN shape retrieval system, and retrieval effectiveness assessed on a database of over 10 000 images, using 24 queries and associated ground truth supplied by the UK Patent Office. Our results show clearly that multi-component matching can give better results than whole-image matching. However, only minor differences in retrieval effectiveness were found between different shape features or distance measures, suggesting that a wide variety of shape feature combinations and matching techniques can provide adequate discriminating power for effective retrieval.
Conference Paper
With Internet delivery of video content surging to an unprecedented level, video recommendation has become a very popular online service. The capability of recommending relevant videos to targeted users can alleviate users' effort in finding the most relevant content according to their current viewings or preferences. This paper presents a novel online video recommendation system based on multimodal fusion and relevance feedback. Given an online video document, which usually consists of video content and related information (such as query, title, tags, and surroundings), video recommendation is formulated as finding a list of the most relevant videos in terms of multimodal relevance. We express the multimodal relevance between two video documents as the combination of textual, visual, and aural relevance. Furthermore, since different video documents weight the relevance of the three modalities differently, we adopt relevance feedback to automatically adjust intra-weights within each modality and inter-weights among modalities using users' click-through data, as well as an attention fusion function to fuse the multimodal relevances together. Unlike traditional recommenders, which assume a sufficient collection of user profiles is available, the proposed system can recommend videos without user profiles. We conducted an extensive experiment on 20 videos searched by the top 10 representative queries from more than 13k online videos, and report the effectiveness of our video recommendation system.
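A minimal sketch of the fusion idea: combining per-modality relevance scores with weights nudged by click-through feedback. The additive update rule below is an illustrative stand-in for the paper's attention fusion function, not its actual formula.

```python
# Hedged sketch: weighted fusion of textual/visual/aural relevance with a
# simple click-through-driven weight update.
import numpy as np

def fuse(rel_text, rel_visual, rel_aural, weights):
    """Each rel_* is a score per candidate video; returns fused scores."""
    return weights @ np.vstack([rel_text, rel_visual, rel_aural])

def update_weights(weights, modality_rel_of_clicked, lr=0.1):
    """Shift weight toward modalities that scored the clicked video highly."""
    w = weights + lr * modality_rel_of_clicked
    return w / w.sum()

w = np.array([1 / 3, 1 / 3, 1 / 3])
scores = fuse(np.array([0.9, 0.2]), np.array([0.4, 0.8]),
              np.array([0.1, 0.3]), w)
w = update_weights(w, np.array([0.9, 0.4, 0.1]))   # user clicked candidate 0
```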
Conference Paper
This work presents a new multiscale, curvature-based shape representation technique for planar curves. One limitation of the well-known curvature scale space (CSS) method is that it uses only curvature zero-crossings to characterize shapes and thus there is no CSS descriptor for convex shapes. The proposed method, on the other hand, uses bidimensional-unidimensional-bidimensional transformations together with resampling techniques to retain the full curvature information for shape characterization. It also employs the correlation coefficient as a measure of similarity. In the evaluation tests, the proposed method achieved a high correct classification rate (CCR), even when the shapes were severely corrupted by noise. Results clearly showed that the proposed method is more robust to noise than CSS.
Conference Paper
A novel hybrid system for shape-based image retrieval using the curvature scale space (CSS) and self-organizing map (SOM) methods is presented. The shape features of images are represented by CSS images extracted from, for example, a large database, and are processed using the PCA technique. These processed CSS images constitute the training dataset for a SOM neural network, which, in turn, is used to perform efficient image retrieval. Experimental results using a benchmark database demonstrate the usefulness of the proposed methodology.
Article
We present a family of scale-invariant local shape features formed by chains of k connected, roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary, without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability, and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of the segments within a kAS, making kAS easy to reuse in other frameworks, for example as a replacement or addition to interest points. Software for detecting and describing kAS is released on lear.inrialpes.fr/software. We demonstrate the high performance of kAS within a simple but powerful sliding-window object detection scheme. Through extensive evaluations, involving eight diverse object classes and more than 1400 images, we 1) study the evolution of performance as the degree of feature complexity k varies and determine the best degree; 2) show that kAS substantially outperform interest points for detecting shape-based classes; 3) compare our object detector to the recent, state-of-the-art system by Dalal and Triggs [4].
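A minimal sketch of kAS enumeration from a segment-adjacency graph, with a crude translation- and scale-invariant chain descriptor; the descriptor layout is an assumption, not the paper's exact encoding.

```python
# Hedged sketch: chains of k connected contour segments (kAS) and a
# simple invariant descriptor per chain.
import numpy as np

def k_adjacent_segments(adjacency, k):
    """adjacency: dict {segment_id: set of neighbouring segment ids}.
    Yields tuples of k distinct segments connected in sequence."""
    def grow(chain):
        if len(chain) == k:
            yield tuple(chain)
            return
        for nxt in adjacency[chain[-1]]:
            if nxt not in chain:
                yield from grow(chain + [nxt])
    for start in adjacency:
        yield from grow([start])

def kas_descriptor(segments, chain):
    """segments: {id: (midpoint_xy, orientation, length)}.  Encodes each
    segment relative to the first midpoint and the chain's total length."""
    mids = np.array([segments[s][0] for s in chain], dtype=float)
    lengths = np.array([segments[s][2] for s in chain], dtype=float)
    scale = lengths.sum()
    rel = (mids - mids[0]) / scale                 # translation+scale inv.
    oris = [segments[s][1] for s in chain]
    return np.concatenate([rel.ravel(), oris, lengths / scale])
```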
Conference Paper
Mixture modelling is a hot area in pattern recognition. Although most research in this area has focused on mixtures for continuous data, there are many pattern recognition tasks for which binary or discrete mixtures are better suited. This paper focuses on the use of Bernoulli mixtures for binary data and, in particular, for binary images. Results are reported on a task of handwritten Indian digits.
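A minimal sketch of EM for a Bernoulli mixture over binary images flattened to 0/1 vectors, the model family the abstract refers to; the initialisation and smoothing constants are illustrative choices.

```python
# Hedged sketch: EM for a Bernoulli mixture on binary data.
import numpy as np

def bernoulli_mixture_em(X, n_components=3, n_iter=50, seed=0):
    """X: (n, d) binary matrix.  Returns mixing weights pi (k,) and
    per-component pixel probabilities mu (k, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_components, 1.0 / n_components)
    mu = rng.uniform(0.25, 0.75, size=(n_components, d))
    for _ in range(n_iter):
        # E-step: responsibilities via log-likelihoods (numerically safer)
        log_p = (X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights and (smoothed) Bernoulli parameters
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X + 1e-3) / (nk[:, None] + 2e-3)
    return pi, mu
```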
Article
The authors present an efficient two-stage approach for leaf image retrieval using simple shape features, including the centroid-contour distance (CCD) curve, eccentricity, and the angle code histogram (ACH). In the first stage, images dissimilar to the query are filtered out using eccentricity to reduce the search space; in the second stage, fine retrieval follows using all three feature sets within the reduced search space. Unlike eccentricity and the ACH, the CCD curve is neither scaling-invariant nor rotation-invariant; it therefore requires normalisation to achieve scaling invariance and starting-point location to achieve rotation invariance in the similarity measure of CCD curves. A thinning-based method is proposed to locate the starting points of leaf image contours, making the approach more computationally efficient. The method can also benefit other shape representations that are sensitive to starting points, by reducing the matching time in image recognition and retrieval. Experimental results on 1400 leaf images from 140 plants show that the proposed approach achieves better retrieval performance than both the curvature scale space (CSS) method and the modified Fourier descriptor (MFD) method. In addition, the two-stage approach achieves performance comparable to an exhaustive search, but with much reduced computational complexity.
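A minimal sketch of the CCD curve with scale normalisation, plus circular-shift matching as one simple route to rotation invariance; the resampling length is an illustrative choice, and the thinning-based starting-point location described above is omitted.

```python
# Hedged sketch: centroid-contour distance (CCD) curve and matching.
import numpy as np

def ccd_curve(contour, n_samples=128):
    """contour: (m, 2) boundary points; returns a scale-normalised CCD."""
    centroid = contour.mean(axis=0)
    d = np.linalg.norm(contour - centroid, axis=1)
    # resample to a fixed length, then normalise by the maximum distance
    idx = np.linspace(0, len(d) - 1, n_samples).astype(int)
    d = d[idx]
    return d / d.max()

def ccd_distance(c1, c2):
    """Min L2 distance over all circular shifts (rotation invariance)."""
    return min(np.linalg.norm(c1 - np.roll(c2, s)) for s in range(len(c2)))
```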
Article
Retrieval efficiency and accuracy are two important issues in designing a content-based database retrieval system. We propose a method for trademark image database retrieval based on object shape information that supplements traditional text-based retrieval systems. The system achieves both the desired efficiency and accuracy using a two-stage hierarchy: in the first stage, simple and easily computable shape features are used to quickly browse through the database and generate a moderate number of plausible retrievals when a query is presented; in the second stage, the candidates from the first stage are screened using a deformable template matching process to discard spurious matches. We have tested the algorithm using hand-drawn queries on a trademark database containing 1,100 images. Each retrieval takes a reasonable amount of computation time (approximately 4-5 seconds on a Sun Sparc 20 workstation). The topmost image retrieved by the system agrees with that obtained by human subjects, ...
Article
The user of an image database often wishes to retrieve all images similar to the one (s)he already has. Using features such as texture, color and shape, we can associate a feature vector with every image in the database. A fast indexing method can then be used to retrieve similar images based on their associated vectors. We use the maxima of the curvature zero-crossing contours of the Curvature Scale Space (CSS) image as a feature vector to represent the shapes of object boundary contours. The matching algorithm, which compares two sets of maxima and assigns a matching value as a measure of similarity, is presented in this paper. The method is robust with respect to noise, scale and orientation changes of objects. It is also capable of retrieving objects that are similar to the mirror image of the input boundary. We introduce the aspect ratio of the CSS image as a new parameter which can be used for indexing in conjunction with other parameters such as eccentricity and circularity. ...
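A minimal sketch of computing the CSS zero-crossing point set from which such descriptors are derived; extracting the arch maxima used as the feature vector would additionally require tracking arches across scales, which is omitted here, and the scale schedule is an illustrative assumption.

```python
# Hedged sketch: curvature zero-crossings of a closed contour across
# increasing Gaussian smoothing scales (the CSS point set).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_image(x, y, sigmas=np.arange(1.0, 16.0, 0.5)):
    """x, y: coordinates of a closed contour, equally spaced by arc length.
    Returns a list of (arc-length index, sigma) zero-crossing points;
    the descriptor keeps the topmost point of each arch in this set."""
    points = []
    for sigma in sigmas:
        xs = gaussian_filter1d(x, sigma, mode="wrap")
        ys = gaussian_filter1d(y, sigma, mode="wrap")
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        kappa = dx * ddy - dy * ddx      # numerator of curvature: same sign
        for i in np.where(np.diff(np.sign(kappa)) != 0)[0]:
            points.append((i, sigma))
    return points
```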
Article
Contents: Preface; 1. Image Content Analysis and Description (Xenophon Zabulis, Stelios C. Orphanoudakis); 2. Local Features for Image Retrieval (Luc Van Gool, Tinne Tuytelaars, Andreas Turina); 3. Fast Invariant Feature Extraction for Image Retrieval (Sven Siggelkow, Hans Burkhardt); 4. Shape Description and Search for Similar Objects in Image Databases (Longin Jan Latecki, Rolf Lakaemper); 5. Features in Content-based Image Retrieval Systems: a Survey (Remco C. Veltkamp, Mirela Tanase, Danielle Sent); 6. Probabilistic Image Models for Object Recognition and Pose Estimation (Joachim Hornegger, Heinrich Niemann); 7. Distribution-based Image Similarity (Jan Puzicha); 8. Distribution Free Statistics for Segmentation (Greet Frederix, Eric J. Pauwels); 9. Information Retrieval Methods for Multimedia Objects (Norbert Fuhr); 10. New descriptors for image and ...
N. Alajlan, I.E. Rube, M.S. Kamel, G. Freeman, Shape retrieval using triangle-area representation and dynamic space warping, Pattern Recognition 40 (2007) 1911–1920.
F. Mahmoudi, J. Shanbehzadeh, A.-M. Eftekhari-Moghadam, H. Soltanian-Zadeh, Image retrieval based on shape similarity by edge orientation autocorrelogram, Pattern Recognition 36 (2003) 1725–1736.
A. Juan, E. Vidal, Bernoulli mixture models for binary images, in: Proceedings of the International Conference on Pattern Recognition, 2004.
V. Ferrari, L. Fevrier, F. Jurie, C. Schmid, Groups of adjacent contour segments for object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2008) 36–51.
F. Mokhtarian, S. Abbasi, J. Kittler, Robust and efficient shape indexing through curvature scale space, in: Proceedings of the British Machine Vision Conference, University of Edinburgh, UK, 1996, pp. 53–62.