Learning the user's semantics for content-based image retrieval (CBIR) involves two sources of information: the similarity relations entailed by the content-based features, and the relevance relations specified in the feedback. Accordingly, we propose an augmented relation embedding (ARE) that maps the image space into a semantic manifold that faithfully captures the user's preferences. Besides ARE, we also examine how to select a good feature set to improve retrieval performance. Combining these two efforts, we have established a system that yields far better results than those previously reported. Overall, our approach can be characterized by three key properties: 1) The framework uses one relational graph to describe the similarity relations, and two others to encode the relevant/irrelevant relations indicated in the feedback. 2) With the relational graphs so defined, learning a semantic manifold can be transformed into solving a constrained optimization problem, which reduces to the ARE algorithm and accounts for both the representation and the classification points of view. 3) An image representation based on augmented features is introduced to couple with the ARE learning. These features are significant in capturing the semantics of image regions at different scales. We conclude with experimental results and comparisons that demonstrate the effectiveness of our method.
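The three relational graphs named in property 1) can be illustrated with a small sketch. This is not the paper's construction: the k-nearest-neighbour rule for the similarity graph, the edge rules for the two feedback graphs, and the names `knn_similarity_graph` and `feedback_graphs` are all assumptions made here for illustration.

```python
import numpy as np

def knn_similarity_graph(X, k=2):
    """Symmetric 0/1 adjacency linking each row of X to its k nearest rows.
    (Assumed rule; the abstract does not say how similarity edges are chosen.)"""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # never pick a point as its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nearest[i]] = 1.0
    return np.maximum(W, W.T)              # symmetrize

def feedback_graphs(n, relevant, irrelevant):
    """Assumed edge rules: the relevant graph links pairs of relevant images,
    the irrelevant graph links each relevant image to each irrelevant one."""
    P = np.zeros((n, n))
    N = np.zeros((n, n))
    for i in relevant:
        for j in relevant:
            if i != j:
                P[i, j] = 1.0
        for j in irrelevant:
            N[i, j] = N[j, i] = 1.0
    return P, N

# Toy data: five images with 2-D features; images 0 and 1 marked relevant, 4 irrelevant.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [5.0, 5.0]])
W_s = knn_similarity_graph(X, k=1)
W_p, W_n = feedback_graphs(len(X), relevant=[0, 1], irrelevant=[4])
```

An embedding method would then look for a projection that keeps similarity-graph and relevant-graph neighbours close while pushing irrelevant-graph neighbours apart; how ARE weighs these terms is specified in the paper, not here.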
ABSTRACT: In this paper, we propose a novel semi-supervised feature selection framework
by mining correlations among multiple tasks and apply it to different
multimedia applications. Instead of independently computing the importance of
features for each task, our algorithm leverages shared knowledge from multiple
related tasks, thus improving the performance of feature selection. Note that
we build our algorithm on the assumption that different tasks share common
structures. The proposed algorithm selects features in a batch mode, by which
the correlations between different features are taken into consideration.
Besides, considering the fact that labeling a large amount of training data in
the real world is both time-consuming and tedious, we adopt manifold learning which
exploits both labeled and unlabeled training data for feature space analysis.
Since the objective function is non-smooth and difficult to solve, we propose
an iterative algorithm with fast convergence. Extensive experiments on
different applications demonstrate that our algorithm outperforms other
state-of-the-art feature selection algorithms.
"We further leverage Manifold Regularization  built upon the graph Laplacian to extend our framework to a semi-supervised scenario. Manifold Regularization is adopted because multimedia data have been normally shown to possess a manifold structure ,  and Manifold Regularization can explore it. Consequently, by applying Manifold Regularization to the loss function in (1), we obtain (3) where denotes the trace operator. "
ABSTRACT: In this paper, we propose a novel semi-supervised feature analyzing framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3-D motion data analysis. Our method is built upon two advancements of the state of the art: (1) l2,1-norm regularized feature selection, which can jointly select the most relevant features from all the data points. This feature selection approach has been shown in the literature to be robust and efficient, as it considers the correlation between different features jointly when conducting feature selection; (2) manifold learning, which analyzes the feature space by exploiting both labeled and unlabeled data. It is a widely used technique for extending many algorithms to semi-supervised scenarios because of its capability of leveraging the manifold structure of multimedia data. The proposed method is able to learn a classifier for different applications by selecting the discriminating features closely related to the semantic concepts. The objective function of our method is non-smooth and difficult to solve, so we design an efficient iterative algorithm with fast convergence, thus making it applicable to practical applications. Extensive experiments on image annotation, video concept detection and 3-D motion data analysis are performed on different real-world data sets to demonstrate the effectiveness of our algorithm.
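The l2,1-norm mentioned in the abstract above is the sum of the Euclidean norms of a matrix's rows. The sketch below computes it and ranks features by row norm, under the assumption that each row of the weight matrix corresponds to one feature; the function name `l21_norm` and the toy matrix are illustrative, and the paper's actual objective and solver are not reproduced here.

```python
import numpy as np

def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row of W."""
    return float(np.linalg.norm(W, axis=1).sum())

# Toy weight matrix: 3 features (rows) x 2 outputs (columns).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],   # an all-zero row: this feature is unused by every output
              [1.0, 0.0]])

total = l21_norm(W)
row_norms = np.linalg.norm(W, axis=1)
ranking = np.argsort(-row_norms)   # features ordered from most to least important
print(total, ranking)
```

Penalizing this norm tends to drive whole rows to zero, so a feature is kept or dropped for all outputs at once, which is why the abstract describes the selection as joint.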
"In pattern recognition, dimensionality reduction is an effective technique to solve the 'curse of dimensionality'  and improve classification performance and computational efficiency in many applications, such as face recognition      and image retrieval   . Recently, a general framework for dimensionality reduction has been developed in . "
ABSTRACT: In this article, we develop a linear supervised subspace learning method called locality-based discriminant neighborhood embedding (LDNE), which can take advantage of the underlying submanifold-based structures of the data for classification. Our LDNE method can simultaneously consider both 'locality' of locality preserving projection (LPP) and 'discrimination' of discriminant neighborhood embedding (DNE) in manifold learning. It can find an embedding that not only preserves local information to explore the intrinsic submanifold structure of data from the same class, but also enhances the discrimination among submanifolds from different classes. To investigate the performance of LDNE, we compare it with the state-of-the-art dimensionality reduction techniques such as LPP and DNE on publicly available datasets. Experimental results show that our LDNE can be an effective and robust method for classification.
The Computer Journal 08/2012; 56(9):1063-1082. DOI:10.1093/comjnl/bxs113 · 0.79 Impact Factor