Mingjing Li’s research while affiliated with University of Science and Technology of China and other places


Publications (148)


Statistical approach to large-scale image annotation
  • Patent
  • Full-text available

November 2013 · 11 Reads

Mingjing Li

Statistical approaches to large-scale image annotation are described. Generally, the annotation technique includes compiling visual features and textual information from a number of images, hashing the images' visual features, and clustering the images based on their hash values. An example system builds statistical language models from the clustered images and annotates an image by applying one of the statistical language models.
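The hash-and-cluster step lends itself to a short sketch. The code below uses random-projection (sign-bit) hashing purely as an illustrative stand-in; the patent text above does not name a specific hash function, and every identifier here is made up:

```python
import numpy as np
from collections import defaultdict

def hash_features(feature, planes):
    """Hash one visual-feature vector to a bit string via random projections."""
    bits = (feature @ planes.T) > 0  # sign of each projection
    return "".join("1" if b else "0" for b in bits)

def cluster_by_hash(features, n_bits=8, seed=0):
    """Group image indices whose feature vectors share the same hash value."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, features.shape[1]))
    clusters = defaultdict(list)
    for i, f in enumerate(features):
        clusters[hash_features(f, planes)].append(i)
    return clusters
```

Identical feature vectors always land in the same bucket, and near-identical ones usually do; a per-cluster language model would then be built from the textual information of each bucket.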


Dual cross-media relevance model for image annotation

October 2013 · 19 Reads · 1 Citation

Mingjing Li · Jing Liu · Bin Wang · [...]

A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to the traditional relevance models which calculate the joint probability of words and images over a training image database, the DCMRM model estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM model may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM model also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by using image search techniques on the web data as well as available training data.
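The expectation described above can be written as P(w, I) = Σ_v P(v) · P(I|v) · P(w|v), summed over words v in the predefined lexicon. A minimal sketch, with toy probability functions standing in for the word-to-image and word-to-word relations (all names are illustrative, not the patent's API):

```python
def dcmrm_score(image, word, lexicon, p_image_given_word, p_word_given_word, p_word):
    """Joint probability P(w, I) as an expectation over lexicon words v:
       P(w, I) = sum_v P(v) * P(I | v) * P(w | v)."""
    return sum(
        p_word[v] * p_image_given_word(image, v) * p_word_given_word(word, v)
        for v in lexicon
    )
```

In the patent's setting, P(I|v) would come from image search for v, and P(w|v) from word-to-word relations estimated on Web data.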


Head pose assessment methods and systems

June 2013 · 23 Reads

Improvements are provided to effectively assess a user's face and head pose so that a computer or similar device can track the user's attention toward a display device. The region of the display or graphical user interface that the user is turned toward can then be selected automatically, without requiring further input from the user. A frontal face detector detects the user's frontal face, and key facial points such as the left/right eye centers, left/right mouth corners, and nose tip are detected by component detectors. The system then tracks the user's head with an image tracker and determines the yaw, tilt, and roll angles and other pose information of the user's head through a coarse-to-fine process, according to the key facial points and/or confidence values output by the pose estimator.


Estimating word correlations from images

June 2013 · 40 Reads

Word correlations are estimated using a content-based method that relies on visual features of image representations of the words. The image representations of the subject words may be generated by retrieving images from data sources (such as the Internet) using image search with the subject words as query words. One aspect of the techniques is based on calculating the visual distance or visual similarity between the sets of retrieved images corresponding to each query word. Another is based on calculating the visual consistency among the set of retrieved images corresponding to a conjunctive query word. Combining the content-based method with a text-based method may produce even better results.
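The first variant above, set-to-set visual distance, is simple enough to sketch. The mean-pairwise-distance choice and the distance-to-correlation mapping below are illustrative assumptions, not the patent's exact definitions:

```python
import numpy as np

def visual_distance(set_a, set_b):
    """Mean pairwise Euclidean distance between two sets of image features."""
    dists = [np.linalg.norm(a - b) for a in set_a for b in set_b]
    return float(np.mean(dists))

def word_correlation(set_a, set_b):
    """Map visual distance to a (0, 1] correlation score."""
    return 1.0 / (1.0 + visual_distance(set_a, set_b))
```

Two words whose retrieved image sets look alike get a correlation near 1; visually unrelated words drift toward 0.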


Latent Topic Visual Language Model for Object Categorization.

January 2011 · 12 Reads · 2 Citations

This paper presents a latent topic visual language model to handle the variation problem in object categorization. Variations, including different views, styles, and poses, greatly affect the spatial arrangement and distribution of visual features, on which previous categorization models largely depend. Taking the object variations as hidden topics within each category, the proposed model explores the relationship between object variations and visual feature arrangement within the traditional visual language modeling process. With this improvement, the accuracy of object categorization is further boosted. Experiments on the Caltech101 dataset show that the model is sound and effective.


On cross-language image annotations

August 2009 · 27 Reads · 1 Citation

Automatic annotation of digital pictures is a key technology for managing and retrieving images from large image collections. Typical algorithms deal only with monolingual image annotation. In this paper, we propose a framework for multilingual image annotation, which can annotate images in multiple languages. The framework not only benefits users with different native languages but also provides more accurate annotations. Image annotation is performed in two stages: parallel monolingual image annotation, followed by fusion of the annotation results across languages. In the first stage, candidate annotations for each language are extracted by leveraging a multilingual, large-scale Web image database. Because candidate annotations can be incomplete and inaccurate, we propose a multilingual annotation fusion (MAF) algorithm. By modeling the candidate annotations for each language as an n-partite graph, the MAF algorithm improves and re-ranks the multilingual annotations. Finally, the annotations with the highest ranking values in each language are selected and translated as the result. Experimental results for English-Chinese image annotation demonstrate the effectiveness of the proposed framework.


Effective top-k computation with term-proximity support

July 2009 · 43 Reads · 7 Citations

Information Processing & Management

Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them are prone to ignoring some especially important factors in ranking functions, such as term-proximity (the distance relationship between query terms in a document). In our recent work [Zhu, M., Shi, S., Li, M., & Wen, J. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of 16th CIKM conference (pp. 771–780)], we demonstrated that, when term-proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. To solve this problem, we built the inverted index based on web page structure and proposed the query processing strategies accordingly. The experimental results indicate that the proposed index structures and query processing strategies significantly improve the top-k efficiency. In this paper, we study the possibility of adopting additional techniques to further improve top-k computation efficiency. We propose a Proximity-Probe Heuristic to make our top-k algorithms more efficient. We also test the efficiency of our approaches on various settings (linear or non-linear ranking functions, exact or approximate top-k processing, etc.).
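Term-proximity, as defined above, is the distance relationship between query terms within a document. A toy scoring term illustrating the idea (this is not the paper's ranking function; the reciprocal-distance form and the parameter `k` are assumptions for illustration):

```python
def min_term_distance(positions_a, positions_b):
    """Smallest absolute gap between any occurrence of term a and term b."""
    return min(abs(i - j) for i in positions_a for j in positions_b)

def proximity_score(doc_tokens, query_terms, k=1.0):
    """Toy ranking term: reward documents whose query terms occur close together."""
    pos = {t: [i for i, tok in enumerate(doc_tokens) if tok == t] for t in query_terms}
    if any(not p for p in pos.values()):
        return 0.0  # a missing query term contributes no proximity evidence
    pairs = [(a, b) for i, a in enumerate(query_terms) for b in query_terms[i + 1:]]
    return sum(k / (1 + min_term_distance(pos[a], pos[b])) for a, b in pairs)
```

The efficiency problem the paper addresses is that evaluating such a term requires position lists, which defeats most classic index-pruning strategies.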


Image annotation via graph learning

February 2009 · 106 Reads · 208 Citations

Pattern Recognition

Image annotation has been an active research topic in recent years due to its potential impact on both image understanding and web image search. In this paper, we propose a graph learning framework for image annotation. First, image-based graph learning is performed to obtain candidate annotations for each image. To capture the complex distribution of image data, we propose a Nearest Spanning Chain (NSC) method to construct the image-based graph, whose edge weights are derived from chain-wise statistical information rather than traditional pairwise similarities. Second, word-based graph learning is developed to refine the relationships between images and words and obtain final annotations for each image. To enrich the representation of the word-based graph, we design two types of word correlations based on web search results, in addition to word co-occurrence in the training set. The effectiveness of the proposed solution is demonstrated by experiments on the Corel dataset and a web image dataset.
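The NSC graph construction is the paper's contribution and is not reproduced here; as a minimal sketch of the graph-learning step itself, standard label propagation on a similarity graph conveys the shape of the computation (`S`, `Y`, and `alpha` are illustrative names, not the paper's notation):

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.5, iters=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y on a row-normalized
    image-similarity graph S; rows of Y hold each image's initial word scores."""
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F
```

An unlabeled image connected to a labeled neighbor inherits part of that neighbor's word scores, which is how candidate annotations spread through the graph.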


Scale-Invariant Visual Language Modeling for Object Categorization

February 2009 · 32 Reads · 51 Citations

IEEE Transactions on Multimedia

In recent years, "bag-of-words" models, which treat an image as a collection of unordered visual words, have been widely applied in the multimedia and computer vision fields. However, their disregard of the spatial structure among visual words makes them indiscriminative for objects with similar word frequencies but different spatial distributions of words. In this paper, we propose a visual language modeling method (VLM), which incorporates the spatial context of local appearance features into the statistical language model. To represent the object categories, models with different orders of statistical dependency are exploited. In addition, a multilayer extension makes the VLM more resistant to scale variations of objects. The model is effective and applicable to large-scale image categorization. We train scale-invariant visual language models on images grouped by Flickr tags and use these models for object categorization. Experimental results show that they achieve better performance than single-layer visual language models and "bag-of-words" models, and comparable performance to 2-D MHMM and SVM-based methods at much lower computational cost.
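To make the "spatial context" point concrete, here is a toy bigram visual language model over a grid of quantized patches. This is a deliberately simplified stand-in (horizontal bigrams only, add-epsilon smoothing), not the paper's exact formulation:

```python
import math
from collections import Counter

def bigram_counts(word_grid):
    """Count horizontal visual-word bigrams in a grid of quantized patches."""
    pairs = Counter()
    for row in word_grid:
        pairs.update(zip(row, row[1:]))
    return pairs

def category_log_likelihood(word_grid, model, vocab_size, eps=1.0):
    """Score a grid under a category's bigram counts with add-eps smoothing."""
    total = sum(model.values())
    score = 0.0
    for row in word_grid:
        for a, b in zip(row, row[1:]):
            p = (model[(a, b)] + eps) / (total + eps * vocab_size ** 2)
            score += math.log(p)
    return score
```

Two images with identical word frequencies but different word orderings receive different scores, which is exactly what a pure bag-of-words model cannot distinguish.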


Multi-graph similarity reinforcement for image annotation refinement

November 2008 · 34 Reads · 13 Citations

Proceedings / ICIP ... International Conference on Image Processing

In image annotation refinement, word correlations among candidate annotations are used to retain highly relevant words and remove irrelevant ones. Existing methods build word correlations on the textual annotations of images. In this paper, the visual content of images is utilized to derive better word correlations through a multi-graph similarity reinforcement method. First, an image visual-similarity graph and a word-correlation graph are built. Second, the two graphs iteratively reinforce each other through an image-word transfer matrix. Once the two graphs converge to steady states, the new word-correlation graph is used to refine the candidate annotations. Experiments show that our method outperforms methods that do not consider the visual content of images.
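The iterative reinforcement through the transfer matrix can be sketched as a coupled update; the specific update rule and the `alpha` blending below are illustrative assumptions, not the paper's exact equations:

```python
import numpy as np

def reinforce(W_img, W_word, T, alpha=0.5, iters=20):
    """Iteratively reinforce the image-similarity graph and the word-correlation
    graph through the image-word transfer matrix T (rows: images, cols: words)."""
    for _ in range(iters):
        W_img_new = alpha * W_img + (1 - alpha) * T @ W_word @ T.T
        W_word_new = alpha * W_word + (1 - alpha) * T.T @ W_img @ T
        # normalize to keep values bounded across iterations
        W_img = W_img_new / W_img_new.max()
        W_word = W_word_new / W_word_new.max()
    return W_img, W_word
```

Each step projects word correlations into image space and vice versa, so structure present in one graph gradually informs the other.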


Citations (82)


... In [23], for updating the weight of each feature component, two parameters are calculated: the mean distance between relevant images and the query vector, D1, and the mean distance between all feedback images (relevant and irrelevant) and the query vector on that component, D2. If D1 is smaller than D2, the component is assumed to be effective. ...
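The weight-update rule described above (with D1 the mean distance to relevant images and D2 the mean distance to all feedback images on a component) can be sketched directly; the multiplicative update and the `rate` parameter are illustrative choices:

```python
def update_weight(weight, d_relevant, d_all, rate=0.1):
    """Raise the weight of a feature component when relevant images sit
    closer to the query (d_relevant < d_all) on it; lower it otherwise."""
    if d_relevant < d_all:
        return weight * (1 + rate)
    return weight * (1 - rate)
```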

Reference:

A short-term learning approach based on similarity refinement in content-based image retrieval
Alternating Feature Spaces in Relevance Feedback

... In (Ruofei Zhang et al., 2006), a probabilistic semantic model was proposed in which visual features and textual words are connected via a hidden layer of semantic concepts to be discovered, explicitly harnessing the synergy between the two modalities. The association of visual elements and textual terms is decided in a Bayesian framework, which attaches a confidence to each association. ...

A probabilistic semantic model for image annotation and multi-modal image retrieval

Multimedia Systems

... Previous works proposed for motif extraction can be grouped broadly into two categories: the local feature-based approach [6,13,14,15,16,17,18,19,23,25], and the global structure-based approach [1,2,3,7,8,9,20,21,22]. ...

Automatic Peak Number Detection in Image Symmetry Analysis

Lecture Notes in Computer Science

... In particular, it improves the best performance (59.05%) obtained so far by POP [23] (1 iteration) by more than 4%. Concerning a comparison with post-ranking models for generic image search and retrieval like [40,9], the proposed solution performs better since such methods, as shown in [23], achieve worse results than POP [23]. More interestingly, notice that SB [2] and CCRR [17] provide the performance achieved using the KISSME baseline model. ...

Pseudo Relevance Feedback Based on Iterative Probabilistic One-Class SVMs in Web Image Retrieval

Lecture Notes in Computer Science

... When taking a photograph, various angles can be captured to produce the desired visual impression. In addition, the exposure time captured by a digital camera can also yield unique images, depending on the photographer's creativity (Nisa et al., 2023; Tong et al., 2004). ...

Classification of Digital Photos Taken by Photographers or Home Users

Lecture Notes in Computer Science

... • To ensure that a high tf for one relevant term does not place a document ahead of documents containing multiple relevant terms with lower tf values, one can introduce the logarithmic tf-factor 1 + ln(1 + ln(tf)) (Singhal et al., 1999; Singhal and Kaszkiel, 2001), the BM25 tf-factor tf/(k + tf) for some k > 0 (Robertson and Zaragoza, 2009; Robertson et al., 1995), or the sigmoid tf-factor 1/(1 + e^(-tf)) (Yao et al., 2006), since these functions grow slowly with tf, unlike raw tf. ...
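The three tf-factors quoted above are concrete enough to state directly (the default BM25 `k` below is an arbitrary illustrative choice):

```python
import math

def tf_log(tf):
    """Logarithmic tf-factor: 1 + ln(1 + ln(tf)), defined for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))

def tf_bm25(tf, k=1.2):
    """BM25 tf-factor: tf / (k + tf), saturating toward 1 as tf grows."""
    return tf / (k + tf)

def tf_sigmoid(tf):
    """Sigmoid tf-factor: 1 / (1 + e^(-tf))."""
    return 1 / (1 + math.exp(-tf))
```

All three are monotone in tf but flatten quickly, so ten extra occurrences of one term cannot outweigh a document that matches several distinct query terms.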

Ranking Web News Via Homepage Visual Layout and Cross-Site Voting
  • Citing Conference Paper
  • March 2006

Lecture Notes in Computer Science

... This is because the concepts cannot be fully represented by visual features. The other shortcoming is the CBIR premise; an example image must be available for the user, while in ABIR a user can simply compose queries using natural language (Inoue 2004). ABIR can itself be divided into two parts, Automatic Image Annotation (AIA) and query processing (Hidajat 2015). ...

Learning in hidden annotation-based image retrieval

... Image classification is the technical term used to describe this process, and it can be likened to categorical data. Previous studies include the classification of images as ads or non-ads (Jain, Taneja, and Taneja 2024;Li et al. 2007;Villegas, Goanta, and Aletras 2023), the classification of the display of online ads as clear or unclear (Vo, Tran, and Le 2017), and the classification of the ads as genuine or fake (Zaheer et al. 2022). Image classification is also used to classify physical activities, Lululemon-style clothing, and pets to examine visual congruence in influencer marketing and brand engagement (Argyris et al. 2020), food images for social media engagement prediction (Philp, Jacobson, and Pancer 2022), and brand-related images and their match with Instagram influencers (Sweet, Rothwell, and Luo 2019). ...

On Detection of Advertising Images
  • Citing Conference Paper
  • August 2007

... This approach has been less explored, although some existing methods have shown promising results (e.g. [3,9,24,32]). This paper discusses an unsupervised way to infer the user interests over numerical and multi-valued categorical attributes, which observes the user interaction and does not require any explicit information from him/her [17]. ...

Using Implicit Relevance Feedback to Advance Web Image Search
  • Citing Conference Paper
  • August 2006