Sheng Gao

Institute for Infocomm Research

Publications: 51
Reads: 7,145
Citations: 855
Introduction
Sheng Gao currently works at the Institute for Infocomm Research. Sheng does research in Artificial Intelligence, Data Mining and Information Science. Their most recent publication is 'Octave-dependent Probabilistic Latent Semantic Analysis to Chorus Detection of Popular Song'.

Publications (51)
Conference Paper
Content representation of music signals is an essential part of music information retrieval applications, e.g. chorus detection, genre classification, etc. In the paper, we propose the octave-dependent probabilistic latent semantic analysis (OdPlsa) to discover the latent audio patterns (or clusters) through spectral-temporal analysis. Then the audi...
Chapter
Most previous work on opinion summarization focuses on summarizing the sentiment polarity distribution toward different aspects of an entity (e.g., battery life and screen of a mobile phone). However, users’ needs may go beyond this kind of opinion summarization. Besides such coarse-grained summarization on aspects, one may prefer to read detailed...
Article
Replay, that is, playing back a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. In this study, we evaluate the vulnerability of text-dependent speaker verification systems under the replay attack using a standard benchmarking database, and also propose an anti-spoofing technique to safeguard the spea...
Conference Paper
Full-text available
In this paper, we propose a new framework for opinion summarization based on sentence selection. Our goal is to assist users to get helpful opinion suggestions from reviews by only reading a short summary with few informative sentences, where the quality of summary is evaluated in terms of both aspect coverage and viewpoints preservation. More spec...
Conference Paper
Sentiment classification has become attractive in recent years because of its potential commercial applications. It exploits supervised learning methods to learn the classifiers from the annotated training documents. The challenge in sentiment classification is that the sentiment domains are diverse, heterogeneous and fast-growing. The classi...
Conference Paper
In this paper we present a learning algorithm to estimate a risk-sensitive and document-relation embedded ranking function so that the ranking score can reflect both the query-document relevance degree and the risk of estimating relevance when the document relation is considered. With proper assumptions, an analytic form of the ranking function is...
Conference Paper
Full-text available
To have a robust and informative image content representation for image categorization, we often need to extract as many visual features as possible at various locations, scales and orientations. Thus it is not surprising that an image has a few hundred or even thousands of visual descriptors. This incurs a huge cost in computation and memory. To eli...
Article
A generalized discriminative multiple instance learning (GDMIL) algorithm is presented to train the classifier in the condition of vague annotation of training samples. GDMIL not only inherits the original MIL's capability of automatically weighting the instances in the bag according to their relevance to the concept but also integrates generative m...
Conference Paper
The task of ad hoc photographic image retrieval in ImageCLEF 2007 international benchmark is to retrieve relevant images in the database to the user query formulated as keywords and image examples. This paper presents rich representation and indexing technologies exploited in our system that participated in ImageCLEF 2007. It uses diverse visual co...
Article
This paper introduces the IPAL participation at CLEF 2008 on the new TEL collection and on the ad-hoc photographic retrieval ImageClef. Following the changes in evaluation criterion this year in ImageClef, i.e. promoting diversity in the top ranked images, we have integrated the novelty measure in our similarity based system developed in ImageCL...
Article
Full-text available
In this paper, a kernel-based learning algorithm, kernel rank, is presented for improving the performance of semantic concept detection. By designing a classifier optimizing the receiver operating characteristic (ROC) curve using kernel rank, we provide a generic framework to optimize any differentiable ranking function using effective smoothing fu...
Conference Paper
Full-text available
Recently, the bag-of-words approach has been successfully applied to automatic image annotation, object recognition, etc. The method needs to first quantize an image using the visual terms and then extract the image-level statistics for classification. Although successful applications have been reported, it lacks the capability to model the spatial...
Conference Paper
Full-text available
In the paper we study the efficiency of semantic concept association in multimedia semantic concept detection. We present an approach to automatically learn from the corpus the association strength between pair-wise semantic concepts. We discuss two usages of association strength: 1) applying positive concepts with high association strength for sel...
Conference Paper
Full-text available
Given rich content-based features of multimedia (e.g., visual, text, or audio) followed by various detectors (e.g., SVM, Adaboost, HMM or GMM, etc), can we find an efficient approach to combine these evidences? In the paper, we address this issue by proposing an Integrated Statistical Model (ISM) to combine diverse evidences extracted from the doma...
Article
Full-text available
This paper presents IPAL ad-hoc photographic retrieval and medical image retrieval results in the ImageClef 2007 campaign. For the photo task, IPAL group is ranked at the 3rd place among 20 participants. The MAP of our best run is 0.2833, which is ranked at the 6th place among the 476 runs. The IPAL system is based on the mixed modality search, i.e...
Conference Paper
Full-text available
The bag-of-words approach has become increasingly attractive in the fields of object category recognition and scene classification, witnessed by some successful applications [5, 7, 11]. Its basic idea is to quantize an image using visual terms and exploit the image-level statistics for classification. However, the previous work still lacks the capa...
Conference Paper
Automatic music genre classification is one of the most challenging problems in music information retrieval and management of digital music databases. In this paper, we propose a new framework using text categorization methods to classify music genres. This framework is different from current methods for music genre classification. In our framework, we co...
Conference Paper
Full-text available
In this paper, we present an AUC (i.e., the area under the curve of receiver operating characteristics (ROC)) maximization based learning algorithm to design the classifier for maximizing the ranking performance. The proposed approach trains the classifier by directly maximizing an objective function approximating the empirical AUC metric. Then the...
Conference Paper
Full-text available
We propose a new framework for automatic image annotation through multi-topic text categorization. Given a test image, it is first converted into a text document using a visual codebook learnt from a collection of training images. Latent semantic analysis is then performed on the tokenized document to extract a feature vector based on a visual lexi...
Article
Full-text available
We propose a maximal figure-of-merit learning (MFoM) approach for robust classifier design, which directly optimizes performance metrics of interest for different target classifiers. The proposed approach, embedding the decision functions of classifiers and performance metrics into the overall training objective, learns the parameters of classifier...
Conference Paper
An automatic synchronization system of the popular song and its lyrics is presented in the paper. The system includes two main components: a) automatically detecting vocal/non-vocal in the audio signal and b) automatically aligning the acoustic signal of the song with its lyric using speech recognition techniques and positioning the boundaries of t...
Conference Paper
Full-text available
An ensemble learning framework is proposed to optimize the receiver operating characteristic (ROC) curve corresponding to a given classifier. The proposed ensemble maximal figure-of-merit (E-MFoM) learning framework meets four key requirements desirable for ROC optimization, namely: (1) each classifier in the ensemble can be learned with any specif...
Conference Paper
With the proliferation of camera phones, new information retrieval applications will emerge. The image of a scene captured by a camera phone can be a query to a remote server to identify the scene and return relevant information. But unconstrained scene identification is an open problem. In this paper, we propose a discriminative measure to ran...
Conference Paper
Full-text available
In the paper we present a generalized discriminative multiple instance learning algorithm (GD-MIL) for multimedia semantic concept detection. It combines the capability of the MIL for automatically weighting the instances in the bag according to their relevance to the positive and negative classes, the expressive power of generative models, and the...
Article
NUS and I2R jointly participated in the high-level feature extraction and automated search tasks for TRECVID 2006. In both tasks, we only make use of the standard TRECVID available annotation results. For the HLF task, we develop 2 methods to perform automated concept annotation: (a) fully machine learning approach using SVM, LDF and GMM; and (b) Bi-gram m...
Conference Paper
In this paper, two discriminative fusion schemes are proposed for automatic image annotation. One is the ensemble-pattern association based fusion and the other is the model-based transformation. The fusion approaches are studied and evaluated in a unified framework for AIA based on the text representation of the image content and the MC MFoM learnin...
Article
In this paper, we propose a new method for music identification based on the embedded hidden Markov model (EHMM). Differing from the conventional HMM, the EHMM estimates the emission probability of its external HMM from a second, state-specific HMM, which is referred to as the internal HMM. EHMM clusters the feature blocks with its external HMM and describes sp...
Conference Paper
Extracting the melody from polyphonic musical audio is a nontrivial research problem. This paper presents an approach for vocal melody extraction from dual channel Karaoke music audio. The extracted melody corresponds to the singing voice in the original performance channel, which can then be used for melody-based music retrieval. In the proposed t...
Conference Paper
In this paper, an HMM-embedded unsupervised learning approach is proposed to detect the music events by grouping the similar segments of the music signal. This approach can cluster the segments based on their similarity of the spectral as well as the temporal structures. This is not easily done for clustering with the traditional similarity measure...
Conference Paper
The key or the scale information of a piece of music provides important clues on its high level musical content, like harmonic and melodic context, which can be useful for music classification, retrieval or further content analysis. Researchers have previously addressed the issue of finding the key for symbolically encoded music (MIDI); however, ve...
Conference Paper
Full-text available
Musical signals are highly structured. Untrained listeners can capture some particular musical events from audio signals. Uncovering this structure and detecting musical events will benefit musical content analysis. This is known to be an unsolved problem. In this paper, an unsupervised learning approach is proposed to automatically infer some stru...
Conference Paper
This work presents a novel method for the automatic solmization of a melody, by which a melody (a sequence of MIDI notes) can be transcribed to sol-fa syllables (i.e., do, re, mi, fa, sol, la, ti). Automatic solmization can assist in music skill training, music notation and content-based music retrieval. The proposed method is based on an approach...
Conference Paper
Full-text available
In beat tracking, a listener's experience of the tempo from a previous excerpt of a music piece is usually a good prediction of the tempo of the following excerpt in the same piece of music. A human being has this ability to adjust adaptively his or her tap to synchronize with the tempo of music. An adaptive learning approach, based on maximum a po...
Conference Paper
Full-text available
This paper proposes using composite shot models to represent and recognize short video clips, such as advertisement clips. The temporal constraints of the shots constituting the clip are represented by a directed graph, which takes the shots as the nodes. Because each shot is represented by a hidden Markov model, the whole clip representation becom...
Conference Paper
Full-text available
Classification of musical segments is an interesting problem. It is a key technology in the development of content-based audio document indexing and retrieval. In this paper, we apply the feature extraction and modeling techniques commonly used in automatic speech recognition to solving the problem of segmentation and instrument identification of m...
Conference Paper
Full-text available
In this paper a musical event based indexing approach is proposed and its application to content-based music identification is studied. The events, which function as term words used in text retrieval or basic speech units in speech recognition, are inferred using an unsupervised learning algorithm. Its differences with the existing methods are in t...
Conference Paper
Full-text available
In this paper a novel sports news video shot classification method is proposed. First, two features based on motion and color are constructed and extracted from video shots: the play field color ratio for specific types of sports, and the background motion and consistency ratio. They are then combined to generate an 11-dimension shot feature to feed into...
Conference Paper
Full-text available
We propose a multiclass (MC) classification approach to text categorization (TC). To fully take advantage of both positive and negative training examples, a maximal figure-of-merit (MFoM) learning algorithm is introduced to train high performance MC classifiers. In contrast to conventional binary classification, the proposed MC scheme assigns a uni...
Conference Paper
Full-text available
A novel maximal figure-of-merit (MFoM) learning approach to text categorization is proposed. Different from the conventional techniques, the proposed MFoM method attempts to integrate any performance metric of interest (e.g. accuracy, recall, precision, or F1 measure) into the design of any classifier. The corresponding classifier parameters are le...
Article
Context-dependent acoustic models based on decision trees have been thoroughly investigated and applied in Western-language speech recognition. In Mandarin speech recognition, however, the diphone model was more popular and little attention was paid to triphones in the past. In this paper a triphone model based on decision trees was proposed and some key problems were...
Conference Paper
This paper presents a comparative study on automatic speech recognition for two different Chinese dialects, namely Mandarin and Cantonese. It focuses on decision-tree based context-dependent acoustic modeling for large-vocabulary continuous speech recognition. Extensive phonological and phonetic knowledge is incorporated to design questions concer...
Conference Paper
Full-text available
We propose a novel Bayesian learning framework of hierarchical mixture model by incorporating prior hierarchical knowledge into concept representations of multi-level concept structures in images. Characterizing image concepts by mixture models is one of the most effective techniques in automatic image annotation (AIA) for concept-based image retri...
Article
This paper describes a new framework based on one-pass and decision tree based class-triphone acoustic modeling for Mandarin LVCSR. Compared with a multi-pass decoder, it should be more knowledgeable and efficient, since all sources are used at the same time, provided the decoder is well organized and optimized. We give details about the organizati...
Article
We participated in the high-level feature extraction and search task for TRECVID 2005. For the high-level feature extraction task, we make use of the available collaborative annotation results for training, and develop 2 methods to perform automated concept annotation: (a) a ranked Maximal Figure-of-Merit (MFoM) method; and (b) a multimodal rankB...
Article
Full-text available
This paper describes the details of our systems for feature extraction and search tasks of TRECVID-2004. For feature extraction, we emphasize the use of visual auto-concept annotation technique, with the fusion of text and specialized detectors, to induce concepts in videos. For the search task, our emphasis is two-fold. First we employ query-speci...
Article
In the paper, we introduce our systems and methods to promote the diversity of the ad-hoc photographic retrieval in ImageCLEF 2009. The image database this year is quite different from previous years, not only increasing the corpus size from 20,000 images to half a million but also changing the domain from travel to news. Most of queries are re...
