About
202 publications, 40,003 reads
4,770 citations
Publications (202)
Sketch matching is the fundamental problem in sketch-based interfaces. After years of study, it remains challenging when hand-drawn sketch shapes exhibit large irregularities and variations. While most existing works exploit topology relations and graph representations for this problem, they are usually limited by the coarse topology expl...
Sentiment analysis of such opinionated online texts as reviews and comments has received increasingly close attention, yet most of the work is intended to deal with the detection of authors' emotion. In contrast, this article presents our study of the social emotion detection problem, the objective of which is to identify the evoked emotions of rea...
Social emotion detection of online users has become an important task for mining public opinions. Social emotion detection aims at predicting the readers’ emotions evoked by news articles, tweets, etc. In this article, we focus on building a social emotion detection system for online news. The system is built based on the modules of document select...
Text plays a dominant role in video viewing and understanding as text carries rich and important information relevant to the video contents. Studies have shown that humans often pay first attention to text over other objects in a video as text helps in getting semantics relevant to the content of the video. With this in mind, this chapter introduce...
This chapter discusses video text recognition involving multiple scripts. While most video text recognition works are based on English due to the much greater availability of English video datasets, there has been increasing interest in recent years in recognizing video text in other languages and scripts. In this context, this chapter first presents...
This chapter introduces methods for text binarization and recognition as post-processing for text detection. Binarization pertains to the separation of text from the background of the detected text block. As an example of binarization methods, this chapter presents a fusion method which combines wavelet and gradient bands for the text lines with th...
This chapter presents methods for character segmentation from text lines and recognition of video characters. It is noted that character segmentation from video text lines detected by video text detection method is not as easy as segmenting characters from scanned document images due to low resolution and complex background of video. This chapter p...
Video contains two types of texts. The first type pertains to caption texts which are edited texts or graphics texts artificially superimposed into video and are relevant to the content of the video. The second type belongs to scene texts, which are naturally existing texts, usually embedded in objects in the video. This chapter focuses on the stat...
This chapter presents state-of-the-art work in the area of performance evaluation of video text detection and recognition algorithms and systems. It first introduces the three components which a performance evaluation protocol may comprise, namely, a benchmarking database, a matching method, and a set of performance metrics. Each of these three...
Most video streams involve more than one modality for conveying hints related to the nature of the underlying contents. In general, video data are composed of three low-level modalities, namely, the visual modality (i.e., visual objects, motions, and scene changes), the auditory modality which can be structural foreground or unstructured background soun...
Nowadays, a large number of video text detection systems have been developed for daily used video applications such as transportation surveillance, electronic payment, traffic safety detection, sport videos retrieval, and even commercial online advertisements, in which the existing closed-circuit television, road-rule enforcement cameras, or online...
Extracting texts from video always faces variations in font style, size, color, orientation, and brightness; thus, video preprocessing techniques are required to reduce the complexity of the succeeding steps consisting of video text detection, localization, segmentation, recognition, and script identification. This chapter gives a brief overview of...
Text in video contains valuable information and is exploited in many content-based video applications. However, scene text detection has not been systematically explored, even though many optical character recognition (OCR) techniques have been developed in the past decades. This chapter gives an introduction to the current progress on scene text det...
The rapid development of social media services has been a great boon for the communication of emotions through blogs, microblogs/tweets, instant-messaging tools, news portals, and so forth. This paper is concerned with the detection of emotions evoked in a reader by social media. Compared to classical sentiment analysis conducted from the writer's...
Virtualization is one of the key enablers in cloud computing. At the same time, though, it is also widely considered as a double-edged sword that may cause information leakage between virtual machines (VM) co-residing on the same physical server via various cross-VM covert channels. In this paper, we first explore the impact of different bystander...
Video text detection provides an efficient approach to the indexing, classification, retrieval and understanding of visual content.
This unique text/reference presents a systematic introduction to the latest developments in video text detection. Opening with a discussion of the underlying theory and a brief history of video text detection, the text...
This paper proposes a new non-reference image quality metric that can be adopted by state-of-the-art image/video denoising algorithms for auto-denoising. The proposed metric is extremely simple and can be implemented in four lines of Matlab code. The basic assumption employed by the proposed metric is that the noise should be independent of th...
Extreme learning machine (ELM) for single-hidden-layer feedforward neural networks (SLFN) is a powerful machine learning technique that has been attracting attention for its fast learning speed and good generalization performance. Recently, a weighted ELM was proposed to deal with data with imbalanced class distribution. The key essence of weighted...
Increasingly, software engineering involves open systems consisting of autonomous and heterogeneous participants or agents who carry out loosely coupled interactions. Accordingly, understanding and specifying communications among agents ...
A strategy of automatic answer retrieval for repeated or similar questions in user-interactive systems by employing semantic question patterns is proposed in this paper. The used semantic question pattern is a generalized representation of a group of questions with both similar structure and relevant semantics. Specifically, it consists of semantic...
In this paper, we present four image descriptors for HEp-2 cell staining patterns classification, including LBP, Gabor, DCT, and a global appearance statistical descriptor. A multiclass boosting SVM algorithm is proposed to integrate these descriptors together: (1) within each boosting round, four multiclass posterior probability SVMs are trained c...
Community question answering (QA) has become increasingly popular and receives a great variety of questions every day. Among them, some questions are attractive and popular with many users, while others are tedious and unattractive. In this paper, we aim to identify popular questions in the community QA through modeling questi...
In many industries (such as retailing or consumer services), choosing an appropriate site is one of the most important decisions for firms. This study proposes a graph-based method to address the business site selection problem from a perspective of “intraspecific competition,” which takes into account the fact that most business firms are not isol...
In this paper, we propose a generative model, the Topic-based User Interest (TUI) model, to capture the user interest in the User-Interactive Question Answering (UIQA) systems. Specifically, our method aims to model the user interest in the UIQA systems with latent topic method, and extract interests for users by mining the questions they asked, th...
Phishing attacks are growing in both volume and sophistication. The antiphishing method described here collects webpages with either a direct or indirect association with a given suspicious webpage. This enables the discovery of a webpage's so-called "parasitic" community and then ultimately its phishing target — that is, the page with the stronges...
There has been increasing interest in Community Question Answering (CQA) recently. CQA websites such as Yahoo! Answers and Baidu Knows are increasingly popular, attracting tens of thousands of users to submit questions and answers every day. However, we find that there is a gap in the study of what kinds of questions are more likely to attract an...
Sentiment analysis of online documents such as news articles, blogs and microblogs has received increasing attention. We propose an efficient method of automatically building the word-emotion mapping dictionary for social emotion detection. In the dictionary, each word is associated with the distribution on a series of human emotions. In addition,...
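As a rough illustration of a word-emotion mapping dictionary, the sketch below assumes a hypothetical training set in which each document carries reader vote counts per emotion. Each word's distribution is obtained by accumulating the normalized votes of the documents that contain it. The construction rule, the emotion labels, the function names, and the toy corpus are all assumptions made for illustration; the abstract above is truncated and does not specify the actual method.

```python
from collections import defaultdict

EMOTIONS = ("joy", "anger", "sadness")  # hypothetical emotion labels


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def build_dictionary(corpus):
    """Map each word to a distribution over EMOTIONS.

    corpus: list of (text, votes), where votes[i] counts the readers who
    reported EMOTIONS[i] for that document (assumed input format).
    """
    sums = defaultdict(lambda: [0.0] * len(EMOTIONS))
    for text, votes in corpus:
        doc_dist = normalize(votes)
        for word in set(text.lower().split()):
            for i, p in enumerate(doc_dist):
                sums[word][i] += p
    return {word: normalize(acc) for word, acc in sums.items()}


def predict(dictionary, text):
    """Average the distributions of the known words in text."""
    acc = [0.0] * len(EMOTIONS)
    for word in text.lower().split():
        for i, p in enumerate(dictionary.get(word, ())):
            acc[i] += p
    return dict(zip(EMOTIONS, normalize(acc)))


# Toy corpus, invented for this example.
corpus = [
    ("team wins championship", [9, 0, 1]),
    ("flood destroys homes", [0, 2, 8]),
    ("tax scandal exposed", [0, 9, 1]),
]
dictionary = build_dictionary(corpus)
prediction = predict(dictionary, "championship scandal")
```

Unknown words contribute nothing to the prediction, so a text made up entirely of unseen words falls back to a uniform distribution.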
As more web services that implement core functions of business are delivered to customers with service charges, an open and competitive business web services market must be established. However, the qualities of these business web services are unknown without real experiences and users are unable to make decisions on service selection. To address t...
The longitudinal impact of online consumer reviews on hotel sales is studied. Different from previous efforts using pure ratings, this study explores the extraction and representation of consumer sentiments within review comments in an iterative approach. Initially, we define a few primary attributes as seeds to mine review patterns which are frequ...
An automatic method for answering repeated questions based on semantic question patterns is proposed in this paper. The semantic question pattern used is a generalized representation of a group of questions with both similar structure and relevant semantics. Specifically, it consists of semantic annotations (or constraints) for the variable compone...
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduce...
A new clustering strategy, TermCut, is presented to cluster short text snippets by finding core terms in the corpus. We model the collection of short text snippets as a graph in which each vertex represents a piece of short text snippet and each weighted edge between two vertices measures the relationship between the two vertices. TermCut is then a...
Question categorization, which suggests one of a set of predefined categories to a user’s question according to the question’s topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. This method...
We study the anti-phishing problem and propose a method for automatically discovering the phishing target of any given suspicious webpage. The method first collects all associated webpages, which have either direct association relationship or indirect association relationship with the given webpage, and then finds the so-called “parasitic” communit...
A new business—insurance on business Web services—is proposed. As more and more Web services will be developed to fulfill the ever increasing needs of e-Business, the e-marketplace for Web services will soon be established. However, the qualities of these business Web services are unknown without real experiences and users can hardly make decisions...
In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with the probabilistic topics. Specifically, our method starts by finding the distinguishing terms between the two short text snippets and comparing them with a series of probabilistic topics, extracted by Gibbs samp...
Asset management has long been considered an important issue in enterprises. With the maturing of the Radio Frequency Identification (RFID) technology, automated management of assets (particularly for mobile ones) in an enterprise using RFID becomes practical. We have developed an RFID-based Asset Management System (RAMS) for a large telecommuni...
A novel modeling method for a collection of short text snippets is presented in this paper to measure the similarity between pairs of snippets. The method takes account of both the semantic and statistical information within the short text snippets, and consists of three steps. Given a set of raw short text snippets, it first establishes the initia...
An approach to identification of the phishing target of a given (suspicious) webpage is proposed by clustering the webpage set consisting of its all associated webpages and the given webpage itself. We first find its associated webpages, and then explore their relationships to the given webpage as their features for clustering. Such relationships i...
Visual spoofing in Unicode-based text is anticipated as a severe web security problem in the near future as more and more Unicode-based web documents will be used. In this paper, to detect whether a suspicious Unicode character in a word is visual spoofing or not, the context of the suspicious character is utilized by employing a Bayesian framework...
Many questions submitted to Collaborative Question Answering (CQA) sites have been answered before. We propose an approach to automatically generating an answer to such questions based on automatically learning to identify "equivalent" questions. Our main contribution is an unsupervised method for automatically learning question equivalence pattern...
Semantic annotation for text is a well-studied topic. However, little work has addressed the annotation of short text. In this article, an automatic annotation approach is proposed for this purpose, which annotates short text with semantic labels for question answering systems. In the first step, keywords are extracted fro...
An approach to the discovery of the phishing target of a suspicious webpage is proposed, which is based on construction and reasoning of the Semantic Link Network (SLN) of the suspicious webpage. The SLN is constructed from the given suspicious webpage and its associated webpages. Since reasoning of the SLN can discover implicit relations among web...
An automatic annotation method for annotating text with semantic labels is proposed for question answering systems. The approach first extracts the keywords from a given question. A semantic label selection module is then employed to select the semantic labels to tag keywords. In order to distinguish multiple senses and assign the best semantic labels, a B...
A personalized e-learning framework based on a user-interactive question-answering (QA) system is proposed, in which a user-modeling approach is used to capture personal information of students and a personalized answer extraction algorithm is proposed for personalized automatic answering. In our approach, a topic ontology (or concept hierarchy) of c...
An automatic method for building a semantic dictionary from existing questions in a pattern-based question answering system is proposed for question categorization. This dictionary consists of two main parts: Semantic Domain Terms (SDT), which is a domain specific term list, and Semantic Labeled Terms (SLT), which contain common terms tagged with s...
In this paper, we propose a multimedia information retrieval framework in distributed networks, which is suitable for both cooperative and non-cooperative environments and resistant to biased content summaries. The relations of peers are established and evolved according to their historical interactions, which are computed with a peer reputation m...
A user-interactive question-answering (QA) platform named BuyAns (at www.buyans.com) is presented. The platform is a special kind of online community and mainly features a rewarding scheme for answering questions among all users, a pattern-based user interface (UI) for questioning and answering, and a pattern-based representation and storage scheme...
Question categorization, which automatically suggests a few categories to host a user’s question, is a useful technique in Web-based question answering systems. In this paper, we propose a question categorization method which makes use of user feedback to the system’s automatic suggestions to improve question categorization. We initialize the cate...
An automatic method for question translation based on semantic pattern is proposed in this paper, in which structure analysis, pattern matching and word sense selection are three important steps. An evaluation method is also presented to calculate the similarities between the original words and the generated words to obtain better semantic translat...
7th International Workshop, GREC 2007, Selected Papers, LNCS 5046, ISBN 978-3-540-88184-1
With the maturing of the radio frequency identification (RFID) technology, automated management of assets (particularly for mobile ones) in an enterprise using RFID becomes practical. This paper proposes an RFID-based asset management system (RAMS) and details how to maintain the whole life-cycle of assets from their acquisition, transfer, maintenan...
A personalized e-learning framework based on a user-interactive question-answering (QA) system is proposed, in which a user-modeling approach is used to capture personal information of students and a personalized answer extraction algorithm is proposed for personalized automatic answering. In our approach, a topic ontology (or concept hierarchy) of...
A new semantic pattern is proposed in this paper, which can be used by users to post questions and answers in user-interactive question answering (QA) system. The necessary procedures of using semantic pattern in a QA system are also presented, which include question structure analysis, pattern matching, pattern generation, pattern...
Unicode has become a useful tool for information internationalization, particularly for applications in web links, web pages, and emails. However, many Unicode glyphs look so similar that malicious users may exploit this feature to deceive readers. In this paper, we propose to use Unicode string coloring as a promising countermeasure to this eme...
A half-day single track workshop is designed to gather academic researchers and industrial practitioners to share ideas and know-how, and to discuss all relevant issues, including the business models, enabling technologies, and killer applications of Web-based question answering (QA), especially the user-interactive QA services and...
A balanced question recommendation mechanism for user-interactive question answering (QA) systems is proposed to automatically recommend a new question to suitable users to answer. In this mechanism, a user modeling method is used to estimate the interests and professional areas of each user so that we can choose suitable users to answer a given qu...
In this paper, we propose an asset management system using technologies of Radio Frequency Identification (RFID), Web Geographic Information System (WebGIS) and Short Message Service (SMS) for intelligent management of assets. The proposed method tracks and monitors the changes of locations of the assets and shows their statuses on a geographical m...
A method of learning adaptation rules for case-based reasoning (CBR) is proposed in this paper. Adaptation rules are generated from the case-base with the guidance of domain knowledge which is also extracted from the case-base. The adaptation rules are refined before they are applied in the revision process. After solving each new problem, the ada...
In this paper, we retrieve the exact answer from a question/answer pair based on semantic pattern and dependency matching. The question target and the weight information of the main phrase are obtained from the semantic pattern. Candidate exact answers to a question are retrieved from its corresponding sentence-form answer by dependency relation matching. A probabil...
Frequently asked question (FAQ) answering is a very useful module in automatic question answering (QA) systems where calculation of question similarity is a key problem. In this paper, we propose a new method for measuring the similarity between users' questions and the questions in a FAQ database. Both statistic measure and semantic information ar...
We propose an approach to automatic clinical question answering based on UMLS relations. Rules are defined to identify the medical concepts and their relations in the questions and documents with the help of Metamap transfer and SemRep. The phrase-level answers are generated through matching concepts and relations between question and documents. Ex...
This paper proposes four models to manually acquire procedural and declarative knowledge based on procedural and declarative knowledge acquisition language (PDKAL). The method of transforming PDKAL to object language (OL) is also introduced for inference in the acquired knowledge base. Preliminary experiments show that the four models are this meth...
This paper presents a novel syntactic symbol recognition approach to the vector based symbol recognition problem. Different from existing syntactic approaches, which usually describe the geometric relations among primitives, our method formulates a new model to describe the geometric information of a primitive with respect to the whole symbol objec...
In this paper, we propose a user reputation model and apply it to a user-interactive question answering system. It combines the social network analysis approach and the user rating approach. Social network analysis is applied to analyze the impact of participant users' relations to their reputations. User rating is used to acquire direct judgment o...
In this paper, we propose a novel descriptor based on symbol signatures for symbol filtering and recognition. First of all, all symbols are assumed to be in vectorial form. All the primitive-pair relationships in a symbol are recorded and employed to create the signature representing the symbol. Although the approach aims at discriminating the symbo...
Visual similarity evaluation plays an important role in intelligent graphics systems. A basic problem is how to extract the content information of an image and how to describe it with an intermediate representation, namely the image representation, because the image representation has great influence on the efficiency and performance...
An interactive example-driven approach to graphics recognition in engineering drawings is proposed. The scenario is that the user first interactively provides an example of a graphic object; the system instantly learns its graphical knowledge and uses the acquired knowledge to recognize the same type of graphic objects. The proposed approach repres...
We investigate the effectiveness of lexical, topic and structural similarities on the semantic relevance between a question and a passage which may contain the answer. We propose a web-based method to measure the lexical similarity between a question and a passage based on the semantic similarity between words or phrases. The topic similarity betwe...
The efficacy of an information extraction system is mostly determined by the quality of the extraction rules. Building these extraction rules is time-consuming and difficult to implement by hand. Hence, we propose a Heuristic Rule Learning (HRL) algorithm which can automatically and efficiently acquire high-quality extraction rules from a user labe...
We propose a novel approach to similarity assessment for graphic symbols. Symbols are represented as 2D kernel densities and their similarity is measured by the Kullback-Leibler divergence. Symbol orientation is found by gradient-based angle searching or independent component analysis. Experimental results show the outstanding performance of this a...
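The idea of comparing symbols as 2D kernel densities with the Kullback-Leibler divergence can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: symbols are assumed to be given as point samples in the unit square, the Gaussian kernel, the bandwidth, the grid resolution, and all function names are choices made here, and the orientation-alignment step mentioned in the abstract is omitted.

```python
import math


def kernel_density(points, grid_size=16, bandwidth=0.1):
    """Gaussian kernel density of 2D points, sampled on a regular grid over
    the unit square and normalized so the grid values sum to 1."""
    step = 1.0 / (grid_size - 1)
    density = []
    for row in range(grid_size):
        for col in range(grid_size):
            gx, gy = col * step, row * step
            value = sum(
                math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2.0 * bandwidth ** 2))
                for x, y in points
            )
            density.append(value + 1e-12)  # small floor keeps log() finite
    total = sum(density)
    return [d / total for d in density]


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) over the grid cells."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def segment(start, end, n=25):
    """Sample n evenly spaced points on the line segment from start to end."""
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / (n - 1), y0 + (y1 - y0) * i / (n - 1))
        for i in range(n)
    ]


# Toy symbols, invented for this example: a square, a slightly shifted
# copy of it, and a cross.
square = (
    segment((0.2, 0.2), (0.8, 0.2)) + segment((0.8, 0.2), (0.8, 0.8))
    + segment((0.8, 0.8), (0.2, 0.8)) + segment((0.2, 0.8), (0.2, 0.2))
)
shifted_square = [(x + 0.02, y) for x, y in square]
cross = segment((0.2, 0.2), (0.8, 0.8)) + segment((0.2, 0.8), (0.8, 0.2))

p_square = kernel_density(square)
d_same = kl_divergence(p_square, p_square)
d_shift = kl_divergence(p_square, kernel_density(shifted_square))
d_cross = kl_divergence(p_square, kernel_density(cross))
```

Because the KL divergence is asymmetric, a practical similarity measure might symmetrize it, for example by summing D(p || q) and D(q || p); whether the paper does so is not stated in the abstract.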
A new semantic pattern is proposed in this paper, which can be used by users to post questions and answers in user-interactive question answering (QA) system. The necessary procedures of using semantic pattern in a QA system are also presented, which include question structure analysis, pattern matching, pattern generation, pattern classification a...
An effective approach to phishing Web page detection is proposed, which uses Earth mover's distance (EMD) to measure Web page visual similarity. We first convert the involved Web pages into low resolution images and then use color and coordinate features to represent the image signatures. We use EMD to calculate the signature distances of the image...
A method for capturing the interest and authority of students about course content is proposed and implemented as a user modeling approach in a Web-based user-interactive question-answering (QA) system. An instructor has to define a topic ontology (or concept hierarchy) for the course content so that the system can generate the corresponding struct...
We survey the methods developed up to date for crude vectorization of document images. We classify them into six categories: thinning based, Hough Transform based, contour-based, run-graph based, mesh-pattern based, and sparse pixel based. The crude vectorization is a relatively mature subject in the Document Analysis and Recognition field, though...
We anticipate the widespread usage of an internationalized resource identifier (IRI) or internationalized domain name (IDN) on the web as complement to universal resource identifier (URI). IRI/IDN is composed of characters in a subset of Unicode, such that a Unicode attack to IRI/IDN could happen. Hence, visually or semantically, certain phishing I...
The authors' proposed antiphishing strategy uses visual characteristics to identify potential phishing sites and measure suspicious pages' similarity to actual sites registered with the system. The first of two sequential processes in the SiteWatcher system runs on local email servers and monitors emails for keywords and suspicious URLs. The second...
Unicode is becoming a dominant character representation format for information processing. This presents a very dangerous usability and security problem for many applications. The problem arises because many characters in the UCS (Universal Character Set) are visually and/or semantically similar to each other. This presents a mechanism for maliciou...
The need for answer clustering and fusion in a user-interactive question answering (QA) system is identified and its user interface and enabling technology are presented in this paper. This function aims to help a user to efficiently browse all the answers and find the correct answer to a specific question by clustering answers into groups and pr...
The efficacy of pattern-based question answering system is mostly determined by the size of the semantic pattern base and the expression capability of the semantic patterns. We find that the expression capabilities of semantic patterns are determined by their instantiation degrees. Hence, we propose an evaluation strategy named Semantic Identifiabi...
A severe potential security problem in the use of Unicode on the Web is identified, which results from the fact that there are many similar characters in the Unicode Character Set (UCS). The foundation of our solution relies on evaluating the similarity of characters in UCS. We develop a solution based on the renowned Kernel Density Estimati...
We anticipate a potential phishing strategy by obfuscation of Web links using Internationalized Resource Identifier (IRI). In the IRI scheme, the glyphs of many characters look very similar while their Unicodes are different. Hence, certain different IRIs may show high similarity. The potential phishing attacks based on this strategy are very likel...
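The observation that visually similar glyphs can have different code points is easy to demonstrate. The sketch below builds a look-alike of the string "paypal" by swapping the Latin letter "a" for U+0430 (CYRILLIC SMALL LETTER A), then flags the result with a simple mixed-script check. The check is an illustrative heuristic chosen here, not the detection approach of the paper, and using the first word of each Unicode character name as a proxy for its script is an approximation; the function names are hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Return the set of script names used by the letters in text.

    The first word of each Unicode character name (e.g. 'LATIN',
    'CYRILLIC') is used as a rough proxy for the script.
    """
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def looks_mixed_script(label):
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


genuine = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'

same_string = genuine == spoofed               # False: the code points differ
genuine_flagged = looks_mixed_script(genuine)  # False: Latin letters only
spoofed_flagged = looks_mixed_script(spoofed)  # True: Latin mixed with Cyrillic
```

A mixed-script check misses look-alikes built entirely from a single non-Latin script, which is one reason character-level similarity measures are of interest.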
In this paper, a novel, adaptive noise reduction method for engineering drawings is proposed based on assessment of both primitives and noise. Unlike the current approaches, our method takes into account the special features of engineering drawings and assesses the characteristics of primitives and noise such that adaptive procedures and parameters...
In this paper, we present an integrated system for symbol recognition. The whole recognition procedure consists of image compression, denoising and recognition. We present a pixel-based method to calculate similarity between two symbols using the bipartite transformation distance after they are aligned by their angular distributions. The propos...