Xiangfeng Luo

Shanghai University, Shanghai, China

Publications (95) · 45.09 total impact

  • Zheng Xu · Fenglin Zhi · Chen Liang · Lin Mei · Xiangfeng Luo
    ABSTRACT: Image and video resources play an important role in traffic event analysis. With the rapid growth of video surveillance devices, a large number of image and video resources are increasingly being created. It is crucial to explore, share, reuse, and link these multimedia resources to better organize traffic events. Most video resources are currently annotated in an isolated way, which means that they lack semantic connections. Thus, facilities for annotating these video resources are in high demand. Such facilities create semantic connections among video resources and allow their metadata to be understood globally. Adopting semantic technologies, this paper introduces a video annotation platform. The platform enables users to semantically annotate video resources using vocabularies defined by traffic event ontologies. Moreover, the platform provides a search interface for the annotated video resources. The result of initial development demonstrates the benefits of applying semantic technologies in terms of reusability, scalability, and extensibility.
    No preview · Article · Jul 2016 · International Journal of Cognitive Informatics and Natural Intelligence
  • Zheng Xu · Yunhuai Liu · Neil Yen · Lin Mei · Xiangfeng Luo · Xiao Wei · Chuanping Hu

    No preview · Article · Jan 2016 · IEEE Transactions on Cloud Computing

  • No preview · Article · Dec 2015 · Future Generation Computer Systems
  • Source
    ABSTRACT: Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices appropriate for the intended application. The method has been widely used for unsupervised learning tasks, including recommender systems (rating matrix of users by items) and document clustering (weighting matrix of papers by keywords). However, traditional NMF methods typically assume the number of latent factors (i.e., the dimensionality of the loading matrices) to be fixed. This assumption makes them inflexible for many applications. In this paper, we propose a nonparametric NMF framework that mitigates this issue by using dependent Indian Buffet Processes (dIBP). In a nutshell, we apply a correlation function to generate the two stick weights associated with each pair of columns of the loading matrices, while still maintaining their respective marginal distributions as specified by the IBP. As a consequence, the generation of the two loading matrices is column-wise (indirectly) correlated. Under this framework, two classes of correlation function are proposed: (1) using a bivariate beta distribution and (2) using a copula function. Both methods allow our work to be adapted to various applications by flexibly choosing appropriate parameter settings. Compared with other state-of-the-art approaches in this area, such as the Gaussian Process (GP)-based dIBP, our work is much more flexible in allowing the two corresponding binary matrix columns to have greater variation in their non-zero entries. Our experiments on real-world and synthetic datasets show that the three proposed models perform well on document clustering compared with standard NMF, without predefining the dimension of the factor matrices, and that the bivariate beta distribution-based and copula-based models have better flexibility than the GP-based model.
    Full-text · Article · Jul 2015
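As background for the fixed-rank baseline that this nonparametric framework relaxes, here is a minimal sketch of standard NMF with Lee-Seung multiplicative updates. The factor count `k` is fixed by hand, which is exactly the assumption the dIBP-based models remove; this is an illustrative sketch, not the paper's algorithm:

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (m x n) into W (m x k) and H (k x n)
    by minimizing squared Frobenius error with multiplicative updates.
    Note that k must be chosen in advance, unlike in the nonparametric
    framework described above."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update loading matrix
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update basis matrix
    return W, H

# Toy "papers by keywords" weighting matrix.
V = np.random.default_rng(1).random((20, 12))
W, H = nmf(V, k=4)
err = np.linalg.norm(V - W @ H)
```

With `k` wrong by much, reconstruction quality degrades; choosing it automatically is precisely what the dIBP construction addresses.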
  • Source
    Junyu Xuan · Jie Lu · Xiangfeng Luo · Guangquan Zhang
    ABSTRACT: Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, standard NMF overlooks networks between nodes of the same kind, e.g., the social network between users. This leads to comparatively low recommendation accuracy, because such networks are also reflections of the nature of the nodes, such as the preferences of users in a social network. Moreover, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of the nodes, so retaining different network structures leads to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on standard NMF, which integrate the vertical network (e.g., the rating matrix) with the structures of the horizontal network (e.g., the user social network). These algorithms are carefully designed, with corresponding convergence proofs, to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation over standard NMF, and that the various structures differ in performance on these two tasks. These results can be directly used in document clustering and recommendation systems.
    Full-text · Article · Apr 2015
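The general idea of coupling a vertical network with a horizontal one can be sketched with the common graph-regularized NMF recipe, where the social network enters as a Laplacian regularizer on the user factors. This follows the standard GNMF-style updates as an assumption for illustration, not the paper's four exact algorithms:

```python
import numpy as np

def graph_regularized_nmf(V, S, k, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Factorize a vertical network V (users x items) while a horizontal
    network S (users x users, symmetric adjacency) pulls connected users'
    factor rows together via the penalty lam * tr(W.T @ (D - S) @ W)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    D = np.diag(S.sum(axis=1))  # degree matrix of the horizontal network
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T + lam * S @ W) / (W @ H @ H.T + lam * D @ W + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((10, 8))                      # rating matrix
S = np.triu((rng.random((10, 10)) < 0.3).astype(float), 1)
S = S + S.T                                  # symmetric social network
W, H = graph_regularized_nmf(V, S, k=3)
```

Setting `lam=0` recovers plain NMF, i.e., ignoring the horizontal network entirely.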
  • Source
    ABSTRACT: Traditional relational topic models provide a way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. However, existing relational topic models assume that the number of hidden topics is known in advance, which is impractical in many real-world applications. To relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document to represent the document's topic interest. Although this provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., that two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics be shared by all documents. To resolve these challenges, we use a subsampling strategy to derive each document's gamma process from a global gamma process, and the subsampling probabilities of documents are given a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the model's ability to learn the hidden topics and, more importantly, the number of topics.
    Full-text · Article · Mar 2015
  • Source
    ABSTRACT: Incorporating the side information of a text corpus, i.e., authors, time stamps, and emotional tags, into traditional text mining models has gained significant interest in information retrieval, statistical natural language processing, and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates authors' interests as side information into the classical topic model. However, the existing ATM needs a predefined number of topics, which is difficult to set and often inappropriate in real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability to a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. Specifically, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for multiple authors' contributions to the document. An efficient Gibbs sampling inference algorithm, in which each conditional distribution is closed-form, is developed for the IAT model. Experiments on several real-world datasets show the ability of the IAT model to learn the hidden topics, the authors' interests in these topics, and the number of topics simultaneously.
    Full-text · Article · Mar 2015
  • Xiao Wei · Xiangfeng Luo · Qing Li · Jun Zhang · Zheng Xu
    ABSTRACT: Online comments have become a popular and efficient way for sellers to acquire feedback from customers and improve their service quality. However, several key issues must be solved to evaluate and improve hotel service quality from online comments automatically: how to use comments of low individual trustworthiness, how to discover quality defects from online comments, and how to recommend more feasible or economical evaluation indexes for improving service quality based on online comments. To solve these problems, this paper first improves fuzzy comprehensive evaluation (FCE) by incorporating a trustworthiness degree, and proposes an automatic hotel service quality assessment method using the improved FCE, which can derive a more trustworthy evaluation from a large number of less trustworthy online comments. Then, the causal relations among evaluation indexes are mined from online comments to build a fuzzy cognitive map of hotel service quality, which helps unfold the problematic areas of hotel service quality and recommend more economical solutions for improving it. Finally, both case studies and experiments demonstrate that the proposed methods are effective in evaluating and improving hotel service quality using online comments.
    No preview · Article · Feb 2015 · IEEE Transactions on Fuzzy Systems
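The trust-weighted aggregation step can be illustrated with a toy sketch of fuzzy comprehensive evaluation. The data layout (per-comment grade votes and trust scores) and the way trust enters the membership matrix are assumptions for illustration, not the paper's exact model:

```python
import numpy as np

grades = ["poor", "fair", "good", "excellent"]

def trust_weighted_fce(index_weights, comment_votes, trust):
    """Classic one-level FCE, B = A @ R, where the membership matrix R
    (evaluation indexes x grades) is built from comment votes weighted by
    each comment's trustworthiness in [0, 1].
    comment_votes[i, j] is the grade index that comment i assigns to
    evaluation index j (e.g., cleanliness, service, location)."""
    n_comments, n_idx = comment_votes.shape
    R = np.zeros((n_idx, len(grades)))
    for i in range(n_comments):
        for j in range(n_idx):
            R[j, comment_votes[i, j]] += trust[i]   # trusted votes count more
    R /= R.sum(axis=1, keepdims=True)               # rows become memberships
    return np.asarray(index_weights) @ R            # fuzzy result over grades

A = [0.5, 0.3, 0.2]                        # weights of three evaluation indexes
votes = np.array([[2, 3, 1], [3, 3, 2], [1, 2, 2]])
trust = np.array([0.9, 0.6, 0.3])          # hypothetical trustworthiness scores
B = trust_weighted_fce(A, votes, trust)    # membership over the four grades
```

The design point is that a low-trust comment still contributes, but proportionally less, so many weak signals can still yield a stable evaluation.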
  • Junyu Xuan · Jie Lu · Guangquan Zhang · Xiangfeng Luo
    ABSTRACT: Graph mining has been a popular research area because of its numerous application scenarios. Much unstructured and structured data can be represented as graphs, such as documents, chemical molecular structures, and images. However, an issue with current research on graphs is that it cannot adequately discover the topics hidden in graph-structured data, which can benefit both unsupervised and supervised learning on graphs. Although topic models have proved very successful in discovering latent topics, standard topic models cannot be directly applied to graph-structured data due to the "bag-of-words" assumption. In this paper, an innovative graph topic model (GTM) is proposed to address this issue, which uses Bernoulli distributions to model the edges between nodes in a graph. It can therefore make the edges in a graph contribute to latent topic discovery and further improve the accuracy of supervised and unsupervised learning on graphs. The experimental results on two different types of graph datasets show that the proposed GTM outperforms latent Dirichlet allocation on classification when the topics unveiled by the two models are used to represent graphs.
    No preview · Article · Jan 2015 · IEEE Transactions on Cybernetics
  • Junyu Xuan · Xiangfeng Luo · Guangquan Zhang · Jie Lu · Zheng Xu

    No preview · Article · Jan 2015 · IEEE Transactions on Systems, Man, and Cybernetics: Systems
  • Xiangfeng Luo · Jun Zhang · Qing Li · Xiao Wei · Lei Lu
    ABSTRACT: This paper advocates a novel approach to recommending texts at various levels of difficulty based on a proposed measure, the algebraic complexity of texts (ACT). Unlike traditional complexity measures that mainly focus on surface features such as the number of syllables per word, characters per word, or words per sentence, ACT draws on the perspective of human concept learning, which can reflect the complex semantic relations inside texts. To cope with the high cost of measuring ACT, the Degree-2 Hypothesis of ACT is proposed to reduce the measurement from unrestricted dimensions to three dimensions. Based on the principle of the "mental anchor," an extension of ACT and its general edition [denoted extension of text algebraic complexity (EACT) and general extension of text algebraic complexity (GEACT)] are developed, which take the complexities of keywords and association rules into account. Finally, using scores given by humans as a benchmark, we compare the proposed methods with linguistic models. The experimental results show the ordering GEACT > EACT > ACT > linguistic models, meaning GEACT performs best while linguistic models perform worst. Additionally, GEACT with lower convex functions is best at measuring the algebraic complexity of text understanding, which may indicate that the human complexity curve tends to be a lower convex function rather than a linear one.
    No preview · Article · Oct 2014 · IEEE Transactions on Human-Machine Systems
  • Junyu Xuan · Jie Lu · Guangquan Zhang · Xiangfeng Luo
    ABSTRACT: Similarity measures are the foundation of many research areas, e.g., information retrieval, recommender systems, and machine learning algorithms. Driven by these application scenarios, numerous similarity measures have been and continue to be proposed. Among these state-of-the-art measures, vector-based representation is widely accepted, based on the Vector Space Model (VSM), in which an object is represented as a vector composed of its features. The similarity between two objects is then evaluated by operations on the two corresponding vectors, such as cosine, extended Jaccard, and extended Dice. However, there is an underlying assumption that the features are independent of each other. This assumption is apparently unrealistic; normally there are relations between features, e.g., the co-occurrence relations between keywords in text mining. In this paper, a space geometry-based method is proposed to extend the VSM from an orthogonal coordinate system (OVSM) to an affine coordinate system (AVSM), and OVSM is proved to be a special case of AVSM. The unit coordinate vectors of AVSM are inferred from the relations between features, which are treated as angles between these unit coordinate vectors. Finally, five different similarity measures are extended from OVSM to AVSM using the unit coordinate vectors of AVSM. Among the numerous application fields of similarity measures, text clustering is selected as the evaluation task. Documents are represented as vectors in OVSM and AVSM, respectively. The clustering results show that AVSM outperforms OVSM.
    No preview · Article · Sep 2014
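The OVSM-as-special-case claim is easy to see in code: the cosine measure in an affine coordinate system uses the Gram matrix of the non-orthogonal unit coordinate vectors, and setting that matrix to the identity recovers the ordinary cosine. The particular Gram matrix below is a hypothetical example, e.g., built from keyword co-occurrence angles:

```python
import numpy as np

def avsm_cosine(x, y, G):
    """Cosine similarity in an affine coordinate system. G is the Gram
    matrix of the unit coordinate vectors, G[i, j] = cos(angle between
    features i and j); G = I reduces this to the standard OVSM cosine."""
    num = x @ G @ y
    den = np.sqrt((x @ G @ x) * (y @ G @ y))
    return num / den

# Hypothetical feature relations: features 0 and 1 strongly related.
G = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
x = np.array([1.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 1.0])
plain = avsm_cosine(x, y, np.eye(3))   # ordinary cosine: 0.5
affine = avsm_cosine(x, y, G)          # higher, since features 0 and 1 overlap
```

Two documents sharing no keywords can still score above zero under AVSM whenever their keywords are related, which is the intended effect of dropping the independence assumption.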
  • Yang Liu · Xiangfeng Luo · Junyu Xuan
    ABSTRACT: Online hot event discovery has become a flourishing frontier in which online document streams are monitored so that newly occurring events are discovered or documents are assigned to previously detected events. However, hot events naturally evolve, and their inherent topically related words are also likely to evolve, which makes event discovery a challenging task for traditional mining approaches. Combining word association and semantic community, the Association Link Network (ALN) organizes loosely distributed associated resources. This paper presents a novel ALN-based online hot event discovery approach enacted in three stages. In the first stage, we extract significant features to represent the content of each document in the online document stream. In the second stage, we classify the online document stream into topically related detected events, accounting for event evolution, in the form of an ALN. In the third stage, we create an ALN-based event detection algorithm to discover newly occurring hot events in a timely manner. The online datasets used in our empirical studies were acquired from Baidu News and span 1315 hot events and 236,300 documents. Experimental results demonstrate the approach's hot event discovery ability with respect to high accuracy, good scalability, and short runtime.
    No preview · Article · Sep 2014 · Concurrency and Computation Practice and Experience
  • Chuanping Hu · Zheng Xu · Yunhuai Liu · Lin Mei · Lan Chen · Xiangfeng Luo
    ABSTRACT: Recent research shows that multimedia resources in the wild are growing at a staggering rate. The rapidly increasing number of multimedia resources has brought an urgent need for intelligent methods to organize and process them. In this paper, the semantic link network model is used to organize multimedia resources. A complete model for generating association relations between multimedia resources using the semantic link network is proposed. The definitions, modules, and mechanisms of the semantic link network are used in the proposed method. The integration of the semantic link network with multimedia resources provides a new prospect for organizing them by their semantics. The tags and surrounding texts of multimedia resources are used to measure their semantic association. The hierarchical semantics of multimedia resources are defined by their annotated tags and surrounding texts, whose semantics are treated differently in the proposed framework. The modules of the semantic link network model are implemented to measure association relations. A real dataset of 100 thousand images with social tags from Flickr is used in our experiments. Two evaluations, clustering and retrieval, show that the proposed method measures the semantic relatedness between Flickr images accurately and robustly.
    No preview · Article · Sep 2014 · IEEE Transactions on Emerging Topics in Computing
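As a toy illustration of tag-based semantic association, the sketch below scores two resources by the overlap of their tag sets. Jaccard overlap is a hypothetical stand-in for the semantic link network measure described above, which also exploits surrounding texts and hierarchical semantics:

```python
def tag_relatedness(tags_a, tags_b):
    """Semantic association between two media resources estimated from
    their social tag sets via Jaccard overlap (shared tags over all
    distinct tags). Returns a score in [0, 1]."""
    a, b = set(tags_a), set(tags_b)
    if not (a or b):
        return 0.0                     # two untagged resources: no evidence
    return len(a & b) / len(a | b)

# Two Flickr-style images sharing 2 of 4 distinct tags.
sim = tag_relatedness(["beach", "sunset", "sea"], ["sunset", "sea", "sky"])
```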
  • Zheng Xu · Fenglin Zhi · Chen Liang · Lin Mei · Xiangfeng Luo
    ABSTRACT: Image and video resources play an important role in traffic event analysis. With the rapid growth of video surveillance devices, a large number of image and video resources are increasingly being created. It is crucial to explore, share, reuse, and link these multimedia resources to better organize traffic events. Most video resources are currently annotated in an isolated way, which means that they lack semantic connections. Thus, facilities for annotating these video resources are in high demand. Such facilities create semantic connections among video resources and allow their metadata to be understood globally. Adopting semantic technologies, this paper introduces a video annotation platform. The platform enables users to semantically annotate video resources using vocabularies defined by traffic event ontologies. Moreover, the platform provides a search interface for the annotated video resources. The result of initial development demonstrates the benefits of applying semantic technologies in terms of reusability, scalability, and extensibility.
    No preview · Conference Paper · Aug 2014
  • Jun Zhang · Xiangfeng Luo · Lei Lu · Weidong Liu
    ABSTRACT: The acquisition of deep textual semantics is a key issue that significantly improves the performance of e-learning, web search, web knowledge services, etc. Though many models have been developed to acquire textual semantics, acquiring deep textual semantics remains a challenging issue. Here, an acquisition model of deep textual semantics is developed to enhance the capability of text understanding, comprising two parts: 1) how to obtain and organize the domain knowledge extracted from a text set, and 2) how to activate the domain knowledge to obtain deep textual semantics. The activation process draws on the Gough model of reading, the Landscape model, and the cognitive process of memory. The Gough model is the main human reading model that enables the authors to acquire deep semantics during text reading. A generalized semantic field is proposed to store domain knowledge in the form of Long Term Memory (LTM). A specialized semantic field, acquired through the interaction between a text fragment and the domain knowledge, is introduced to describe the change process of textual semantics. Through their mutual action, the authors obtain the deep textual semantics that enhances the capability of text understanding; the machine can thus understand text more precisely and correctly than models that only obtain surface textual semantics.
    No preview · Article · Aug 2014 · International Journal of Cognitive Informatics and Natural Intelligence
  • Zheng Xu · Xiangfeng Luo · Shunxiang Zhang · Xiao Wei · Lin Mei · Chuanping Hu
    ABSTRACT: In this paper, we study the problem of mining temporal semantic relations between entities. The goal is to mine and annotate a semantic relation with temporal, concise, and structured information, which can reveal the explicit, implicit, and diverse semantic relations between entities. The temporal semantic annotations can help users learn and understand unfamiliar or newly emerged semantic relations between entities. The proposed temporal semantic annotation structure integrates features from IEEE and Renlifang. We propose a general method to generate the temporal semantic annotation of a semantic relation between entities by constructing its connection entities, lexical-syntactic patterns, context sentences, context graph, and context communities. Empirical experiments on two different datasets, a LinkedIn dataset and a movie star dataset, show that the proposed method is effective and accurate. Unlike manually generated annotation repositories such as Wikipedia and LinkedIn, the proposed method automatically mines the semantic relations between entities and does not need any prior knowledge such as an ontology or a hierarchical knowledge base. The proposed method can be used in several applications, which demonstrates the effectiveness of the proposed temporal semantic relations on many web mining tasks.
    No preview · Article · Jul 2014 · Future Generation Computer Systems
  • Xinzhi Wang · Xiangfeng Luo · Huiming Liu
    ABSTRACT: Web events, whose data arrive as one kind of big data, have attracted considerable interest in recent years. However, most existing related works fail to measure the veracity of web events. In this research, we propose an approach to measure the veracity of a web event via its uncertainty. First, the proposed approach mines several event features, which may influence the measurement of uncertainty, from the data of the web event. Second, a computational model is introduced to simulate how these features influence the evolution of the web event. Third, matrix operations confirm that the result of the proposed iterative algorithm coincides with the computational model. Finally, experiments based on the above analysis show that the proposed uncertainty measuring algorithm is efficient and highly accurate in measuring the veracity of web events from big data.
    No preview · Article · Jul 2014 · Journal of Systems and Software
  • Jun Zhang · Qing Li · Xiangfeng Luo · Xiao Wei
    ABSTRACT: One of the most fundamental tasks in providing better Web services is the discovery of inter-word relations. However, the state of the art either acquires specific relations (e.g., causality) at the cost of considerable human effort, or is incapable of specifying relations in detail when no human effort is involved. In this paper, we propose a novel mechanism based on linguistics and cognitive psychology to automatically learn and specify association relations between words. The proposed mechanism, termed ALSAR, includes two major processes: the first learns association relations from the perspective of verb valency grammar in linguistics, and the second further labels/specifies the association relations with the help of related verbs. The resulting mechanism is thus able to provide semantic descriptors that make inter-word relations more explicit without any human labeling. Furthermore, ALSAR incurs very low complexity, and experimental evaluations on Chinese news articles crawled from Baidu News demonstrate its good performance.
    No preview · Chapter · Jun 2014
  • Zheng Xu · Xiao Wei · Xiangfeng Luo · Yunhuai Liu · Lin Mei · Chuanping Hu · Lan Chen
    ABSTRACT: An explosive growth in the volume, velocity, and variety of data available on the Internet has been witnessed recently. These data, originating from multiple types of sources including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs, and health records, have led to one of the most challenging research issues of the big data era. In this paper, Knowle, an online news management system built upon the semantic link network model, is introduced. Knowle is a news-event-centric data management system whose core elements are news events on the Web, linked by their semantic relations. Knowle is a hierarchical data system with three layers: the bottom layer (concepts), the middle layer (resources), and the top layer (events). The basic building blocks of the Knowle system, news collection, resource representation, semantic relation mining, and semantic linking of news events, are described. Knowle does not require data providers to follow semantic standards such as RDF or OWL; it is a semantics-rich, self-organized network that reflects various semantic relations among concepts, news, and events. Moreover, in a case study, Knowle is used to organize and mine health news, showing its potential to form the basis of a big-data-analytics-based innovation framework in the health domain.
    No preview · Article · Apr 2014 · Future Generation Computer Systems