Xiangfeng Luo

Sichuan Fire Research Institute of the Ministry of Public Security, Hua-yang, Sichuan, China

Publications (85) · 27.68 Total Impact Points

  • Junyu Xuan, Jie Lu, Xiangfeng Luo, Guangquan Zhang
    ABSTRACT: Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, although networks between nodes of the same nature exist, e.g., the social network between users, standard NMF overlooks them. This problem leads to comparatively low recommendation accuracy, because these networks also reflect the nature of the nodes, such as the preferences of users in a social network. Moreover, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of the nodes, so retaining different network structures leads to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on standard NMF, which integrate the vertical network (e.g., the rating matrix) with the structures of the horizontal network (e.g., the user social network). These algorithms are carefully designed, with corresponding convergence proofs, to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation over standard NMF, and the various structures show their differences in performance on these two tasks. These results can be directly used in document clustering and recommendation systems.
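    One common way to couple such a horizontal network with NMF (a sketch of the general technique, not necessarily the paper's four algorithms) is graph-regularized NMF, where a graph Laplacian penalty built from the user social network keeps the factors of linked users close. A minimal sketch, assuming a rating matrix X and a symmetric adjacency matrix A:

      import numpy as np

      def graph_regularized_nmf(X, A, k, lam=0.1, iters=200, eps=1e-9):
          # Factorize X ~ W @ H with a penalty lam * tr(W.T @ L @ W), L = D - A,
          # so users linked in A receive similar rows of W.
          m, n = X.shape
          D = np.diag(A.sum(axis=1))                       # degree matrix
          rng = np.random.default_rng(0)
          W, H = rng.random((m, k)), rng.random((k, n))
          for _ in range(iters):
              # GNMF-style multiplicative updates keep W and H nonnegative.
              W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
              H *= (W.T @ X) / ((W.T @ W) @ H + eps)
          return W, H

    Setting lam to zero recovers standard NMF, which makes it easy to see how much the horizontal network changes the factorization.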
  • ABSTRACT: Traditional relational topic models provide a way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. However, existing relational topic models assume that the number of hidden topics is known in advance, which is impractical in many real-world applications. To relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document to represent its topic interest. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two documents closer in the network tend to have more similar topics. Furthermore, we require that the topics be shared by all documents. To resolve these challenges, we use a subsampling strategy to assign each document a different gamma process derived from a global gamma process, and the subsampling probabilities of documents are given a Markov random field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capability of learning the hidden topics and, more importantly, the number of topics.
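    The gamma process at the heart of such models can be approximated, for intuition, by a finite truncation: K independent Gamma(alpha/K, 1) weights behave like a gamma process draw as K grows, with only a few non-negligible atoms. A toy sketch (the truncation level and alpha are illustrative, not the paper's inference scheme):

      import numpy as np

      def truncated_gamma_process(alpha=2.0, K=1000, seed=0):
          # Finite approximation of a gamma process draw: K atoms whose weights
          # are i.i.d. Gamma(alpha/K, 1); most are near zero, a few dominate.
          rng = np.random.default_rng(seed)
          return rng.gamma(shape=alpha / K, scale=1.0, size=K)

      w = truncated_gamma_process()
      # The count of non-negligible normalized weights plays the role of the
      # effective "number of topics", learned from data rather than fixed.
      print(int((w / w.sum() > 1e-3).sum()))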
  • ABSTRACT: Incorporating the side information of a text corpus, i.e., authors, time stamps, and emotional tags, into traditional text mining models has gained significant interest in the areas of information retrieval, statistical natural language processing, and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability to a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for multiple authors' contributions towards this document. An efficient Gibbs sampling inference algorithm, with each conditional distribution being closed-form, is developed for the IAT model. Experiments on several real-world datasets show the capability of our IAT model to learn the hidden topics, authors' interests in these topics, and the number of topics simultaneously.
  • Xiao Wei, Xiangfeng Luo, Qing Li, Jun Zhang, Zheng Xu
    ABSTRACT: Online comments have become a popular and efficient way for sellers to acquire feedback from customers and improve their service quality. However, some key issues about automatically evaluating and improving hotel service quality based on online comments need to be solved, such as how to use less trustworthy online comments, how to discover quality defects from online comments, and how to recommend more feasible or economical evaluation indexes for improving service quality based on online comments. To solve the above problems, this paper first improves fuzzy comprehensive evaluation (FCE) by incorporating a trustworthiness degree and proposes an automatic hotel service quality assessment method using the improved FCE, which can automatically obtain a more trustworthy evaluation from a large number of less trustworthy online comments. Then, the causal relations among evaluation indexes are mined from online comments to build a fuzzy cognitive map of hotel service quality, which is useful for unfolding the problematic areas of hotel service quality and recommending more economical solutions for improving it. Finally, both case studies and experiments are conducted to demonstrate that the proposed methods are effective in evaluating and improving hotel service quality using online comments.
    IEEE Transactions on Fuzzy Systems 02/2015; 23(1):72-84. DOI:10.1109/TFUZZ.2015.2390226 · 6.31 Impact Factor
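    A hedged sketch of the core trust-weighted fuzzy comprehensive evaluation step: each comment contributes fuzzy memberships per evaluation index, scaled by its trustworthiness degree, before the usual weighted composition (the index names, weights, and trust values are illustrative, not the paper's exact formulation):

      import numpy as np

      def trust_weighted_fce(memberships, trust, index_weights):
          # memberships: (n_comments, n_indexes, n_grades) fuzzy memberships;
          # trust: (n_comments,) trustworthiness degrees;
          # index_weights: (n_indexes,) weights summing to 1.
          t = trust / trust.sum()
          R = np.einsum('c,cig->ig', t, memberships)   # trust-weighted membership matrix
          B = index_weights @ R                        # fuzzy comprehensive evaluation
          return B / B.sum()                           # distribution over grades

      rng = np.random.default_rng(1)
      M = rng.dirichlet(np.ones(4), size=(200, 3))     # 200 comments, 3 indexes, 4 grades
      trust = rng.uniform(0.2, 1.0, size=200)          # hypothetical trust degrees
      print(trust_weighted_fce(M, trust, np.array([0.5, 0.3, 0.2])))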
  • Junyu Xuan, Jie Lu, Guangquan Zhang, Xiangfeng Luo
    ABSTRACT: Graph mining has been a popular research area because of its numerous application scenarios. Much unstructured and structured data can be represented as graphs, such as documents, chemical molecular structures, and images. However, an issue with current research on graphs is that it cannot adequately discover the topics hidden in graph-structured data, which can benefit both unsupervised and supervised learning on graphs. Although topic models have proved very successful in discovering latent topics, standard topic models cannot be directly applied to graph-structured data due to the "bag-of-words" assumption. In this paper, an innovative graph topic model (GTM) is proposed to address this issue, which uses Bernoulli distributions to model the edges between nodes in a graph. It can, therefore, make the edges in a graph contribute to latent topic discovery and further improve the accuracy of supervised and unsupervised learning on graphs. The experimental results on two different types of graph datasets show that the proposed GTM outperforms latent Dirichlet allocation on classification, using the topics unveiled by the two models to represent graphs.
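    A toy generative sketch of the central idea, edges drawn from Bernoulli distributions whose parameters depend on the latent topics of the endpoint nodes, so that observed edges inform topic discovery (the exact GTM generative process and its inference are in the paper; the distributions here are assumptions):

      import numpy as np

      rng = np.random.default_rng(0)
      K, N = 3, 30                               # topics, nodes in one graph
      theta = rng.dirichlet(np.ones(K))          # the graph's topic proportions
      z = rng.choice(K, size=N, p=theta)         # latent topic of each node
      B = rng.uniform(0.05, 0.6, size=(K, K))    # Bernoulli edge prob. per topic pair
      B = (B + B.T) / 2                          # symmetric for an undirected graph

      P = B[z][:, z]                             # edge probability for every node pair
      adj = rng.binomial(1, P)                   # Bernoulli edges
      adj = np.triu(adj, 1)
      adj = adj + adj.T                          # simple undirected adjacency matrix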
  • Xiangfeng Luo, Jun Zhang, Qing Li, Xiao Wei, Lei Lu
    ABSTRACT: This paper advocates a novel approach to recommending texts at various levels of difficulty based on a proposed measure, the algebraic complexity of texts (ACT). Unlike traditional complexity measures that mainly focus on surface features, such as the number of syllables per word, characters per word, or words per sentence, ACT draws on the perspective of human concept learning, which can reflect the complex semantic relations inside texts. To cope with the high cost of measuring ACT, the Degree-2 Hypothesis of ACT is proposed to reduce the measurement from unrestricted dimensions to three dimensions. Based on the principle of the "mental anchor," an extension of ACT and its general edition [denoted extension of text algebraic complexity (EACT) and general extension of text algebraic complexity (GEACT)] are developed, which take the complexities of keywords and association rules into account. Finally, using scores given by humans as a benchmark, we compare our proposed methods with linguistic models. The experimental results show the order GEACT > EACT > ACT > linguistic models, meaning GEACT performs the best while linguistic models perform the worst. Additionally, GEACT with lower convex functions has the best ability to measure the algebraic complexity of text understanding, which may indicate that the human complexity curve tends to resemble a lower convex function rather than a linear one.
    IEEE Transactions on Human-Machine Systems 10/2014; 44(5):638-649. DOI:10.1109/THMS.2014.2329874
  • Yang Liu, Xiangfeng Luo, Junyu Xuan
    ABSTRACT: Online hot event discovery has become a flourishing frontier in which online document streams are monitored to discover newly occurring events or are assigned to previously detected events. However, hot events naturally evolve, and their inherent topically related words are also likely to evolve, which makes event discovery a challenging task for traditional mining approaches. Combining word association and semantic community, the Association Link Network (ALN) organizes loosely distributed associated resources. This paper presents a novel ALN-based online hot event discovery approach that proceeds in three stages. In the first stage, we extract significant features to represent the content of each document in the online document stream. In the second stage, we classify the online document stream into topically related detected events, taking event evolution into account, in the form of an ALN. In the third stage, we create an ALN-based event detection algorithm to discover newly occurring hot events in a timely manner. The online datasets used in our empirical studies were acquired from Baidu News and span 1315 hot events and 236,300 documents. Experimental results demonstrate the hot event discovery ability with respect to high accuracy, good scalability, and short runtime.
    Concurrency and Computation Practice and Experience 09/2014; DOI:10.1002/cpe.3374 · 0.78 Impact Factor
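    The classification stage can be pictured as a single-pass incremental step: each incoming document joins the most similar existing event or starts a new one when no event is similar enough. A minimal sketch with cosine similarity over normalized feature vectors (the threshold and representation are illustrative; the paper's ALN-based algorithm is richer):

      import numpy as np

      def assign_event(doc_vec, event_centroids, threshold=0.3):
          # doc_vec and centroids are L2-normalized feature vectors, so the dot
          # product is cosine similarity. Returns an event index, or None to
          # signal that a new event should be created.
          if not event_centroids:
              return None
          sims = [float(doc_vec @ c) for c in event_centroids]
          best = int(np.argmax(sims))
          return best if sims[best] >= threshold else None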
  • Xinzhi Wang, Xiangfeng Luo, Huiming Liu
    ABSTRACT: Web events, whose data form one kind of big data, have attracted considerable interest in recent years. However, most existing related works fail to measure the veracity of web events. In this research, we propose an approach to measuring the veracity of a web event via its uncertainty. First, the proposed approach mines several event features from the web event data that may influence the uncertainty measurement. Second, a computational model is introduced to simulate how these features influence the evolution of a web event. Third, matrix analysis confirms that the result of the proposed iterative algorithm is consistent with the computational model. Finally, experiments based on the above analysis show that the proposed uncertainty measurement algorithm is efficient and highly accurate in measuring the veracity of web events from big data.
    Journal of Systems and Software 07/2014; 102. DOI:10.1016/j.jss.2014.07.023 · 1.25 Impact Factor
  • ABSTRACT: In this paper, we study the problem of mining temporal semantic relations between entities. The goal is to mine and annotate a semantic relation with temporal, concise, and structured information that can reveal the explicit, implicit, and diverse semantic relations between entities. Temporal semantic annotations can help users learn and understand unfamiliar or newly emerged semantic relations between entities. The proposed temporal semantic annotation structure integrates features from IEEE and Renlifang. We propose a general method to generate the temporal semantic annotation of a semantic relation between entities by constructing its connection entities, lexical syntactic patterns, context sentences, context graph, and context communities. Empirical experiments on two different datasets, a LinkedIn dataset and a movie star dataset, show that the proposed method is effective and accurate. Unlike manually generated annotation repositories such as Wikipedia and LinkedIn, the proposed method can automatically mine the semantic relation between entities and does not need any prior knowledge such as an ontology or a hierarchical knowledge base. The proposed method can be applied in several applications, which demonstrates the effectiveness of the proposed temporal semantic relations on many web mining tasks.
    Future Generation Computer Systems 07/2014; 37:468–477. DOI:10.1016/j.future.2013.09.027 · 2.64 Impact Factor
  • ABSTRACT: An explosive growth in the volume, velocity, and variety of the data available on the Internet has been witnessed recently. Data originating from multiple types of sources, including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs, and health data, have led to one of the most challenging research issues of the big data era. In this paper, Knowle, an online news management system built upon the semantic link network model, is introduced. Knowle is a news-event-centric data management system. The core elements of Knowle are news events on the Web, which are linked by their semantic relations. Knowle is a hierarchical data system with three layers: the bottom layer (concepts), the middle layer (resources), and the top layer (events). The basic building blocks of the Knowle system, namely news collection, resource representation, semantic relation mining, and semantic linking of news events, are described. Knowle does not require data providers to follow semantic standards such as RDF or OWL; it is a semantics-rich, self-organized network that reflects various semantic relations among concepts, news, and events. Moreover, in the case study, Knowle is used for organizing and mining health news, which shows its potential to form the basis for designing and developing a big data analytics based innovation framework in the health domain.
    Future Generation Computer Systems 04/2014; DOI:10.1016/j.future.2014.04.002 · 2.64 Impact Factor
  • ABSTRACT: The Association Link Network (ALN) is a kind of Semantic Link Network built by mining the association relations among multimedia Web resources to effectively support Web intelligent applications such as Web-based learning and semantic search. This paper explores the Small-World properties of ALN to provide theoretical support for association learning (i.e., the simple idea of "learning from Web resources"). First, a filtering algorithm for ALN is proposed to generate the filtered status of ALN, aiming to observe the Small-World properties of ALN at a given network size and filtering parameter. A comparison of the Small-World properties of ALN and a random graph shows that ALN exhibits a prominent Small-World characteristic. Then, we investigate the evolution of the Small-World properties over time at several incremental network sizes. The average path length of ALN scales with the network size, while the clustering coefficient of ALN is independent of the network size. We also find that ALN has a smaller average path length and a higher clustering coefficient than the WWW at the same network size and average degree. Finally, based on the Small-World characteristic of ALN, we present an Association Learning Model (ALM), which can efficiently provide association learning of Web resources in breadth or depth for learners.
    World Wide Web 03/2014; DOI:10.1007/s11280-012-0171-7 · 1.62 Impact Factor
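    The two quantities behind the Small-World comparison, average path length and clustering coefficient, can be reproduced on any graph with networkx; a small-world network has a path length comparable to a random graph of the same size and density but a much higher clustering coefficient. A sketch using a stand-in graph (the ALN construction itself is the paper's):

      import networkx as nx

      G = nx.watts_strogatz_graph(1000, 10, 0.1)   # stand-in for an ALN snapshot
      R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=0)

      for name, g in [("ALN-like", G), ("random", R)]:
          if not nx.is_connected(g):               # path length needs a connected graph
              g = g.subgraph(max(nx.connected_components(g), key=len))
          print(name, nx.average_shortest_path_length(g), nx.average_clustering(g))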
  • ABSTRACT: Relatedness measurement between multimedia items such as images and videos plays an important role in computer vision and underpins many multimedia-related applications, including clustering, searching, recommendation, and annotation. Recently, with the explosion of social media, users can upload media data and annotate content with descriptive tags. In this paper, we aim to measure the semantic relatedness of Flickr images. First, four information-theory-based functions are used to measure the semantic relatedness of tags. Second, the integration of tag pairs based on a bipartite graph is proposed to remove noise and redundancy. Third, the order information of tags is added to the relatedness measure, giving more weight to tags in high positions. A data set of 1000 images from Flickr is used to evaluate the proposed method. Two data mining tasks, clustering and searching, are performed, which shows the effectiveness and robustness of the proposed method. Moreover, applications such as searching and faceted exploration are introduced using the proposed method, which shows that it has broad prospects for web-based tasks.
    The Scientific World Journal 02/2014; 2014:758089. DOI:10.1155/2014/758089 · 1.73 Impact Factor
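    One representative information-theoretic relatedness function over tag co-occurrence counts is pointwise mutual information (shown here as an assumption; the paper evaluates four such functions):

      import math
      from collections import Counter
      from itertools import combinations

      def pmi_scores(tag_sets):
          # tag_sets: one set of tags per image. PMI compares the observed
          # co-occurrence rate of a tag pair with what independence predicts.
          n = len(tag_sets)
          tags, pairs = Counter(), Counter()
          for ts in tag_sets:
              tags.update(ts)
              pairs.update(combinations(sorted(ts), 2))
          return {(a, b): math.log((c / n) / ((tags[a] / n) * (tags[b] / n)))
                  for (a, b), c in pairs.items()}

      print(pmi_scores([{"beach", "sea"}, {"beach", "sea", "sunset"}, {"city"}]))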
  • Zheng Xu, Xiangfeng Luo, Lin Mei, Chuanping Hu
    ABSTRACT: Association relations between concepts are a class of simple but powerful regularities in binary data, which play important roles in enterprises and organizations with huge amounts of data. However, although a large number of association relations can easily be mined from databases, existing objective and subjective methods scarcely take semantics into consideration, and it has long been recognized in the knowledge discovery literature that most of the mined relations are of no interest to the user. In this paper, the semantic discrimination capability (SDC) of association relations is first measured based on a discrimination value model. Formulas for SDC integrating both statistical and graph features are proposed under five different strategies. The high correlation coefficient of the proposed method against the discrimination value shows that the proposed SDC measure is accurate. Moreover, an application of SDC to document clustering is carried out, which shows that SDC has broad prospects for data-related tasks such as document clustering.
    Concurrency and Computation Practice and Experience 02/2014; 26(2). DOI:10.1002/cpe.2999 · 0.78 Impact Factor
  • ABSTRACT: Building a text knowledge representation model that carries rich knowledge, supports flexible reasoning, and can be constructed automatically with low computational complexity is a fundamental challenge for reasoning-based knowledge services, especially with the rapid growth of web resources. However, current text knowledge representation models either lose much knowledge [e.g., the vector space model (VSM)] or require highly complex computation [e.g., latent Dirichlet allocation (LDA)]; some of them cannot even be constructed automatically [e.g., the web ontology language (OWL)]. In this paper, a novel text knowledge representation model, the power series representation (PSR) model, which has low computational complexity in the text knowledge construction process, is proposed to resolve the tension between carrying rich knowledge and automatic construction. First, a concept algebra of human concept learning is developed to represent text knowledge in the form of a power series. Then, the degree-2 power series hypothesis is introduced to simplify the proposed PSR model, which can be constructed automatically with lower computational complexity while carrying more knowledge than VSM and LDA. After that, reasoning operations based on the degree-2 power series hypothesis are developed, which provide more flexible reasoning than OWL and LDA. Furthermore, experiments and comparisons with current knowledge representation models show that our model has better characteristics than others when representing text knowledge. Finally, a demo is given to indicate that the PSR model has good prospects in the area of web semantic search.
    01/2014; 44(1):86-102. DOI:10.1109/TSMCC.2012.2231674
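    The degree-2 idea can be illustrated as keeping terms plus pairwise term associations while dropping all higher-order combinations; a sketch of that shape (not the paper's concept-algebra construction):

      from collections import Counter
      from itertools import combinations

      def degree2_representation(sentences):
          # Degree-1 part: term frequencies. Degree-2 part: sentence-level
          # co-occurrence counts between term pairs. Degree >= 3 is dropped,
          # which is what keeps construction cheap.
          unigrams, pairs = Counter(), Counter()
          for s in sentences:
              terms = sorted(set(s.lower().split()))
              unigrams.update(terms)
              pairs.update(combinations(terms, 2))
          return unigrams, pairs

      u, p = degree2_representation(["the knowledge model", "the reasoning model"])
      print(u.most_common(2), p.most_common(2))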
  • ABSTRACT: Recent research shows that multimedia resources in the wild are growing at a staggering rate. The rapidly increasing number of multimedia resources has brought an urgent need to develop intelligent methods to organize and process them. In this paper, the semantic link network model is used to organize multimedia resources. A complete model for generating association relations between multimedia resources using the semantic link network model is proposed. The definitions, modules, and mechanisms of the semantic link network are used in the proposed method. The integration of the semantic link network and multimedia resources provides a new prospect for organizing them along with their semantics. The tags and surrounding texts of multimedia resources are used to measure their semantic association. The hierarchical semantics of multimedia resources are defined by their annotated tags and surrounding texts, whose semantics are treated differently in the proposed framework. The modules of the semantic link network model are implemented to measure association relations. A real data set of 100,000 images with social tags from Flickr is used in our experiments. Two evaluations, clustering and retrieval, are performed, showing that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.
    IEEE Transactions on Emerging Topics in Computing 01/2014; 2(3):376-387. DOI:10.1109/TETC.2014.2316525
  • ABSTRACT: As documents increase explosively in the era of big data, document clustering has proven useful for organizing online document streams into events. However, extant studies on document clustering still suffer from problems of high dimensionality, scalability, and accuracy. In this paper, we present a novel Association Link Network (ALN) based document clustering method, an adaptive iterative splitting process for discovering core events on the web. In each iteration, we first detect community structures in the ALN, then map documents to the associated communities based on word relations in the ALN, and finally rebuild the communities using the mapped documents. Compared to existing document clustering methods, the effectiveness of the presented clustering method in automatically discovering web events is demonstrated by experimental results on a real data set.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Junyu Xuan, Xiangfeng Luo, Jie Lu
    ABSTRACT: On the web, numerous websites publish web pages to cover the events occurring in society. Web events data satisfies the well-accepted attributes of big data: Volume, Velocity, Variety, and Value. As one valuable aspect of web events data, website preferences can help the followers of web events, e.g., people or organizations, select the proper websites to follow the aspects of web events that interest them. However, the large volume, fast evolution, and multi-source, unstructured nature of the data together make mining website preferences very challenging. In this paper, website preference is first formally defined. Then, according to the hierarchical nature of web events data, we propose a hierarchical network model to organize the big data of a web event from different organizations, areas, and nations at a given time stamp. With this hierarchical network structure in hand, two strategies are proposed to mine website preferences from web events data. The first, straightforward strategy utilizes the communities of the keyword-level network and the mapping relations between websites and keywords to unveil the value in them. Taking the whole hierarchical network structure into consideration, the second strategy proposes an iterative algorithm to refine the keyword communities of the first strategy. Finally, an evaluation criterion for website preferences is designed to compare the performance of the two proposed strategies. Experimental results show that a proper combination of horizontal relations (each level's network) with vertical relations (mapping relations between the three level networks) can extract more value from web events data and thus improve the efficiency of website preference mining.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Zheng Xu, Xiangfeng Luo, Xiao Wei, Lin Mei
    ABSTRACT: Online popular events, which are constructed from news stories using the techniques of Topic Detection and Tracking (TDT), bring convenience to users who want to see what is going on through the Internet. Recently, the web has become an important event information provider and poster due to its real-time, open, and dynamic features. However, it is difficult to detect events because of the huge scale and dynamics of the Internet. In this paper, we define the novel problem of investigating impact factors for event detection. We give the definitions of five impact factors: the number of increased web pages, the number of increased keywords, the number of communities, the average clustering coefficient, and the average similarity of web pages. These five impact factors capture both statistical and content information of an event. Empirical experiments on real datasets, including Google Zeitgeist and Google Trends, show that the number of web pages and the average clustering coefficient can be used to detect events. Strategies integrating the number of web pages and the average clustering coefficient are also employed. The evaluations on a real dataset show that the proposed function integrating these two factors can be used for event detection efficiently and correctly.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
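    Two of the named impact factors are straightforward to compute once an event's pages are linked by a similarity graph; a sketch with networkx (the thresholded graph construction is an assumption, standing in for the paper's own similarity measure):

      import networkx as nx
      import numpy as np

      def impact_factors(similarity, daily_page_counts, tau=0.5):
          # similarity: (n_pages, n_pages) symmetric similarity matrix;
          # daily_page_counts: pages observed per day for the event.
          G = nx.from_numpy_array((similarity >= tau).astype(int))
          G.remove_edges_from(nx.selfloop_edges(G))
          return {
              "increased_pages": int(np.diff(daily_page_counts).sum()),
              "avg_clustering": nx.average_clustering(G),
              "avg_similarity": float(similarity[np.triu_indices_from(similarity, 1)].mean()),
          }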
  • Xinzhi Wang, Xiangfeng Luo, Jinjun Chen
    ABSTRACT: Sentiment analysis of the public has been attracting increasing attention from researchers. This paper focuses on the research problem of social sentiment detection, which aims to identify the sentiments of the public evoked by online microblogs. A general social sentiment model, combining knowledge from sociology and psychology, is proposed for this task and employed to measure the social sentiment state. We then detail the computation of sentiment vectors to extract the sentiment distribution of bloggers on an event. The social state of events is computed based on the general social sentiment model and the sentiment vectors. Furthermore, we show that social sentiments are not independent but correlated with each other heterogeneously across different events; the dependencies between sentiments can provide guidance in decision-making for governments or organizations. Finally, experiments on two real-world collections of event microblogs are conducted to demonstrate the performance of our method.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Yang Liu, Xiangfeng Luo
    ABSTRACT: With the coming era of big data, online hot event discovery has emerged to mine social hot spots from large-scale web resources. Hot events naturally evolve over time and, in the meantime, their inherent semantic relations are likely to change. As a result, traditional event detection approaches do not perform well on dynamic web resources. To overcome these bottlenecks, this paper presents a novel hot event discovery framework to detect hot events online, consisting of three stages: 1) document preprocessing, which selects significant features to represent document content; 2) threshold-resilient document classification, which classifies incoming documents into topically related events while accounting for event evolution; and 3) adaptive splitting document clustering, which clusters newly occurring hot events in a timely manner. Using an online data set from the Baidu website, the experiments demonstrate the hot event discovery ability with respect to high accuracy, good scalability, and short runtime.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013

Publication Stats

231 Citations
27.68 Total Impact Points

Institutions

  • 2014
    • Sichuan Fire Research Institute of the Ministry of Public Security
      Hua-yang, Sichuan, China
  • 2006–2014
    • Shanghai University
      • School of Computer Engineering and Sciences
      Shanghai, Shanghai Shi, China
  • 2012–2013
    • Shanghai University of Engineering Science
      Shanghai, Shanghai Shi, China