Xiangfeng Luo

Shanghai University, Shanghai, Shanghai Shi, China

Publications (64) · 18.48 total impact

  • ABSTRACT: The Association Link Network (ALN) is a kind of Semantic Link Network built by mining the association relations among multimedia Web resources, so as to effectively support intelligent Web applications such as Web-based learning and semantic search. This paper explores the Small-World properties of ALN to provide theoretical support for association learning (i.e., the simple idea of "learning from Web resources"). First, a filtering algorithm for ALN is proposed to generate the filtered status of ALN, aiming to observe the Small-World properties of ALN at a given network size and filtering parameter. A comparison of the Small-World properties of ALN and a random graph shows that ALN exhibits a prominent Small-World characteristic. Then, we investigate the evolution of the Small-World properties over time at several incremental network sizes. The average path length of ALN scales with the network size, while the clustering coefficient of ALN is independent of the network size. We also find that ALN has a smaller average path length and a higher clustering coefficient than the WWW at the same network size and average degree. Finally, based on the Small-World characteristic of ALN, we present an Association Learning Model (ALM), which can efficiently support association learning of Web resources in breadth or depth for learners.
    World Wide Web 03/2014; · 1.20 Impact Factor
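The abstract above compares two standard Small-World metrics. As a minimal, self-contained sketch (not the authors' code), the toy graph and functions below illustrate how average path length and clustering coefficient are computed; the edge list is illustrative only.

```python
# Small-World metrics on a toy association network: average path length (BFS)
# and clustering coefficient. Illustrative sketch only, not the paper's code.
from collections import deque
from itertools import combinations

def average_path_length(adj: dict[str, set[str]]) -> float:
    """Mean shortest-path length over all connected node pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(d for n, d in dist.items() if n != source)
        pairs += len(dist) - 1
    return total / pairs

def clustering_coefficient(adj: dict[str, set[str]]) -> float:
    """Average fraction of each node's neighbor pairs that are themselves linked."""
    coeffs = []
    for node, neighbors in adj.items():
        if len(neighbors) < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        coeffs.append(links / (len(neighbors) * (len(neighbors) - 1) / 2))
    return sum(coeffs) / len(coeffs)

# Toy graph: two tight triangles joined by one shortcut link, giving high
# clustering with short paths -- the Small-World signature.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),  # shortcut between the two clusters
         ("d", "e"), ("e", "f"), ("d", "f")]
adj: dict[str, set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(f"L = {average_path_length(adj):.2f}, C = {clustering_coefficient(adj):.2f}")
```

A random graph of the same size and degree would show a comparable path length but a far lower clustering coefficient, which is the contrast the paper draws.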
  • ABSTRACT: Relatedness measurement between multimedia items such as images and videos plays an important role in computer vision, and underpins many multimedia-related applications including clustering, searching, recommendation, and annotation. Recently, with the explosion of social media, users can upload media data and annotate content with descriptive tags. In this paper, we aim at measuring the semantic relatedness of Flickr images. Firstly, four information-theory-based functions are used to measure the semantic relatedness of tags. Secondly, an integration of tag pairs based on a bipartite graph is proposed to remove noise and redundancy. Thirdly, the order information of tags is added to the relatedness measure, which emphasizes tags in high positions. A dataset of 1000 images from Flickr is used to evaluate the proposed method. Two data mining tasks, clustering and searching, are performed with the proposed method, demonstrating its effectiveness and robustness. Moreover, applications such as searching and faceted exploration are introduced using the proposed method, showing that it has broad prospects for Web-based tasks.
    The Scientific World Journal 01/2014; 2014:758089. · 1.73 Impact Factor
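The abstract mentions four information-theory-based functions for tag relatedness without naming them here; pointwise mutual information (PMI) over tag co-occurrence counts is one common such function and is used below purely as an assumed, illustrative example, with a hypothetical set of tagged images.

```python
# Hedged sketch: PMI-based tag relatedness from per-image tag sets.
# PMI is an assumed example of an information-theoretic relatedness function;
# the paper's four functions are not named in this abstract.
import math
from collections import Counter
from itertools import combinations

def pmi_relatedness(tagged_images: list[set[str]]) -> dict[tuple[str, str], float]:
    """PMI between tag pairs, estimated from per-image tag sets."""
    n = len(tagged_images)
    tag_counts = Counter(tag for tags in tagged_images for tag in tags)
    pair_counts = Counter(
        pair for tags in tagged_images for pair in combinations(sorted(tags), 2)
    )
    return {
        (a, b): math.log((count / n) / ((tag_counts[a] / n) * (tag_counts[b] / n)))
        for (a, b), count in pair_counts.items()
    }

# Hypothetical tagged images: "beach" and "sea" always co-occur,
# so their PMI is above zero (more related than chance).
images = [{"beach", "sea", "sunset"}, {"beach", "sea"}, {"city", "night"}]
scores = pmi_relatedness(images)
print(scores[("beach", "sea")])
```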
  • ABSTRACT: In this paper, we study the problem of mining temporal semantic relations between entities. The goal is to mine and annotate a semantic relation with temporal, concise, and structured information, which can reveal the explicit, implicit, and diverse semantic relations between entities. The temporal semantic annotations can help users learn and understand unfamiliar or newly emerged semantic relations between entities. The proposed temporal semantic annotation structure integrates features from IEEE and Renlifang. We propose a general method to generate the temporal semantic annotation of a semantic relation between entities by constructing its connection entities, lexical syntactic patterns, context sentences, context graph, and context communities. Empirical experiments on two different datasets, a LinkedIn dataset and a movie-star dataset, show that the proposed method is effective and accurate. Unlike manually generated annotation repositories such as Wikipedia and LinkedIn, the proposed method can automatically mine the semantic relations between entities and does not need any prior knowledge such as an ontology or a hierarchical knowledge base. The proposed method can be applied in several applications, which demonstrates the effectiveness of the proposed temporal semantic relations on many web mining tasks.
    Future Generation Computer Systems 01/2014; 37:468–477. · 2.64 Impact Factor
  • Zheng Xu, Xiangfeng Luo, Lin Mei, Chuanping Hu
    ABSTRACT: Association relations between concepts are a class of simple but powerful regularities in binary data, which play important roles in enterprises and organizations with huge amounts of data. However, although a large number of association relations can easily be mined from databases, existing objective and subjective methods scarcely take semantics into consideration, and it was recognized early in the knowledge discovery literature that most mined relations are of no interest to the user. In this paper, the semantic discrimination capability (SDC) of an association relation is first measured based on the discrimination value model. Formulas for SDC integrating both statistical and graph features are proposed following five different strategies. The high correlation coefficient of the proposed method against the discrimination value shows that the proposed SDC measure is accurate. Moreover, an application of SDC to document clustering is carried out, which shows that SDC has broad prospects for data-related tasks such as document clustering. Copyright © 2013 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 01/2014; 26(2). · 0.85 Impact Factor
  • ABSTRACT: An explosive growth in the volume, velocity, and variety of the data available on the Internet has been witnessed recently. The data originating from multiple types of sources, including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs, and health data, have led to one of the most challenging research issues of the big data era. In this paper, Knowle, an online news management system built upon the semantic link network model, is introduced. Knowle is a news-event-centered data management system whose core elements are news events on the Web, linked by their semantic relations. Knowle is a hierarchical data system with three layers: the bottom layer (concepts), the middle layer (resources), and the top layer (events). The basic building blocks of the Knowle system, namely news collection, resource representation, semantic relation mining, and semantic linking of news events, are described. Knowle does not require data providers to follow semantic standards such as RDF or OWL; it is a semantics-rich, self-organized network that reflects various semantic relations among concepts, news, and events. Moreover, in a case study, Knowle is used for organizing and mining health news, which shows its potential to form the basis for designing and developing a big-data-analytics-based innovation framework in the health domain.
    Future Generation Computer Systems 01/2014; · 2.64 Impact Factor
  • Xinzhi Wang, Xiangfeng Luo, Huiming Liu
    ABSTRACT: Web events, whose data occur as one kind of big data, have attracted considerable interest during the past years. However, most existing related works fail to measure the veracity of web events. In this research, we propose an approach to measure the veracity of a web event via its uncertainty. Firstly, the proposed approach mines several event features from the data of a web event that may influence the measurement of uncertainty. Secondly, a computational model is introduced to simulate how these features influence the evolution process of the web event. Thirdly, matrix operations are performed to confirm that the result of the proposed iterative algorithm coincides with the computational model. Finally, experiments based on the above analysis show that the proposed uncertainty measuring algorithm is efficient and highly accurate in measuring the veracity of web events from big data.
    Journal of Systems and Software 01/2014; · 1.14 Impact Factor
  • Zheng Xu, Xiangfeng Luo, Xiao Wei, Lin Mei
    ABSTRACT: Online popular events, which are constructed from news stories using the techniques of Topic Detection and Tracking (TDT), bring convenience to users who intend to see what is going on through the Internet. Recently, the web has become an important provider and poster of event information due to its real-time, open, and dynamic features. However, it is difficult to detect events due to the huge scale and dynamics of the Internet. In this paper, we define the novel problem of investigating impact factors for event detection. We give definitions of five impact factors: the number of increased web pages, the number of increased keywords, the number of communities, the average clustering coefficient, and the average similarity of web pages. These five impact factors capture both statistical and content information about an event. Empirical experiments on real datasets, including Google Zeitgeist and Google Trends, show that the number of web pages and the average clustering coefficient can be used to detect events. Several strategies integrating the number of web pages and the average clustering coefficient are also employed. The evaluations on a real dataset show that the proposed function integrating these two factors can be used for event detection efficiently and correctly.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
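The integration strategy described above can be sketched as a combined score over the two useful impact factors. The min-max normalization, equal weighting, and the function name below are assumptions made for illustration; the paper's exact integrating function is not given in this abstract.

```python
# Hedged sketch of integrating two event-detection impact factors:
# page count and average clustering coefficient. Normalization scheme and
# weights are illustrative assumptions, not the paper's formula.
def event_score(page_counts: list[int], clustering: list[float],
                weight: float = 0.5) -> list[float]:
    """Weighted sum of min-max normalized page counts and clustering values."""
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    pages_n = normalize(page_counts)
    clust_n = normalize(clustering)
    return [weight * p + (1 - weight) * c for p, c in zip(pages_n, clust_n)]

# Three hypothetical time windows: a burst of pages plus tight clustering
# flags the middle window as a likely event.
scores = event_score(page_counts=[120, 900, 150], clustering=[0.10, 0.45, 0.12])
print(scores)  # the middle window scores highest
```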
  • Xinzhi Wang, Xiangfeng Luo, Jinjun Chen
    ABSTRACT: Sentiment analysis of the public has been attracting increasing attention from researchers. This paper focuses on the research problem of social sentiment detection, which aims to identify the sentiments of the public evoked by online microblogs. A general social sentiment model is proposed for this task. The model, which combines sociological and psychological knowledge, is employed to measure the social sentiment state. Then, we detail the computation of sentiment vectors to extract the sentiment distribution of bloggers on an event. The social sentiment states for events are then computed based on the general social sentiment model and the sentiment vectors. Furthermore, we verify that social sentiments are not independent but are heterogeneously correlated with each other across different events. The dependencies between sentiments can provide guidance in decision-making for governments or organizations. Finally, experiments on two real-world collections of event microblogs demonstrate the performance of our method.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Junyu Xuan, Xiangfeng Luo, Jie Lu
    ABSTRACT: On the web, numerous websites publish web pages covering the events occurring in society. Web event data satisfy the well-accepted attributes of big data: Volume, Velocity, Variety, and Value. As a great value of web event data, website preferences can help the followers of web events, e.g., people or organizations, to select the proper websites to follow their interested aspects of web events. However, the large volume, fast evolution, and multi-source, unstructured nature of the data together make mining the value of website preferences very challenging. In this paper, website preference is first formally defined. Then, according to the hierarchical attribute of web event data, we propose a hierarchical network model to organize the big data of a web event from different organizations, different areas, and different nations at a given time stamp. With this hierarchical network structure in hand, two strategies are proposed to mine the value of website preferences from web event data. The first, straightforward strategy utilizes the communities of the keyword-level network and the mapping relations between websites and keywords to unveil the value in them. By taking the whole hierarchical network structure into consideration, the second strategy proposes an iterative algorithm to refine the keyword communities of the first strategy. Finally, an evaluation criterion for website preferences is designed to compare the performance of the two proposed strategies. Experimental results show that a proper combination of horizontal relations (within each level network) and vertical relations (mapping relations between the three level networks) can extract more value from web event data and thus improve the efficiency of website preference mining.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Feiyue Ye, Feng Zhang, Xiangfeng Luo, Lingyu Xu
    ABSTRACT: As a free online encyclopedia with large-scale knowledge coverage, rich semantic information, and a quick update speed, Wikipedia brings new ideas for measuring semantic correlation. In this paper, we present a new method for measuring the semantic correlation between words by mining the rich semantic information that exists in Wikipedia. Unlike previous methods that calculate semantic relatedness merely based on the page network or the category network, our method not only takes into account the semantic information of the page network but also combines the semantic information of the category network, which improves the accuracy of the results. Besides, we analyze and evaluate the algorithm by comparing its results with those of a well-known knowledge base (e.g., HowNet) and of traditional Wikipedia-based methods on the same test set, demonstrating its superiority.
    Computer and Information Science (ICIS), 2013 IEEE/ACIS 12th International Conference on; 01/2013
  • Jun Zhang, Qing Li, Xiangfeng Luo, Xiao Wei
    ABSTRACT: Text representation is one of the most fundamental tasks in text comprehension, processing, and search. Various works have been proposed to mine the semantics in texts and then represent them. However, most of them focus only on mining semantics from the text itself, while background knowledge, which is very important to text understanding, is not taken into consideration. In this paper, on the basis of the human cognitive process, we propose a multi-level text representation model with background knowledge, called TRMBK. It is composed of three levels: the machine surface code (MSC), the machine text base (MTB), and the machine situational model (MSM). All three can be constructed automatically to acquire semantics both inside and outside of the text. Simultaneously, we also propose a method to establish background knowledge automatically and offer support for current text comprehension. Finally, experiments and comparisons are presented to show the superior performance of TRMBK.
    Cognitive Informatics & Cognitive Computing (ICCI*CC), 2013 12th IEEE International Conference on; 01/2013
  • Feiyue Ye, Hongxin Cao, Xiangfeng Luo
    ABSTRACT: This paper introduces concept algebra (CA) theory as a basis for conceptual representation and derivation in text processing, in order to realize a semantics-based retrieval system. We also take advantage of HowNet to create the concept attribute space for the concept algebra. With the help of LTP, we obtain the keywords and their dependency relations for every sentence to build the CA concept representation of the content as a five-tuple. Concepts make it possible to express both the keyword itself and its semantic relation with its context. According to the demands of text retrieval, some CA operations are optimized to calculate the relations and similarity between concepts. In addition, a text retrieval system framework that processes information based on concept relations at the concept level is proposed to verify the advantages of our method.
    Computer and Information Technology (CIT), 2012 IEEE 12th International Conference on; 01/2012
  • Xiao Wei, Xiangfeng Luo, Qing Li
    ABSTRACT: Faceted search on web pages needs exact facets. However, it is difficult to extract facets exactly from web pages because web pages are unstructured and lack facet information. Therefore, facet extraction is a key to faceted search. This paper proposes a method for extracting facets automatically from unstructured web pages to improve faceted search on the web. A Multidimensional Semantic Index (MDSI) of web pages is constructed by mining all kinds of semantic relations among the words of web pages, which creates a semantics-rich index. In the MDSI, the semantic indexes of different dimensions are bridged by mining the semantic mappings between them. Based on the MDSI of web pages, facets are extracted by analyzing the semantic mapping relations in the MDSI. To validate the proposed method, two datasets are constructed, and the experimental results show that the method is feasible and comparatively precise.
    Semantics, Knowledge and Grids (SKG), 2012 Eighth International Conference on; 01/2012
  • ABSTRACT: Using sea surface temperature (SST) data from thirteen satellites carrying infrared radiometers (AVHRR, MODIS) and microwave radiometers (TMI, AMSR), a set of high-quality, cloud-free, high-resolution (0.25°) merged SST products was generated through an extended revisal point method. The area of the merged SST product and the extended revisal point product [15.875-40.125°N, 105.875-130.125°E] covers the whole China Sea. The experimental results show that the number of extended revisal points is hundreds of thousands of times that of the in-situ SST data. The extended revisal points show only a small deviation from the in-situ SST data, and the precision of the two is similar; therefore, extended revisal points can replace in-situ data under certain conditions. This study offers a wider set of SST reference correction data for quality evaluation, multi-source remote sensing, and assimilation technology.
    Cloud and Green Computing (CGC), 2012 Second International Conference on; 01/2012
  • Feiyue Ye, Haibo Tang, Xiangfeng Luo
    ABSTRACT: Repeated patterns are a common phenomenon in the query result pages of deep web sites, and the deep web's back-end data can be accessed by mining them. So far, most algorithms for discovering repeated patterns use traditional web information extraction methods, but their recall and accuracy are not high. How to obtain repeated patterns accurately and completely remains difficult. We propose a method based on a largest-block strategy to discover such patterns. The core of the method is using the largest-block strategy to discover the repeated pattern layer: we can quickly navigate to the region of the entity data, then analyze the subtrees in this area, and finally obtain the simplified repeated pattern of the deep web site. According to the experimental results, this method can extract repeated pattern data more accurately and more completely than traditional methods. It can also address the multi-pattern problem, which has not been solved by other methods.
    Computer and Information Technology (CIT), 2012 IEEE 12th International Conference on; 01/2012
  • Xiangfeng Luo, Zheng Xu, Jie Yu, Xue Chen
    ABSTRACT: The Association Link Network (ALN) aims to establish association relations among various resources. By extending the hyperlink network of the World Wide Web to an association-rich network, ALN can effectively support Web intelligence activities such as Web browsing, Web knowledge discovery, and publishing. Since existing methods for building semantic links on Web resources cannot effectively and automatically organize loose Web resources, effective Web intelligence activities are still challenging. In this paper, a discovery algorithm for associated resources is first proposed to build the original ALN for organizing loose Web resources. Second, three schemas for constructing the kernel ALN and the connection-rich ALN (C-ALN) are developed step by step to optimize the organization of Web resources. After that, the properties of the different types of ALN are discussed, showing that C-ALN performs well in supporting Web intelligence activities. Moreover, an evaluation method is presented to verify the correctness of C-ALN for building semantic links on documents. Finally, an application using C-ALN to organize Web services is presented, which shows that C-ALN is an effective and efficient tool for building semantic links on Web service resources.
    IEEE Transactions on Automation Science and Engineering 08/2011; · 1.67 Impact Factor
  • Fangfang Liu, Yan Chi, Jie Yu, Xiangfeng Luo, Zheng Xu
    Int. J. Web Service Res. 01/2011; 8:29-46.
  • Zheng Xu, Xiangfeng Luo, Jie Yu, Weimin Xu
    ABSTRACT: Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, however, robustness and accuracy are still challenges. In this paper, we propose a method integrating the page counts and snippets returned by Web search engines (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). The semantic snippets and the number of search results are used to remove noise and redundancy in the Web snippets; a measure integrating page counts, semantic snippets, and the number of already displayed search results is then proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks such as query suggestion. A correlation coefficient of 0.851 against the Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion over some page-count-based methods. Copyright © 2011 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience. 01/2011; 23:2496-2510.
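The paper above integrates page counts with snippets; as a hedged illustration of the page-count component alone, the well-known Normalized Google Distance (NGD) is sketched below with hypothetical hit counts. This is a standard page-count measure used as an example, not necessarily the exact formula the paper uses.

```python
# Illustrative sketch: Normalized Google Distance (NGD) from page counts.
# Hit counts below are hypothetical; NGD is shown as a representative
# page-count-based similarity measure, not the paper's exact method.
import math

def ngd(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """NGD: 0 means maximally related; larger values mean less related."""
    fx, fy, fxy, n = (math.log(hits_x), math.log(hits_y),
                      math.log(hits_xy), math.log(total_pages))
    return (max(fx, fy) - fxy) / (n - min(fx, fy))

# Hypothetical page counts for two closely related words.
distance = ngd(hits_x=50_000, hits_y=40_000, hits_xy=30_000, total_pages=10**10)
similarity = math.exp(-2 * distance)  # one common distance-to-similarity map
print(round(distance, 3), round(similarity, 3))
```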
  • Zheng Xu, Xiangfeng Luo, Jie Yu, Weimin Xu
    ABSTRACT: Queries to Web search engines are usually short and ambiguous, providing insufficient information about users' needs for effectively retrieving relevant Web pages. To address this problem, query suggestion is implemented by most search engines. However, existing methods (e.g., Google's 'Search related to' and Yahoo's 'Also Try') do not balance the trade-off between accuracy and computational complexity appropriately. In this paper, the recommended words are extracted from the search results of the query, which properly guarantees the real-time performance of query suggestion. A scheme for ranking words based on semantic similarity presents a list of words as the query suggestion results, which ensures the accuracy of query suggestion. Moreover, the experimental results show that the proposed method significantly improves the quality of query suggestion over some popular Web search engines (e.g., Google and Yahoo). Finally, an offline experiment comparing the accuracy of snippets in capturing the number of words in a document is performed, which increases confidence in the method proposed in the paper. Copyright © 2010 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 01/2011; 23:1101-1113. · 0.85 Impact Factor
  • ABSTRACT: Web search is a key information retrieval method for people in today's society, in both academic and commercial activities. Because the "one-size-fits-all" approach limits how search results are obtained, providing high-precision personalized Web services remains a challenge in the traditional Web search process. Herein, a new framework is proposed to advance the traditional search process into a new paradigm, i.e., interactive search. First, a cognitive model of the interaction process within one search step is developed by imitating human search behaviors. Then, based on the theory of interactive computing, an interactive search model is introduced to formalize successive search sessions comprising several search steps. Third, based on human cognitive mechanisms such as the spreading activation model and user memory theory, a user model is designed to capture users' search activities, which can effectively aid the interactive search process. Last, with the help of the Association Link Network, an information-gradient-based ranking algorithm is proposed that aims to maximize the quantity of Web information supplied. The efficiency of the proposed interactive search service is verified by experimental results.
    IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, DASC 2011, 12-14 December 2011, Sydney, Australia; 01/2011