Qiankun Zhao

Telefónica I+D, Madrid, Madrid, Spain

Publications (27)

  • ABSTRACT: Social networks mediate not only the relations between entities, but also the patterns of information propagation among them and their communication behavior. In this paper, we extensively study the temporal annotations (e.g., time stamps and durations) of historical communications in social networks and propose two novel tools -- communication motifs and maximum-flow communication motifs -- for characterizing the patterns of information propagation in social networks. Using these motifs, we verify the following hypotheses about social communication networks: 1) the functional behavioral patterns of information propagation within social networks are stable over time; 2) the patterns of information propagation in synchronous and asynchronous social networks differ and are sensitive to the cost of communication; and 3) the speed and the amount of information propagated through a network are correlated and depend on individual profiles.
    Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Ontario, Canada, October 26-30, 2010; 01/2010
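The motif idea above can be illustrated with a toy computation. The sketch below is not the paper's algorithm; it counts a single simple temporal motif -- A→B→C forwarding chains in which B's message follows A's within a time window. The event format and the function name are illustrative.

```python
from collections import defaultdict

def chain_motifs(events, max_delay):
    """Count A->B->C forwarding chains where the second message is sent
    within max_delay after the first arrives. Events are (sender,
    receiver, time) triples."""
    by_sender = defaultdict(list)
    for s, r, t in events:
        by_sender[s].append((r, t))
    count = 0
    for s, r, t in events:
        # look for messages the receiver sends shortly afterwards
        for r2, t2 in by_sender.get(r, []):
            if 0 < t2 - t <= max_delay:
                count += 1
    return count
```

Varying `max_delay` gives a crude way to probe how sensitive the motif counts are to communication latency, echoing the synchronous-vs-asynchronous comparison in the abstract.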
  • ABSTRACT: Online social networking platforms have become a popular channel of communication among people. However, most people can only keep in touch with a limited number of friends. This phenomenon results in a low-connectivity social network in terms of communications, which is inefficient for information propagation and social engagement. In this paper, we introduce a new recommendation service, called link revival, that suggests that users re-connect with their old friends, such that the resulting connections will improve the social network's connectivity. To achieve high connectivity improvement under dynamic social network evolution, we propose a graph prediction-based recommendation strategy, which selects proper candidates based on a prediction of their future behaviors. We then develop an effective model that exploits a non-homogeneous Poisson process and second-order self-similarity in prediction. Through comprehensive experimental studies on two real datasets (Phone Call Network and Facebook Wall-posts), we demonstrate that our proposed approach can significantly increase social network connectivity, and that it outperforms other baseline solutions. The results also show that our solution is more suitable for online social networks like Facebook, partially due to the stronger long-range dependency and lower communication costs in the interactions.
    Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Ontario, Canada, October 26-30, 2010; 01/2010
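The Poisson-process ingredient can be sketched in a few lines. The code below is a minimal stand-in, not the paper's model (which additionally uses second-order self-similarity): it estimates a flat event rate from a tie's interaction timestamps and converts it into the probability of at least one interaction in a future horizon; both function names are illustrative.

```python
import math
from collections import defaultdict

def expected_events(timestamps, window, horizon):
    """Bucket past interaction times into fixed windows, estimate an
    average events-per-unit-time rate, and return the Poisson mean for
    the next `horizon` time units."""
    if not timestamps:
        return 0.0
    t0, t1 = min(timestamps), max(timestamps)
    span = max(t1 - t0, window)
    buckets = defaultdict(int)
    for t in timestamps:
        buckets[int((t - t0) // window)] += 1
    n_buckets = int(span // window) + 1
    rate = sum(buckets.values()) / (n_buckets * window)
    return rate * horizon

def reactivation_probability(timestamps, window, horizon):
    """P(at least one interaction in the horizon) under a Poisson model."""
    lam = expected_events(timestamps, window, horizon)
    return 1.0 - math.exp(-lam)
```

Ranking dormant ties by `reactivation_probability` would then give a crude candidate list for the link-revival recommendation.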
  • Source
    ABSTRACT: Mining different types of communities from web data has attracted a lot of research effort in recent years. However, none of the existing community mining techniques has taken into account both the dynamic and the heterogeneous nature of web data. In this paper, we propose to characterize and predict community members from the evolution of heterogeneous web data. We first propose a general framework for analyzing the evolution of heterogeneous networks. Then, the academic network, which is extracted from 1 million computer science papers, is used as an example to illustrate the framework. Finally, two example applications of the academic network are presented. Experimental results with a real and very large heterogeneous academic network show that our proposed framework can produce good results in terms of community member recommendation. Also, novel knowledge and insights can be gained by analyzing the community evolution pattern.
    Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26-30, 2008; 01/2008
  • Source
    ABSTRACT: Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags), and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and to aggregate words into word clusters simultaneously. A new document is classified by the mixture model based on its posterior probabilities, so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that our framework is capable of making tag recommendations efficiently and effectively. The average tagging time for a test document is around 1 second, with over 88% of test documents correctly labeled with the top nine tags we suggested.
    Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20-24, 2008; 01/2008
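The recommendation step at the end of that pipeline can be sketched as follows. This is a deliberately simplified stand-in: the clusters here are hand-built toy data (the paper obtains them via SRE) and tags are ranked by raw frequency rather than the paper's ranking algorithm; all names are illustrative.

```python
from collections import Counter

# Toy topical clusters: characteristic words plus tag counts observed
# on the training documents assigned to each cluster.
clusters = [
    {"words": {"gene", "protein", "cell"},
     "tags": Counter({"bioinformatics": 5, "genomics": 3})},
    {"words": {"query", "ranking", "index"},
     "tags": Counter({"information-retrieval": 6, "search": 2})},
]

def recommend_tags(doc_words, clusters, k=2):
    """Assign a new document to the cluster with the largest word
    overlap, then recommend that cluster's top-k tags by rank."""
    best = max(clusters, key=lambda c: len(c["words"] & doc_words))
    return [tag for tag, _ in best["tags"].most_common(k)]
```

Because cluster assignment and tag lookup are both cheap at query time, even this naive version hints at why the framework can tag a document in around a second.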
  • Source
    ABSTRACT: In this paper, we propose a novel approach to expand queries by exploring both the location information and the topic information of the queries. Users at different locations tend to have different vocabularies, while different expressions coming from different vocabularies may relate to the same topics. These expressions are identified as location sensitive and can be used for query expansion. We propose a hierarchical query expansion model, which employs a two-level SVM classification model to classify queries as location sensitive or location non-sensitive, where the former are further classified into same-location sensitive and different-location sensitive. For the location sensitive queries, we propose an LDA-based topic-level query similarity measure to rank the list of similar queries. Experiments with 2G of raw log data from CiteSeer and Excite show that our hierarchical classification model predicts query location sensitivity with more than 80% precision and that the final search result is significantly better than that of existing query expansion methods.
    Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, July 13-17, 2008; 01/2008
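A topic-level query similarity of this kind typically reduces to comparing the per-topic probability vectors that LDA infers for two queries. The sketch below assumes such vectors are already available and uses plain cosine similarity as the comparison; the paper's exact measure may differ, and the function name is illustrative.

```python
import math

def topic_cosine(p, q):
    """Cosine similarity between two topic distributions (equal-length
    lists of per-topic probabilities), a simple stand-in for an
    LDA-based topic-level query similarity."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0
```

Ranking the candidate queries by this score against the input query's topic vector yields the similar-query list used for expansion.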
  • ABSTRACT: Users post queries to search engines using a few keywords. Search engines often fail to return the answers their users seek because the keyword queries incompletely specify the information being sought and because of the ambiguity of natural language terms. Query expansion, where additional keywords are added automatically or semi-automatically to the user's query before it is run, has been used to improve the accuracy of search engines. We propose a framework in which we first identify whether a query should be expanded based on its features. We focus on identifying queries whose results are location-sensitive and expand them using keywords from similar queries from similar locations. Similarity between queries is derived using a novel LDA-based topic-level query similarity measure. We conducted experiments with query log data from the CiteSeer digital library and see a small improvement of results due to our query expansion.
    Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on; 11/2007
  • Source
    Bi Chen, Qiankun Zhao, Bingjun Sun, P. Mitra
    ABSTRACT: Modeling the behavior of bloggers is an important problem with various applications in recommender systems, targeted advertising, and event detection. In this paper, we propose three models that combine content, temporal, and social dimensions: the general blogging-behavior model, the profile-based blogging-behavior model, and the social-network and profile-based blogging-behavior model. The models are based on two regression techniques: Extreme Learning Machine (ELM) and Modified General Regression Neural Network (MGRNN). We choose one of the largest political blogs, DailyKos, for our empirical evaluation. Experiments show that the social-network and profile-based blogging-behavior model with the ELM regression technique produces good results for the most active bloggers and can be used to predict blogging behavior.
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on; 11/2007
  • Source
    Qiankun Zhao, P. Mitra, Dongwon Lee, Jaewoo Kang
    ABSTRACT: A novel microarray value imputation method, HICCUP, is presented. HICCUP improves upon existing value imputation methods in several ways. (1) By judiciously integrating heterogeneous microarray datasets using hierarchical clustering, HICCUP overcomes the limitation of using only a single dataset with a limited number of samples; (2) unlike local or global value imputation methods, by mining association rules, HICCUP selects appropriate subsets of the most relevant samples for better value imputation; and (3) by exploiting relationships within the sample space (e.g., cancer vs. non-cancer samples), HICCUP improves the accuracy of value imputation. Experiments with a real prostate cancer microarray dataset verify that HICCUP outperforms existing approaches.
    Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on; 11/2007
  • Source
    Qiankun Zhao, Prasenjit Mitra, C. Lee Giles
    ABSTRACT: In this paper, we propose a novel approach to image annotation that constructs a hierarchical mapping between low-level visual features and text features, utilizing the relations within and across both visual features and text features. Moreover, we propose a novel annotation strategy that maximizes both the accuracy and the diversity of the generated annotation by generalizing or specializing the annotation within the corresponding annotation hierarchy. Experiments with 4500 scientific images from Royal Society of Chemistry journals show that the proposed annotation approach produces satisfactory results at different levels of annotation.
    Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007; 01/2007
  • Source
    Qiankun Zhao, Prasenjit Mitra, Bi Chen
    ABSTRACT: Recently, social text streams (e.g., blogs, web forums, and emails) have become ubiquitous with the evolution of the web. In some sense, social text streams are sensors of the real world. Often, it is desirable to extract real-world events from social text streams. However, existing event detection research has mainly focused on the stream properties of social text streams and ignored the contextual, temporal, and social information embedded in them. In this paper, we propose to detect events from social text streams by exploring the content as well as the temporal and social dimensions. We define the term event as the information flow between a group of social actors on a specific topic over a certain time period. We represent social text streams as multigraphs, where each node represents a social actor and each edge represents the information flow between two actors. The content and temporal associations within the flow of information are embedded in the corresponding edge. Events are detected by combining text-based clustering, temporal segmentation, and information flow-based graph cuts of the dual graph of the social networks. Experiments conducted with the Enron email dataset and the political blog dataset from DailyKos show that the proposed event detection approach outperforms the alternatives.
    Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, July 22-26, 2007, Vancouver, British Columbia, Canada; 01/2007
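Of the three stages combined above, the temporal-segmentation step is the easiest to make concrete. The sketch below is one simple gap-based variant, not necessarily the paper's method: it splits a sorted stream of message timestamps into segments wherever communication pauses for longer than a threshold; the function name and threshold are illustrative.

```python
def temporal_segments(times, max_gap):
    """Split a sorted list of message timestamps into segments wherever
    the gap between consecutive messages exceeds max_gap."""
    if not times:
        return []
    segments, current = [], [times[0]]
    for prev, t in zip(times, times[1:]):
        if t - prev > max_gap:
            # a long silence ends the current burst of activity
            segments.append(current)
            current = []
        current.append(t)
    segments.append(current)
    return segments
```

Each resulting segment would then be a candidate time span over which the content clustering and graph-cut stages look for a coherent event.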
  • Source
    Qiankun Zhao, Sourav S Bhowmick, Aixin Sun
    ABSTRACT: The web is a sensor of the real world. Often, the content of web pages corresponds to real-world objects or events, whereas web usage data reflects users' opinions of, and actions in response to, the corresponding events. Moreover, the evolution patterns of the web usage data may reflect the evolution of the corresponding events over time. In this paper, we present two variants of the iWed (Integrated Web Event Detector) algorithm to extract events from website data by integrating author-centric data and visitor-centric data. We model the website-related data as a multigraph, where each vertex represents a web page and each edge represents the relationship between the connected web pages in terms of structure, semantics, and/or usage patterns. Then, the problem of event detection is to extract strongly connected subgraphs from the multigraph to represent real-world events. We solve this problem by adopting the normalized graph cut algorithm. Experiments show that the usage patterns play an important role in the iWed algorithms and can produce high-quality results.
    03/2006: pages 351-360;
  • Source
    ABSTRACT: Existing web usage mining techniques focus only on discovering knowledge based on the statistical measures obtained from the static characteristics of web usage data. They do not consider the dynamic nature of web usage data. In this paper, we present an algorithm called Cleopatra (CLustering of EvOlutionary PAtTeRn-based web Access sequences) to cluster web access sequences (WASs) based on their evolutionary patterns. In this approach, web access sequences that have similar change patterns in their support counts over their history are grouped into the same cluster. The intuition is that WASs are often event/task-driven. As a result, WASs related to the same event/task are expected to be accessed in similar ways over time. Such clusters are useful for several applications such as intelligent web site maintenance and personalized web services.
    03/2006: pages 323-333;
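The "similar change patterns in their support counts" criterion can be illustrated directly. The sketch below is a minimal reading of that idea, not the Cleopatra algorithm itself: it turns each sequence's support-count history into a change vector and treats two sequences as cluster-mates when the vectors are strongly correlated; the threshold and function names are illustrative.

```python
import math

def change_pattern(supports):
    """Period-to-period changes in a sequence's support counts."""
    return [b - a for a, b in zip(supports, supports[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 when either
    vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def same_cluster(s1, s2, threshold=0.9):
    """Group two access sequences when their support-count change
    patterns are strongly correlated."""
    return pearson(change_pattern(s1), change_pattern(s2)) >= threshold
```

A full clustering would apply this pairwise similarity inside any standard clustering algorithm; the point here is only that the similarity is computed on the evolution of the supports, not on their static values.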
  • Source
    ABSTRACT: Measuring the similarity of Web search queries by mining the increasing amount of click-through data logged by Web search engines, which records the interactions between users and the search engines, has become a promising direction. Most existing approaches employ the click-through data for query similarity measures with little consideration of the temporal factor, although click-through data is often dynamic and contains rich temporal information. In this paper we present a new framework for a time-dependent query semantic similarity model that exploits the temporal characteristics of historical click-through data. The intuition is that more accurate semantic similarity values between queries can be obtained by taking into account the timestamps of the log data. With a set of user-defined calendar schemas and calendar patterns, our time-dependent query similarity model is constructed using the marginalized kernel technique, which can exploit both explicit similarity and implicit semantics from the click-through data effectively. Experimental results on a large set of click-through data acquired from a commercial search engine show that our time-dependent query similarity model is more accurate than existing approaches. Moreover, we observe that our time-dependent query similarity model can, to some extent, reflect real-world semantics such as real-world events that happen over time.
    Proceedings of the 15th international conference on World Wide Web, WWW 2006, Edinburgh, Scotland, UK, May 23-26, 2006; 01/2006
  • Source
    ABSTRACT: Recently, there have been increasing research efforts in XML data mining. These efforts have largely assumed that XML documents are static. However, in reality, the documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing the structural evolution patterns (also called structural deltas) of a history of XML documents. Unlike existing approaches, XML structural delta mining focuses on the dynamic and temporal features of XML data. Furthermore, the data source for this novel mining technique is a sequence of historical versions of an XML document rather than a set of snapshot XML documents. Such a mining technique can be useful in many applications such as change detection for very large XML documents, efficient XML indexing, and XML search engines. Our aim in this paper is not to provide a specific solution to a particular mining problem. Rather, we present the vision of the mining framework and discuss the issues and challenges for three types of XML structural delta mining: identifying various interesting structures, discovering association rules from structural deltas, and structural change pattern-based classification.
    Data & Knowledge Engineering 59:627-651; 01/2006
  • Source
    ABSTRACT: Previous efforts on event detection from the web have focused primarily on web content and structure data, ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution patterns over time. Given the click-through data, in our proposed approach, we first segment it into a sequence of bipartite graphs based on a user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. After that, the vector-based graph is transformed into its dual graph, where each node is a query-page pair that will be used to represent real-world events. Then, the problem of event detection is equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on semantic similarity such that each cluster in the result corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on evolution pattern-based similarity such that each cluster is expected to represent a specific event under the specific topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high-quality results.
    Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006; 01/2006
  • Source
    ABSTRACT: Web structure mining has been a well-researched area during recent years. Based on the observation that data on the web may change at any time in any way, some incremental data mining algorithms have been proposed to update the mining results with the corresponding changes. However, none of the existing web structure mining techniques is able to extract useful and hidden knowledge from the sequence of historical web structural changes. While the knowledge from snapshots is important and interesting, the knowledge behind the corresponding changes may be more critical and informative in some applications. In this paper, we propose a novel research area of web structure mining called web structural delta mining. The distinct feature of our research is that our mining object is the sequence of historical changes of web structure (also called web structural deltas). For web structural delta mining, we aim to extract useful, interesting, and novel web structures and knowledge considering their historical, dynamic, and temporal properties. We propose three major issues of web structural delta mining: identifying useful and interesting structures, discovering associations from structural deltas, and structural change pattern-based classification. Moreover, we present a list of potential applications where the web structural delta mining results can be used.
    11/2005: pages 272-289;
  • Source
    ABSTRACT: Existing web usage mining techniques focus only on discovering knowledge based on the statistical measures obtained from the static characteristics of web usage data. They do not consider the dynamic nature of web usage data. In this paper, we focus on discovering novel knowledge by analyzing the change patterns of historical web access sequence data. We present an algorithm called WAM-MINER to discover Web Access Motifs (WAMs). WAMs are web access patterns that never change or do not change significantly most of the time (if not always) in terms of their support values during a specific time period. WAMs are useful for many applications, such as intelligent web advertisement, web site restructuring, business intelligence, and intelligent web caching.
    Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 - November 5, 2005; 01/2005
  • Source
    Qiankun Zhao, Sourav S. Bhowmick
    ABSTRACT: In this paper, we present a FASST mining approach to extract the frequently changing semantic structures (FASSTs), which are a subset of semantic substructures that change frequently, from versions of unordered XML documents. We propose a data structure, H-DOM+, and a FASST mining algorithm, which incorporates the semantic issue and takes advantage of the related domain knowledge. The distinct feature of this approach is that the FASST mining process is guided by a user-defined concept hierarchy. Rather than mining all the frequently changing structures, only those frequently changing structures that are semantically meaningful are extracted. Our experimental results show that the H-DOM+ structure is compact and the FASST algorithm is efficient with good scalability. We also design a declarative FASST query language, FASSTQUEL, to make the FASST mining process interactive and flexible.
    Database Systems for Advanced Applications, 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings; 01/2005
  • Source
    ABSTRACT: Existing XML query pattern-based caching strategies focus on extracting the set of frequently issued query pattern trees based on the number of occurrences of the query pattern trees in the history. Each occurrence of the same query pattern tree is considered equally important for the caching strategy. However, the same query pattern tree may occur at different time points in the history of XML queries. This temporal feature can be used to improve the caching strategy. In this paper, we propose a novel type of query pattern called conserved query paths for efficient caching by integrating the support and temporal features together. Conserved query paths are paths in query pattern trees that never change or do not change significantly most of the time (if not always) in terms of their support values during a specific time period. We propose an algorithm to extract those conserved query paths. By ranking those conserved query paths, a dynamic-conscious caching (DCC) strategy is proposed for efficient XML query processing. Experiments show that the DCC strategy outperforms the existing XML query pattern tree-based caching strategies.
    Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31 - November 5, 2005; 01/2005
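The stability criterion behind conserved query paths can be made concrete with a small check. The sketch below is one simplified reading of "does not change significantly in support over time", not the paper's extraction algorithm; the tolerance value and function name are illustrative.

```python
def is_conserved(supports, tolerance=0.1):
    """Treat a query path as conserved when every support value in its
    history stays within +/- tolerance (as a fraction of the mean) of
    the mean support."""
    mean = sum(supports) / len(supports)
    return all(abs(s - mean) <= tolerance * mean for s in supports)
```

Paths passing this test over the whole observation period would be the stable candidates that the DCC strategy then ranks for caching.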

Publication Stats

302 Citations

Institutions

  • 2010
    • Telefónica I+D
      Madrid, Madrid, Spain
    • Teradata
      Dayton, Ohio, United States
  • 2006
    • University of Missouri
      • Department of Computer Science and IT
      Columbia, MO, United States
  • 2004–2006
    • Nanyang Technological University
      • School of Computer Engineering
      Tumasik, Singapore