Conference Paper

Ontology-Aware Classification and Association Rule Mining for Interest and Link Prediction in Social Networks.

Conference: Social Semantic Web: Where Web 2.0 Meets Web 3.0, Papers from the 2009 AAAI Spring Symposium, Technical Report SS-09-08, Stanford, California, USA, March 23-25, 2009
Source: DBLP


Previous work on analysis of friendship networks has identified ways in which graph features can be used for prediction of link existence and persistence, and shown that features of user pairs such as shared interests can marginally improve the precision and recall of link prediction. This marginal improvement has, to date, been severely limited by the flat representation used for interest taxonomies. We present an approach towards integration of such graph features with ontology-enriched numerical and nominal features (based on interest hierarchies) and on itemset size-sensitive associations found using interest data. A test bed previously developed using the social network and weblogging service LiveJournal is extended using this integrative approach. Our results show how this semantically integrative approach to link mining yields a boost in precision and recall of known friendships when applied to this test bed. We conclude with a discussion of link-dependent features and how an integrative constructive induction framework can be extended to incorporate temporal fluents for link prediction, interest prediction, and annotation in social networks.
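The contrast the abstract draws between a flat interest representation and a hierarchy-aware one can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy hierarchy `PARENT`, the user names, and the three feature names are hypothetical and do not come from the paper or its LiveJournal test bed.

```python
from typing import Dict, Set

# Hypothetical interest hierarchy (child -> parent); not taken from the paper.
PARENT: Dict[str, str] = {
    "jazz": "music",
    "blues": "music",
    "music": "arts",
    "painting": "arts",
}


def with_ancestors(interests: Set[str]) -> Set[str]:
    """Expand a set of interests with every ancestor in the hierarchy."""
    expanded = set(interests)
    for interest in interests:
        node = interest
        while node in PARENT:
            node = PARENT[node]
            expanded.add(node)
    return expanded


def pair_features(
    u: str,
    v: str,
    friends: Dict[str, Set[str]],
    interests: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Build one feature row for a candidate user pair (u, v)."""
    return {
        # Graph feature: number of mutual friends.
        "common_friends": len(friends[u] & friends[v]),
        # Flat interest feature: exact interest matches only.
        "shared_interests_flat": len(interests[u] & interests[v]),
        # Hierarchy-aware feature: matches after adding ancestors.
        "shared_interests_ontology": len(
            with_ancestors(interests[u]) & with_ancestors(interests[v])
        ),
    }


# Toy data (assumed): the two users share no literal interest,
# but their interests meet higher up in the hierarchy.
friends = {"alice": {"carol", "dave"}, "bob": {"carol"}}
interests = {"alice": {"jazz"}, "bob": {"blues"}}
features = pair_features("alice", "bob", friends, interests)
```

In this toy case the flat feature sees no overlap between the two users, while the hierarchy-aware feature picks up their common ancestors. Rows like `features` could then be passed to any classifier for link existence prediction.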



Available from: Waleed Aljandal,
    • "A. Performance gain in precision, recall, and consistency of data mining results Many previous ontology-based efforts have reported performance gain in the data mining results. Ontology-based approaches have been reported to have better precision and recall than the traditional approaches in various tasks such as text clustering [32], [33], [35], [65], [82], information extraction [17], [27], [56], [57], link prediction [6], [15], [74], and recommendation systems [33], [52], [60], [61]. Research in recommendation system suggests that ontology-based recommendation systems have better prediction precision than traditional recommendation methods [13], [83]. "
    ABSTRACT: Semantic Data Mining refers to data mining tasks that systematically incorporate domain knowledge, especially formal semantics, into the process. In the past, many research efforts have attested to the benefits of incorporating domain knowledge in data mining. At the same time, the proliferation of knowledge engineering has enriched the family of domain knowledge, especially formal semantics and Semantic Web ontologies. An ontology is an explicit specification of a conceptualization and a formal way to define the semantics of knowledge and data. The formal structure of an ontology makes it a natural way to encode domain knowledge for data mining use. In this survey paper, we introduce general concepts of semantic data mining. We investigate why ontology has the potential to help semantic data mining and how formal semantics in ontologies can be incorporated into the data mining process. We provide detailed discussions of the advances and state of the art of ontology-based approaches and an introduction to approaches that are based on other forms of knowledge representation.
    the 9th IEEE Conference on Semantic Computing (ICSC 2015); 02/2015
    • "For example, in [10] the authors compare several data mining techniques, among which association rule mining, to discover user patterns from Facebook data. Differently, the classification and link prediction method presented in [13] exploits association rules to discover correlations among data features related to the major user interests. However, to the best of our knowledge, the discovery of generalized rules in the presence of taxonomies constructed over both the tweet content and context of publication has never been investigated so far. "
    ABSTRACT: The increasing availability of user-generated content coming from online communities allows the analysis of common user behaviors and trends in social network usage. This paper presents the TweM (Tweet Miner) framework, which entails the discovery of hidden and high-level correlations, in the form of generalized association rules, among the content and the contextual features of posts published on Twitter, i.e., tweets. To effectively support knowledge discovery from tweets, the TweM framework performs two main steps: (i) taxonomy generation over tweet keywords and context data and (ii) generalized association rule mining, driven by the generated taxonomy, from a sequence of tweet collections. Unlike traditional mining approaches, the generalized rule mining session performed on the current tweet collection also considers the evolution of the extracted patterns across the sequence of the previous mining sessions to prevent the discarding of rare knowledge that frequently occurs in a number of past extractions. Experiments, performed on both real Twitter posts and synthetic datasets, show the effectiveness and the efficiency of the proposed TweM framework in supporting knowledge discovery from Twitter user-generated content.
    Intelligent Data Analysis 01/2013; 17(4-4). DOI:10.3233/IDA-130597 · 0.61 Impact Factor
    ABSTRACT: Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures, to make them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities. The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics. 
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation.
Doctor of Philosophy · Doctoral · Department of Computing and Information Sciences · William H. Hsu
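The dissertation abstract above ties link strength to the rarity of shared memberships, meaning the size of the itemsets in which two entities co-occur. The sketch below shows one way such a size-sensitive score could look. It is not the dissertation's adaptive parametric model: the weight `1 / log(1 + size)` is a stand-in chosen for illustration, and the itemset names and members are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical itemsets: each interest maps to the users who list it.
itemsets: Dict[str, Set[str]] = {
    "music": {"alice", "bob", "carol", "dave"},  # large, common itemset
    "beekeeping": {"alice", "bob"},              # small, rare itemset
}


def pair_scores(
    sets: Dict[str, Set[str]],
) -> Tuple[Dict[Pair, int], Dict[Pair, float]]:
    """Return (raw co-occurrence counts, size-weighted scores) per pair."""
    raw: Dict[Pair, int] = defaultdict(int)
    weighted: Dict[Pair, float] = defaultdict(float)
    for members in sets.values():
        # Assumed normalization: smaller itemsets contribute more.
        weight = 1.0 / math.log(1 + len(members))
        for pair in combinations(sorted(members), 2):
            raw[pair] += 1
            weighted[pair] += weight
    return dict(raw), dict(weighted)


raw, weighted = pair_scores(itemsets)
```

Under this assumed weighting, a co-occurrence in the small `beekeeping` itemset counts for more than one in the large `music` itemset, so `("alice", "bob")` outscores `("carol", "dave")` by more than the raw counts alone would suggest. Either score could be used as a numerical feature for link existence prediction.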