Relevance Modeling for Microblog Summarization.

Conference Paper · January 2011
Source: DBLP
Conference: Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011
    • "The increasing popularity of microblog platforms results in a huge volume of user-generated short posts. Automatically modeling topics out of such massive microblog posts can uncover the hidden semantic structures of the underlying collection and can be useful to downstream applications such as microblog summarization (Harabagiu and Hickl, 2011), user profiling (Weng et al., 2010), event tracking (Lin et al., 2010) and so on. Popular topic models, like Probabilistic Latent Semantic Analysis (pLSA) (Hofmann, 1999) and Latent Dirichlet Allocation (LDA) (Blei et al., 2003b), model the semantic relationships between words based on their co-occurrences in documents."
    ABSTRACT: Conventional topic models are ineffective for topic extraction from microblog messages, since the lack of structure and context among the posts yields poor message-level word co-occurrence patterns. In this work, we organize microblog posts as conversation trees based on re-posting and replying relations, which enriches context information and alleviates data sparseness. Our model generates words according to topic dependencies derived from the conversation structures. Specifically, we differentiate between leader messages, which initiate key aspects of previously focused topics or shift the focus to different topics, and follower messages, which do not introduce any new information but simply echo topics from the messages that they repost or reply to. Our model captures the different extents to which leader and follower messages may contain the key topical words, and thus further enhances the quality of the induced topics. The results of thorough experiments demonstrate the effectiveness of our proposed model.
    Full-text · Conference Paper · Aug 2016 · Journal of the Association for Information Science and Technology
    Jing Li, Ming Liao, Wei Gao, +1 more author
    • "A general consensus on extractive summarization is that both relevance and coverage are critical issues in a realistic scenario [9][10][11][12][13]. However, most of the existing summarization methods focus on determining only the relevance degree between a given document and one of its sentences [14][15][16][17][18]. As a result, the top-ranked sentences returned by these methods may only cover partial subthemes of the given document and fail to interpret the whole picture. "
    ABSTRACT: Extractive summarization aims at selecting a set of indicative sentences from a source document as a summary that can express the major theme of the document. A general consensus on extractive summarization is that both relevance and coverage are critical issues to address. The existing methods designed to model coverage can be characterized by either reducing redundancy or increasing diversity in the summary. Maximal marginal relevance (MMR) is a widely cited method since it takes both relevance and redundancy into account when generating a summary for a given document. Apart from MMR, there is a dearth of research concentrating on reducing redundancy or increasing diversity for the spoken document summarization task, as far as we are aware. Motivated by these observations, two major contributions are presented in this paper. First, in contrast to MMR, which considers coverage by reducing redundancy, we propose two novel coverage-based methods, which directly increase diversity. With the proposed methods, a set of representative sentences, which are not only relevant to the given document but also cover most of the important sub-themes of the document, can be selected automatically. Second, we take a step forward by plugging several document/sentence representation methods into the proposed framework to further enhance the summarization performance. A series of empirical evaluations demonstrates the effectiveness of our proposed methods.
    Full-text · Article · Jan 2016 · Journal of the Association for Information Science and Technology
    • "This observation raises the question of whether there are other ways in which use of favoriting differs between groups of Twitter users, and how this impacts other dimensions of Twitter use. Moreover, being able to distinguish automatically between different usages of favorites will improve the quality of user models derived automatically from UGM (e.g., Abel, Gao, Houben, & Tao, 2011a; Angeletou, Rowe, & Alani, 2011), as well as the performance of methods for personalized tweet recommendation (e.g., Abel et al., 2011b; Chen et al., 2010; Chen, Nairn, & Chi, 2011), and tweet summarization (e.g., Harabagiu & Hickl, 2011; Yan et al., 2012). The first contribution of this work lies in identifying five categories of favorites usage (i.e., like, bookmark, thanks, conversational, and self-promotion), three of which have not been studied in related work. "
    ABSTRACT: Since its foundation in 2006, Twitter has enjoyed a meteoric rise in popularity, currently boasting over 500 million users. Its short text nature means that the service is open to a variety of different usage patterns, which have evolved rapidly in terms of user base and utilization. Prior work has categorized Twitter users, as well as studied the use of lists and re-tweets and how these can be used to infer user profiles and interests. The focus of this article is on studying why and how Twitter users mark tweets as “favorites,” a functionality whose usage is currently poorly understood but which is strongly relevant to personalization and information access applications. Firstly, manual analysis and classification are carried out on a randomly chosen set of favorited tweets, which reveals different approaches to using this functionality (i.e., bookmarks, thanks, like, conversational, and self-promotion). Secondly, an automatic favorites classification approach is proposed, based on the categories established in the previous step. Our machine learning experiments demonstrate a high degree of success in matching human judgments in classifying favorites according to usage type. In conclusion, we discuss the purposes to which these data could be put, in the context of identifying users' patterns of interests.
    Full-text · Article · Jan 2015