Conference Paper

Relevance Modeling for Microblog Summarization.

Conference: Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011
Source: DBLP
  • [Show abstract] [Hide abstract]
    ABSTRACT: As an information delivering platform, Twitter collects millions of tweets every day. However, some users, especially new users, often find it difficult to understand trending topics in Twitter when confronting the overwhelming and unorganized tweets. Existing work has attempted to provide a short snippet to explain a topic, but this only provides limited benefits and cannot satisfy the users' expectations. In this paper, we propose a new summarization task, namely sequential summarization, which aims to provide a serial of chronologically ordered short sub-summaries for a trending topic in order to provide a complete story about the development of the topic while retaining the order of information presentation. Different from the traditional summarization task, the numbers of sub-summaries for different topics are not fixed. Two approaches, i.e., stream-based and semantic-based approaches, are developed to detect the important subtopics within a trending topic. Then a short sub-summary is generated for each subtopic. In addition, we propose three new measures to evaluate the position-aware coverage, sequential novelty and sequence correlation of the system-generated summaries. The experimental results based on the proposed evaluation criteria have demonstrated the effectiveness of the proposed approaches.
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on. 02/2014; 22(2):293-302.
  • [Show abstract] [Hide abstract]
    ABSTRACT: There is an abundance of information found on microblog services due to their popularity. However the potential of this trove of information is limited by the lack of effective means for users to browse and interpret the numerous messages found on these services. We tackle this problem using a two-step process, first by slicing up the search results of current retrieval systems along multiple possible genres. Then, a summary is generated from the microblog messages attributed to each genre. We believe that this helps users to better understand the possible interpretations of the retrieved results and aid them in finding the information that they need. Our novel approach makes use of automatically acquired information from external search engines in each of these two steps. We first integrate this information with a semi-supervised probabilistic graphical model, and show that this helps us to achieve significantly better classification performance without the need for much training data. Next we incorporate the extra information into graph-based summarization, and demonstrate that superior summaries (up to 30% improvement in ROUGE-1) are obtained.
    Neurocomputing 03/2015; 152. · 2.01 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Since its foundation in 2006, Twitter has enjoyed a meteoric rise in popularity, currently boasting over 500 million users. Its short text nature means that the service is open to a variety of different usage patterns, which have evolved rapidly in terms of user base and utilization. Prior work has categorized Twitter users, as well as studied the use of lists and re-tweets and how these can be used to infer user profiles and interests. The focus of this article is on studying why and how Twitter users mark tweets as “favorites”—a functionality with currently poorly understood usage, but strong relevance for personalization and information access applications. Firstly, manual analysis and classification are carried out on a randomly chosen set of favorited tweets, which reveal different approaches to using this functionality (i.e., bookmarks, thanks, like, conversational, and self-promotion). Secondly, an automatic favorites classification approach is proposed, based on the categories established in the previous step. Our machine learning experiments demonstrate a high degree of success in matching human judgments in classifying favorites according to usage type. In conclusion, we discuss the purposes to which these data could be put, in the context of identifying users' patterns of interests.
    Journal of the Association for Information Science and Technology 01/2015; · 2.23 Impact Factor