Relevance Modeling for Microblog Summarization.
Conference: Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011
Conference Paper: Sumblr: continuous summarization of evolving tweet streams
ABSTRACT: With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain an enormous amount of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors (TCVs). Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
ABSTRACT: Online social media carry a massive number of messages relevant to social events. Some of them contain useful and meaningful information, while others might not be worth reading. In this paper, for a given social event, we focus on extracting high-quality information from massive social media messages, where the extracted information has valuable textual content, is widely propagated, and is posted by authoritative users. We propose an extraction framework that obtains high-quality information by considering different social media features globally. Specifically, in order to reduce computing time and improve extraction precision, several important social media features are transformed into the wavelet domain and then fused to obtain a weighted ensemble value. A large-scale Sina microblog dataset is used to evaluate the framework's performance. Experimental results show that the proposed framework is effective at extracting high-quality information.
ABSTRACT: As an information delivery platform, Twitter collects millions of tweets every day. However, some users, especially new users, often find it difficult to understand trending topics on Twitter when confronted with overwhelming and unorganized tweets. Existing work has attempted to provide a short snippet to explain a topic, but this provides only limited benefit and cannot satisfy users' expectations. In this paper, we propose a new summarization task, namely sequential summarization, which aims to provide a series of chronologically ordered short sub-summaries for a trending topic, giving a complete story of the topic's development while retaining the order of information presentation. Unlike the traditional summarization task, the number of sub-summaries is not fixed and varies from topic to topic. Two approaches, stream-based and semantic-based, are developed to detect the important subtopics within a trending topic. A short sub-summary is then generated for each subtopic. In addition, we propose three new measures to evaluate the position-aware coverage, sequential novelty, and sequence correlation of the system-generated summaries. The experimental results based on the proposed evaluation criteria demonstrate the effectiveness of the proposed approaches.