Conference Paper

Social context summarization.

DOI: 10.1145/2009916.2009954 Conference: Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011
Source: DBLP

ABSTRACT We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard documents. With the rapid growth of online social networks, abundant user-generated content (e.g., comments) associated with the standard documents is available. Which parts of a document do social users really care about? How can we generate summaries for standard documents by considering both the informativeness of sentences and the interests of social users? This paper explores such an approach by modeling Web documents and social contexts in a unified framework. We propose a dual wing factor graph (DWFG) model, which utilizes the mutual reinforcement between Web documents and their associated social contexts to generate summaries. An efficient algorithm is designed to learn the proposed factor graph model. Experimental results on a Twitter data set validate the effectiveness of the proposed model. By leveraging the social context information, our approach obtains significant improvement (on average +5.0% to +17.3%) over several alternative methods (CRF, SVM, LR, PR, and DocLead) in summarization performance.
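
The idea of mutual reinforcement between a document and its social context can be illustrated with a small sketch. This is not the DWFG model from the paper, which is a learned factor graph; it is a simple HITS-style iteration, assumed here only to show the intuition that sentences echoed by comments score higher, and comments close to high-scoring sentences score higher in turn. The function names, the word-set tokenizer, and the Jaccard similarity are all illustrative assumptions.

```python
import re


def tokens(text):
    """Return the set of lowercased word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(values):
    """Scale scores so they sum to 1; leave an all-zero vector unchanged."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)


def mutual_reinforcement(sentences, comments, iterations=20):
    """Hypothetical HITS-style scoring, NOT the paper's DWFG model.

    Sentence scores are updated from comment scores and vice versa,
    weighted by the lexical similarity between each sentence and comment.
    """
    sent_tok = [tokens(s) for s in sentences]
    com_tok = [tokens(c) for c in comments]
    # weight[i][j]: similarity between sentence i and comment j
    weight = [[jaccard(s, c) for c in com_tok] for s in sent_tok]

    sent_score = [1.0] * len(sentences)
    com_score = [1.0] * len(comments)
    for _ in range(iterations):
        # A sentence is salient if salient comments resemble it.
        sent_score = normalize(
            [sum(w * c for w, c in zip(row, com_score)) for row in weight]
        )
        # A comment is salient if it resembles salient sentences.
        com_score = normalize(
            [
                sum(weight[i][j] * sent_score[i] for i in range(len(sentences)))
                for j in range(len(comments))
            ]
        )
    return sent_score, com_score


if __name__ == "__main__":
    doc = [
        "The city council approved the new budget on Monday.",
        "Weather was mild throughout the week.",
        "The budget raises funding for public schools.",
    ]
    social = [
        "Great that the budget got approved",
        "More school funding in the budget, finally",
    ]
    sentence_scores, comment_scores = mutual_reinforcement(doc, social)
    for score, sentence in sorted(zip(sentence_scores, doc), reverse=True):
        print(f"{score:.3f}  {sentence}")
```

In this toy input the weather sentence shares almost no vocabulary with the comments, so it ranks last; a real system would replace the lexical overlap with learned features, as the abstract describes.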

Available from: Zhong Su, Aug 23, 2015
  • Source
    • "Intuitively, the more often some part of the story is tweeted, the more salient it might be. Previous work assumed that such socially focused sentences might be closely related to the reference summary [18] [3] [16]. However, there are some important questions left unanswered."
    ABSTRACT: Single-document summarization is a challenging task. In this paper, we explore effective ways of using the tweets linking to news to generate an extractive summary of each document. We reveal the very basic value of tweets that can be utilized by regarding every tweet as a vote for candidate sentences. Based on this finding, we resort to unsupervised summarization models by leveraging the linking tweets to master the ranking of candidate extracts via random walk on a heterogeneous graph. The advantage is that we can use the linking tweets to opportunistically "supervise" the summarization with no need of reference summaries. Furthermore, we analyze the influence of the volume and latency of tweets on the quality of output summaries, since tweets come after the news release. Compared to a truly supervised summarizer unaware of tweets, our method achieves significantly better results with a reasonably small tradeoff on latency; compared to the same summarizer using tweets as auxiliary features, our method is comparable while needing fewer tweets and a much shorter time to achieve significant outperformance.
    The 38th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile; 08/2015
  • Source
    • "In addition, different types of features have been used, including lexical, acoustic and structural characteristics (Xie et al., 2008; Maskey and Hirschberg, 2005). Recent works have been focused on adapting summarization to the social context, exploiting user generated contents associated with the documents (Yang et al., 2011; Hu et al., 2012). Implicit and explicit community feedback in online collaborative websites have also been leveraged to detect highlights of media assets (San Pedro et al., 2009)."
    ABSTRACT: This paper presents a context-aware NLP approach to automatically detect noteworthy information in spontaneous mobile phone conversations. The proposed method uses a supervised modeling strategy which considers both features from the content of the conversation and contextual information from the call. We empirically analyze the predictive performance of features of different natures on a corpus of mobile phone conversations. The results of this study reveal that the context of the conversation plays a crucial role in boosting the predictive performance of the model.
    COLING, Dublin, Ireland; 01/2014
  • Source
    • "For example, in [22] the graph nodes represent single keywords, which are indexed by the HITS algorithm [20]. A similar approach has been adopted in [45] [46] to address Web page summarization driven by the user-generated content coming from social networks. Unlike all of the above-mentioned approaches, our summarizer discovers association rules from the analyzed document to also represent the correlations among multiple terms in the graph-based model."
    ABSTRACT: Graph-based summarization entails extracting a worthwhile subset of sentences from a collection of textual documents by using a graph-based model to represent the correlations between pairs of document terms. However, since the high-order correlations among multiple terms are disregarded during graph evaluation, the summarization performance could be limited unless ad hoc language-dependent or semantics-based analysis is integrated. This paper presents a novel and general-purpose graph-based summarizer, namely GraphSum (Graph-based Summarizer). It discovers and exploits association rules to represent the correlations among multiple terms that have been neglected by previous approaches. The graph nodes, which represent combinations of two or more terms, are first ranked by means of a PageRank strategy that discriminates between positive and negative term correlations. Then, the produced node ranking is used to drive the sentence selection process. The experiments performed on benchmark and real-life documents demonstrate the effectiveness of the proposed approach compared to many state-of-the-art summarizers.
    Information Sciences 07/2013; 249. DOI:10.1016/j.ins.2013.06.046 · 3.89 Impact Factor