Conference Paper

A fine-grained digestion of news webpages through Event Snippet Extraction

DOI: 10.1145/1963192.1963272
Conference: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April 1, 2011 (Companion Volume)
Source: DBLP


We describe a framework for digesting news webpages at a finer granularity: extracting event snippets from their context. "Events" are atomic text snippets, and a news article consists of more than one event snippet. Event Snippet Extraction (ESE) aims to mine these snippets. The problem is important because its solutions may be applied to many information mining and retrieval tasks. The challenge is to exploit rich features, including various semantic, syntactic and visual features, to detect snippet boundaries. We run experiments to demonstrate the effectiveness of our approaches.


Available from: Yu Li, Jul 01, 2015
  • ABSTRACT: We investigate an important and challenging problem in summary generation, i.e., Evolutionary Trans-Temporal Summarization (ETTS), which generates news timelines from massive data on the Internet. ETTS greatly facilitates fast news browsing and knowledge comprehension, and hence is a necessity. Given the collection of time-stamped web documents related to the evolving news, ETTS aims to return news evolution along the timeline, consisting of individual but correlated summaries on each date. Existing summarization algorithms fail to utilize trans-temporal characteristics among these component summaries. We propose to model trans-temporal correlations among component summaries for timelines, using inter-date and intra-date sentence dependencies, and present a novel combination. We develop experimental systems to compare 5 rival algorithms on 6 instinctively different datasets which amount to 10251 documents. Evaluation results in ROUGE metrics indicate the effectiveness of the proposed approach based on trans-temporal information.
    Conference Paper · Jan 2011
  • ABSTRACT: The task of mining future-related information from online web resources such as news articles and blogs has been getting more attention due to its potential usefulness in supporting individuals' decision making in a world where massive new data are generated daily. Instead of building a data-driven model to predict the future, one extracts from these massive data the future events that have a high probability of occurring at a future time and a specific geographic location. Such spatiotemporal future events can be utilized by a recommender system on a location-aware device to provide localized future event suggestions. In this paper, we describe a systematic approach for mining future spatiotemporal events from the web, in particular from news articles. In our application context, a valid event is defined both spatially and temporally. The mining procedure consists of two main steps: recognition and matching. For the recognition step, we identify and resolve toponyms (geographic locations) and future temporal patterns. In the matching step, we perform spatiotemporal disambiguation, de-duplication, and pairing. To provide more useful future event guidance, we attach to each event a sentiment linguistic variable: positive, negative, or neutral, so that one may use the extracted event information for recommendation purposes in the form of "avoid Event A" or "avoid geographic location L at time T" or "attend Event B" based on the event sentiment. The identified future event consists of its geographic location, temporal pattern, sentiment variable, news title, key phrase, and news article URL. Experimental results on 3652 news articles from 21 online news sources collected over a 2-week period in the Greater Washington area are used to illustrate some of the critical steps in our mining procedure.
    Conference Paper · Nov 2012