Conference Paper

Fractal summarization for mobile devices to access large documents on the web

Authors:
  • Yang and Wang (Hong Kong Metropolitan University)

... Two main approaches have already been proposed in the literature. First, some methodologies such as [2] [14] use simple but fast summarization techniques to produce results in real time. However, they produce low-quality content for visualization, as they do not linguistically process the web pages. ...
... Then, multiword units which respect the following regular expression are selected for quality content visualization: [Noun Noun* | Adjective Noun* | Noun Preposition Noun | Verb Adverb]. This technique is common in the field of Terminology [14]. A good example can be seen in Figure 1, where the multiword unit "Web Services" is detected, whereas existing solutions [2][7][14] would at most consider the words "Web" and "Services" separately. Finally, we remove all stop words present in the STU. ...
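The multiword-unit filter quoted above can be sketched as a pattern match over POS-tagged tokens. This is an illustrative sketch only, not the cited system's code: the simplified tagset (`NOUN`, `ADJ`, `PREP`, `VERB`, `ADV`) and the greedy left-to-right matching are assumptions.

```python
# Sketch of multiword-unit selection over pre-tagged (word, tag) pairs,
# mirroring the [Noun Noun* | Adjective Noun* | Noun Preposition Noun |
# Verb Adverb] filter quoted above. Tagset and matching order are assumptions.
PATTERNS = [
    ["NOUN", "NOUN"],          # Noun Noun*
    ["ADJ", "NOUN"],           # Adjective Noun*
    ["NOUN", "PREP", "NOUN"],  # Noun Preposition Noun
    ["VERB", "ADV"],           # Verb Adverb
]

def multiword_units(tagged):
    """Return multiword units matching any pattern (greedy, left to right)."""
    units = []
    n = len(tagged)
    i = 0
    while i < n:
        matched = False
        for pat in sorted(PATTERNS, key=len, reverse=True):
            k = len(pat)
            if i + k <= n and all(tagged[i + j][1] == pat[j] for j in range(k)):
                end = i + k
                # extend the trailing Noun* for noun-final patterns
                if pat[-1] == "NOUN":
                    while end < n and tagged[end][1] == "NOUN":
                        end += 1
                units.append(" ".join(w for w, _ in tagged[i:end]))
                i = end
                matched = True
                break
        if not matched:
            i += 1
    return units

tagged = [("Web", "NOUN"), ("Services", "NOUN"), ("are", "VERB"), ("useful", "ADJ")]
print(multiword_units(tagged))  # ['Web Services']
```

On this input the unit "Web Services" is detected as a whole, whereas a purely word-level approach would score "Web" and "Services" separately.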
Article
Information systems help people by providing many kinds of information, which can be accessed anywhere through the World Wide Web. However, this information cannot be viewed by blind people. Due to their inability to access information from written text documents, blind people face tremendous difficulties in accessing information through the web. In this paper, we propose an automatic summarization system to ease web browsing for visually impaired people on handheld devices. In particular, we propose a new architecture for summarizing Semantic Textual Units [2] based on efficient algorithms for linguistic treatment [3][6], which allow real-time processing and deeper linguistic analysis of web pages, thus enabling quality content visualization. Moreover, we present a text-to-speech interface to ease the understanding of web page content. To our knowledge, this is the first attempt to use both statistical and linguistic techniques for text summarization for browsing on mobile devices.
... We adapt one of them for calculating the fractal dimension to the case of text documents, and we present some results for web pages. We then discuss the drawbacks of the fractal summarization model [14,15] and propose some changes to the model, using the fractal dimension of a text document to make fractal summarization work better. In the last section we present our conclusions. ...
... The fractal summarization model was proposed by Yang and Wang [14,15] to generate a summary based on document structure. Fractal summarization builds on the idea of fractal view [8] and adapts the traditional models of automatic extraction. ...
... We use that method, but we formulate it using the division that the structure provides. The formula used in [14,15] has some shortcomings: they take D = 1. ...
Article
The calculation of dimensions is a useful tool to quantify structural information of artificial and natural objects. There are several types of dimension (9): the Euclidean one, the Hausdorff-Besicovitch dimension, and so on. We work with the fractal dimension in the special case of text documents. There are many objects whose fractal dimension cannot be determined analytically, but estimators exist for those cases. We review some of them and choose the best for our purpose: the calculation of the fractal dimension of text documents. Every day we search for new information on the web, and we find many documents whose pages contain a great amount of information. There is a big demand for automatic summarization that is both rapid and precise. Many methods have been used in automatic extraction, but most of them do not take into account the hierarchical structure of the documents. A novel method using the structure of the document was introduced by Yang and Wang (15). It is based on a fractal view method for controlling the information. It has some drawbacks, which we solve through a new adaptation of the fractal view method. We also use the new concept of the fractal dimension of a text document to achieve a better diversification of the extracted sentences.
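Box counting is one of the standard fractal-dimension estimators the abstract alludes to. A minimal sketch for a 1-D point set (e.g. the normalised positions at which a key term occurs in a document) might look like the following; the particular choice of scales, and the use of term positions as the point set, are illustrative assumptions, not the paper's exact formulation.

```python
import math

def box_count_dimension(points, scales):
    """Estimate the box-counting dimension of a 1-D point set in [0, 1)
    via the least-squares slope of log N(eps) against log(1/eps)."""
    xs, ys = [], []
    for eps in scales:
        # count the boxes of size eps that contain at least one point
        boxes = {math.floor(p / eps) for p in points}
        xs.append(math.log(1.0 / eps))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# evenly spread positions fill the line, so the estimate approaches 1
uniform = [i / 64 for i in range(64)]
d = box_count_dimension(uniform, [1/4, 1/8, 1/16, 1/32])
print(round(d, 2))  # 1.0
```

Clustered occurrences cover fewer boxes at fine scales, so they yield a dimension below 1, which is what makes the estimate usable as a diversification signal.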
... From another perspective, most of the related work ignores document structure. As an attempt to break this limitation, some structure-based summarization approaches have been proposed [13,14]. Alam et al. [13] propose an approach for summarizing Web documents by making use of the "table of content"-like hierarchy of a document, including sections and subsections. ...
... Alam et al. [13] propose an approach for summarizing Web documents by making use of the "table of contents"-like hierarchy of a document, including sections and subsections. Also, in [14], the summarization method is based on document structure, where a document is considered to consist of multiple levels such as chapters, sections, subsections, paragraphs, sentences and terms. In both of these works, the aim is to create general-purpose summaries. ...
... The intuition behind the heading method is that headings in a document usually include key words related to the document content (e.g. [14], [17]). For this purpose, sentences are assigned a heading score based on the number and frequency of their words appearing in a heading of the document. ...
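The heading method described above can be sketched in a few lines. This is a simplified illustration of the idea, not the cited systems' implementation: whitespace tokenisation and length normalisation are assumptions.

```python
def heading_score(sentence, headings):
    """Score a sentence by how many of its words appear in the document's
    headings, weighted by each word's frequency across headings.
    A sketch of the heading method; tokenisation is naive by design."""
    heading_words = {}
    for h in headings:
        for w in h.lower().split():
            heading_words[w] = heading_words.get(w, 0) + 1
    words = sentence.lower().split()
    # normalise by sentence length so long sentences are not favoured
    return sum(heading_words.get(w, 0) for w in words) / max(len(words), 1)

headings = ["Fractal Summarization", "Summarization on Mobile Devices"]
s1 = "Fractal summarization suits mobile devices"
s2 = "The weather was pleasant yesterday"
print(heading_score(s1, headings) > heading_score(s2, headings))  # True
```

Sentences sharing vocabulary with the headings rank higher, matching the intuition that headings carry the document's key terms.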
Conference Paper
Full-text available
With the drastic increase of available information sources on the Internet, people with different backgrounds share the same problem: locating useful information for their actual needs. Search engines make this task easier only in certain ways; people still have to do the sifting process by themselves. At this point, automatic summarization can complement the task of search engines. In this paper, we consider a new summarization approach for Web information retrieval; i.e. structure-preserving and query-biased summarization. We evaluate this approach on Turkish Web documents using TREC-like topics defined for Turkish. The results of the task-based evaluation show that this approach has significant improvement over Google snippets and unstructured query-biased summaries in terms of f-measure using the relevance prediction approach.
... However, a large document exhibits totally different characteristics from web pages. A web page usually contains a small number of sentences organized into paragraphs, but a large document contains many more sentences organized into a more complex hierarchical structure [48,49]. Besides, summarization of web pages is mainly based on thematic features only [6]. ...
... However, it has been shown that other document features play a role as important as the thematic feature [10,22]. Therefore, a more advanced summarization model combining other document features is required for browsing large documents and other information sources on handheld devices [49][50][51]. With a powerful summarization tool, the capability of handheld devices will be greatly enhanced. ...
Article
Wireless access with handheld devices is a promising addition to the WWW and traditional electronic business. Handheld devices provide convenient and portable access to the huge information space on the Internet without requiring users to be stationary with a network connection. Many customer-centered m-services applications have been developed. Mobile computing, however, should be extended to decision support in an organization. There is a desire to access the most up-to-date and accurate information on handheld devices for fast decision making in an organization. Unfortunately, loading and visualizing large documents on handheld devices is impossible due to their limitations. In this paper, we introduce the fractal summarization model for document summarization on handheld devices. Fractal summarization is developed based on fractal theory. It generates a brief skeleton summary at the first stage, and the details of the summary at different levels of the document are generated on demand. Such interactive summarization reduces the computation load compared with generating the entire summary in one batch, as traditional automatic summarization does, which is ideal for wireless access. The three-tier architecture, with the middle tier conducting the major computation, is also discussed. Visualization of summaries on handheld devices is also investigated. Automatic summarization, the three-tier architecture, and information visualization are potential solutions to the existing problems in information delivery to handheld devices for mobile commerce.
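The staged, on-demand expansion described above amounts to propagating a sentence quota down the document tree in proportion to each node's weight. The following is a minimal sketch of that allocation step only, under assumed conventions: a node is either a list of ranked sentences (a leaf) or a dict of weighted children, and every branch receives at least one sentence.

```python
def allocate_quota(node, quota):
    """Propagate a summary sentence quota down a document tree,
    proportionally to each child's weight. A sketch of fractal
    summarization's skeleton stage; the node format is an assumption."""
    if isinstance(node, list):            # leaf: a list of ranked sentences
        return node[:max(quota, 1)]       # every branch yields >= 1 sentence
    total = sum(c["weight"] for c in node["children"])
    picked = []
    for c in node["children"]:
        share = round(quota * c["weight"] / total)
        picked.extend(allocate_quota(c["node"], share))
    return picked

doc = {"children": [
    {"weight": 3, "node": ["S1", "S2", "S3"]},   # an important section
    {"weight": 1, "node": ["S4", "S5"]},         # a minor section
]}
print(allocate_quota(doc, 4))  # ['S1', 'S2', 'S3', 'S4']
```

Expanding a branch on demand is then just a second call with a larger quota on that branch's subtree, which is why the interactive model avoids computing the whole summary up front.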
... Personalised summarisation offers the possibility to present the most salient facets of information to a visitor, according to their interests and preferences. The pragmatic choice of a mobile device such as a smart phone to deliver the content poses challenges in terms of the amount of content that can be effectively presented to the visitor (Yang and Wang, 2003;Otterbacher et al., 2006). ...
Conference Paper
Full-text available
This paper presents an experiment on in situ summarisation in a museum context. We implement a range of standard summarisation algorithms, and use them to generate summaries for individual exhibit areas in a museum, intended for in situ delivery to a museum visitor on a mobile device. Personalisation is relative to a visitor's preference for summary length, the visitor's relative interest in a given exhibit topic, as well as (optionally) the summary history. We find that the best-performing summarisation strategy is the Centroid algorithm, and that content diversification and customisation of summary length have a significant impact on user ratings of summary quality. © 2011 by Timothy Baldwin, Patrick Ye, Fabian Bohnert, and Ingrid Zukerman.
... Besides the overview+detail method, one approach is to simply eliminate some of the content without offering any possibility to view the page in its original form [48,57,138,34,133]. By using this method, the layout and the content of a Web page are modified for good and a user is not able to view the Web page as he would on a PC. ...
... Web technologies have the potential to play the same role for Internet access from mobile devices (Chae and Kim, 2003). Today, however, mobile web access suffers from interoperability and usability problems that make the web difficult to access for most users (Chae and Kim, 2004; Yang and Wang, 2003; Cui and Roto, 2008; Oulasvirta et al., 2005; Roto and Oulasvirta, 2005; Brewster, 2002a,b; Wobbrock, 2006). W3C's "Mobile Web Initiative" (MWI) proposes to address these issues through a concerted effort of key players in the mobile production chain, including authoring tool vendors, content providers, handset manufacturers, browser vendors and mobile operators. ...
Article
World Wide Web accessibility and best practice audits and evaluations are becoming increasingly complicated, time consuming, and costly because of the increasing number of conformance criteria which need to be tested. In the case of web access by disabled users and mobile users, a number of commonalities have been identified in usage, which have been termed situationally-induced impairments; in effect, the barriers experienced by mobile web users have been likened to those of visually disabled and motor impaired users. We therefore became interested in understanding whether it was possible to evaluate the problems of mobile web users in terms of the aggregation of barriers-to-access experienced by disabled users, and in this way reduce the need to evaluate the additional conformance criteria associated with mobile web best practice guidelines. We used the Barrier Walkthrough (BW) method as our analytical framework. Capable of being used to evaluate accessibility in both the disabled and mobile contexts, the BW method also enables testing and aggregation of barriers across our target user groups. We tested 61 barriers across four user groups, each over four pages, with 19 experts and 57 non-experts, focusing on the validity and reliability of our results. We found that 58% of the barrier types that were correctly found were identified as common between mobile and disabled users. Further, if our aggregated barriers alone were used to test for mobile conformance, only four barrier types would be missed. Our results also showed that mobile users and low vision users have the most barrier types in common, while low vision and motor impaired users experienced similar rates of severity in the barriers they encountered. We conclude that the aggregated evaluation results for blind, low vision and motor impaired users can be used to approximate the evaluation results for mobile web users.
... The next problem is estimating information from different sources [15][16]. For semi-structured data, such as a text file with a known format, dictionary-defined data types specify the formatting; the formatting is then stripped from the text and its contents copied: ...
Data
Full-text available
... The output of this step will be used for sentence ranking to select the portion of the webpage that deals with the main topic. Yang and Wang [37] presented a fractal-theory-based mathematical way of summarizing a Web page into a tree structure, and displayed the summary on handheld devices through cards in WML. Users may browse the selected summary by clicking the anchor links from the highest abstraction level to the lowest. ...
Conference Paper
Full-text available
Nowadays the usage of mobile phones is widespread in our daily life. We use mobile phones as a camera, radio, music player, and even an Internet browser. As most Web pages were originally designed for desktop computers with large screens, viewing them on smaller displays involves a great deal of horizontal and vertical page scrolling. To reduce the mobile Web search fatigue caused by repeated scrolling, we investigate the automatic Web page scrolling problem based on two observations. First, every web page has many parts that are not of equal importance to an end user, and the user is often interested in a certain part of the Web page. Second, because text entry is much harder on mobile phones than on desktop computers, users usually prefer to search the Web just once and get the needed answer. Compared to existing efforts on page layout modification and content splitting for easy page navigation on mobile displays, we present a simple yet effective approach to automatic page scrolling for mobile Web search that keeps the original Web page content intact and hence prevents any loss of information. We work with the Document Object Model (DOM) of the page clicked by the user and compute the relevance of each paragraph of the Web page based on the tf*idf (term frequency * inverse document frequency) values of the user's search keywords occurring in that paragraph; the focus of the browser is then automatically scrolled to the most relevant one. Our user study shows that the proposed approach can achieve 96.47% scrolling accuracy with one search keyword and 94.78% with multiple search keywords, while the time spent computing the most important part does not vary much with the number of search keywords. Users can save up to 1.5 s in searching and finding the needed information compared to the best case of our user study.
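The paragraph-scoring step described above can be sketched as follows. This is an illustrative sketch of tf*idf scoring over DOM paragraphs, not the cited system's code; the naive whitespace tokenisation and the smoothed idf variant are assumptions.

```python
import math

def most_relevant_paragraph(paragraphs, keywords):
    """Pick the paragraph index to scroll to: score each paragraph by the
    summed tf*idf of the query keywords. A sketch of the approach above;
    tokenisation and idf smoothing are assumptions."""
    docs = [p.lower().split() for p in paragraphs]
    n = len(docs)
    best, best_score = 0, -1.0
    for i, words in enumerate(docs):
        score = 0.0
        for kw in keywords:
            kw = kw.lower()
            tf = words.count(kw) / max(len(words), 1)
            df = sum(1 for d in docs if kw in d)
            idf = math.log((n + 1) / (df + 1)) + 1  # smoothed idf
            score += tf * idf
        if score > best_score:
            best, best_score = i, score
    return best

paras = [
    "Cats sleep a lot",
    "Mobile phones are popular",
    "Search the mobile web quickly",
]
print(most_relevant_paragraph(paras, ["mobile", "search"]))  # 2
```

The browser would then scroll the viewport to the winning paragraph's DOM node, sparing the user the manual scrolling described in the abstract.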
... A mechanism to allow early termination of transmission of irrelevant documents is needed. In [21], document summarization is performed on individual documents so as to return a smaller amount of overview information before the relatively large document is actually retrieved. Since the summary is not reusable when the full document is transmitted, extra bandwidth is consumed when the document is really useful. ...
Article
Full-text available
Mobile environments are characterized by low communication bandwidth and frequent disconnection. Conventional information retrieval and visualization mechanisms thus pose a serious challenge to mobile clients. These clients need to quickly perceive an overall picture of the information available to them, so that they can discontinue the transmission of information units that are unlikely to be useful. We previously proposed a multi-resolution transmission mechanism for web documents, in which the various organizational units of a document are transmitted to a mobile client in an order determined by their information content, thereby allowing the client to terminate the transmission of a useless document at an earlier moment. In this paper, we generalize the multi-resolution transmission model for a document, and then extend that model into the multi-resolution transmission framework to cater not only for units within a document, but also for a collection of documents. We refer to the multi-resolution transmission mechanism for a particular document as the intra-document multi-resolution transmission mechanism and its extension to a document cluster as the inter-document multi-resolution transmission mechanism. With the integrated multi-resolution transmission framework, a mobile client can examine the important portions of a document cluster for an early grasp of the information therein, with the most important contents of each of those documents more readily available as well.
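The ordering idea at the core of the intra-document mechanism can be sketched very compactly: rank the document's organisational units by information content and transmit them in that order, so the client can cut the transfer early. The per-unit `info` scores are assumed to be precomputed; this is an illustration of the ordering step, not the cited framework.

```python
def transmission_order(units):
    """Order a document's organisational units by information content so a
    mobile client receives the most informative units first and can
    terminate the transfer early. The 'info' scores are assumed given."""
    return [u["id"] for u in sorted(units, key=lambda u: u["info"], reverse=True)]

units = [
    {"id": "sec-1", "info": 0.4},
    {"id": "sec-2", "info": 0.9},   # most informative: sent first
    {"id": "sec-3", "info": 0.2},
]
print(transmission_order(units))  # ['sec-2', 'sec-1', 'sec-3']
```

A client that stops after the first unit has still seen the most informative part of the document, which is exactly the early-termination property the abstract argues for.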
... One way to reduce user wait time in web browsing is to break larger web pages into smaller pages [1][2][3]. While web page segmentation techniques generally reduce user wait time for each page-load request, as pages become smaller, they may not reduce the total user wait time for a given task, since reducing page sizes likely also increases the number of navigation steps a user must take to complete a task. ...
Conference Paper
This paper describes the design and prototype implementation of MIntOS (Mobile Interaction Optimization System), a system for improving mobile interaction in web-based activities. MIntOS monitors users' interactions both to gather interaction history and to construct interaction context at runtime. A simple approach based on interaction burstiness is used to break interaction sequences into Trails, which approximate user tasks. Such Trails are used to generate rules for online, context-sensitive prediction of future interaction sequences. Predicted user interaction sequences are then optimized to reduce the amount of user input and user wait time using techniques such as interaction shortcuts, automatic text copying and form filling, as well as page prefetching. Such optimized interaction sequences are recommended to the user in real time through UI enhancements in a non-intrusive manner.
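The burstiness-based splitting of interaction sequences into Trails can be sketched as a simple idle-gap heuristic: a new Trail starts whenever the gap between consecutive events exceeds a threshold. The 30-second threshold and the event format are illustrative assumptions, not MIntOS's actual parameters.

```python
def split_into_trails(events, gap=30.0):
    """Split a timestamped interaction sequence into Trails at idle gaps,
    a simple burstiness heuristic in the spirit of the approach above.
    `events` is a list of (timestamp_seconds, action) pairs; the 30 s
    threshold is an illustrative assumption."""
    trails, current = [], []
    last_t = None
    for t, action in events:
        if last_t is not None and t - last_t > gap:
            trails.append(current)   # idle gap: close the current Trail
            current = []
        current.append(action)
        last_t = t
    if current:
        trails.append(current)
    return trails

events = [(0, "open"), (5, "click"), (120, "search"), (125, "click")]
print(split_into_trails(events))  # [['open', 'click'], ['search', 'click']]
```

Each Trail then approximates one user task and can feed the rule-generation step for predicting future interaction sequences.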
... However, their work builds on old, well-known techniques for text summarization and does not introduce linguistic processing (except stemming), so as to remain real-time adaptable since processing is handled by the mobile device. In order to introduce more knowledge compared to the previous model, (Yang and Wang, 2003) propose a fractal summarization model based on statistical and structural analysis of web pages. Thus, thematic features, location features, heading features, and cue features are adopted. ...
Conference Paper
Full-text available
In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages so that speech-to-speech interaction is used efficiently to access information.
... The structural tags were then leveraged to generate summaries at different compression budgets. Physical structure of the document was also leveraged by Yang et al. [34] to construct summaries for mobile devices. Multi-level summaries were constructed by iteratively adding finer details to a skeleton summary. ...
Preprint
Summarizing a document within an allocated budget while maintaining its major concepts is a challenging task. If the budget can take any arbitrary value and is not known beforehand, it becomes even more difficult. Most existing methods for abstractive summarization, including state-of-the-art neural networks, are data-intensive; if the number of available training samples is limited, they fail to construct high-quality summaries. We propose MLS, an end-to-end framework to generate abstractive summaries with limited training data at arbitrary compression budgets. MLS employs a pair of supervised sequence-to-sequence networks. The first network, called the MFS-Net, constructs a minimal feasible summary by identifying the key concepts of the input document. The second network, called the Pointer-Magnifier, then generates the final summary from the minimal feasible summary by leveraging an interpretable multi-headed attention model. Experiments on two cross-domain datasets show that MLS outperforms baseline methods over a range of success metrics including ROUGE and METEOR. We observed an improvement of approximately 4% in both metrics over the state-of-the-art convolutional network at lower budgets. Results from a human evaluation study also establish the effectiveness of MLS in generating complete, coherent summaries at arbitrary compression budgets.
Conference Paper
Full-text available
This paper proposes a research paper similarity system that measures the similarity of an input paper to other papers based on a summarized version of each paper. Currently, the system takes into account two different types of summarization based on two types of keywords, i.e., normal keywords and stemmed keywords. In contrast to existing recommendation systems for research papers that use citation and/or PageRank data, our system works independently of them and depends only on the textual content of the paper. Our experiment, conducted against one of the citation-based paper recommendation systems, Google Scholar, as a baseline, shows that citation-based systems like Google Scholar are prone to ignoring highly related but less-cited papers, while systems based on the textual content of papers can be more successful at recommending papers that are similar to the input paper. However, comparing the full textual content of papers is a time-consuming and heavyweight process, whereas obtaining summarized versions of papers and comparing those is both faster and reusable. In addition, we show that the ranked listing Google Scholar returns can be formulated and predicted from the citation scores. Furthermore, we show how, statistically, normal-keyword summarization is the better choice of the two types of paper summarization. As future work, we will build a synonym-acronym dictionary for scholarly papers in the computer science and engineering field, to add synonym-acronym comparison to the system.
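Comparing papers via their summarized versions, as described above, reduces to comparing keyword vectors. The following sketch ranks candidate papers by cosine similarity of bag-of-words vectors built from their summaries; the tokenisation and the similarity measure are illustrative assumptions (the paper's stemmed-keyword variant is omitted).

```python
import math
from collections import Counter

def keyword_vector(summary):
    """Bag-of-words vector from a paper's summarized version
    (naive whitespace tokenisation; stemming omitted)."""
    return Counter(summary.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = keyword_vector("fractal summarization for mobile devices")
papers = {
    "A": keyword_vector("fractal summarization of web documents"),
    "B": keyword_vector("deep learning for image recognition"),
}
ranked = sorted(papers, key=lambda p: cosine(query, papers[p]), reverse=True)
print(ranked)  # ['A', 'B']
```

Because the summaries are much shorter than the full texts, this comparison is both faster and reusable across queries, which is the efficiency argument the abstract makes.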
Article
People are often overwhelmed by the large amount of information provided in museum spaces, which makes it difficult for them to select exhibits of potential interest. As a first step in recommending exhibits where a visitor may wish to spend some time, this article investigates predictive user models for personalised prediction of museum visitors’ viewing times at exhibits. We consider two content-based models and a nearest-neighbour collaborative filter, and develop a collaborative model based on the theory of spatial processes which relies on a notion of distance between exhibits. We discuss models of exhibit distance derived from viewing-time similarity, semantic similarity and walking distance. The results from our evaluation with a real-world dataset of visitor pathways collected at Melbourne Museum (Melbourne, Australia) suggest that utilising walking and semantic distances between exhibits enables more accurate predictions of a visitor’s viewing times of unseen exhibits than using distances derived from observed exhibit viewing times. Our results also show that all models outperform a non-personalised baseline, that content-based viewing time prediction yields better results than nearest-neighbour collaborative prediction, and that our collaborative model based on spatial processes attains the highest predictive accuracy overall.
... Defining the elements of the document is based on identifying the following text features [17][18]: ...
Preprint
This article analyzes the technology of automatic text abstracting and annotation. The role of annotation in automatic search and classification of scientific articles is described. An algorithm for summarizing natural-language documents using the concept of importance coefficients is developed; this concept allows the peculiarities of subject areas and of the topics found in different kinds of documents to be taken into account. A method for generating abstracts of a single document based on frequency analysis is developed, and the recognition elements for unstructured text analysis are given. A method for the pre-processing analysis of several documents is also developed; this technique simultaneously considers both statistical approaches to abstracting and the importance of terms in a particular subject domain. The quality of the generated abstracts is evaluated: an expert evaluation of the developed system was conducted, for Ukrainian texts only, and the abstracts produced by the developed system received a higher aggregate score on all criteria. The summarization system architecture is built; the information system model is constructed with the CASE tool AllFusion ERwin Data Modeler, and a database schema for storing the information was designed. The system is designed to work primarily with Ukrainian texts, which gives it a significant advantage, since most modern systems are still oriented toward English texts.
... Early works on incremental summarization (Buyukkokten et al., 2001;Yang and Wang, 2003) leveraged structural tags supported by document markup languages to generate summaries at various lengths, thus imposing a serious constraint on the document formats (e.g. XML, HTML) that come under the purview of such methods. ...
Conference Paper
This paper reports our in situ study using contextual inquiry (CI). It solicits user requirements for hierarchically organized search results for mobile access. In our experiment, the search activities of our subjects are recorded on video, and the interviewer solicits the interface requirements during and after the experiment. An affinity diagram is built as a summary of our findings, and the major issues are discussed in this paper. The search behavior of our subjects is summarized in a flow chart. In this study, we report mobile interface features desired by our users in addition to those found in an earlier survey.
Article
Full-text available
Evidence suggests that concertina browsers - browsers with the facility to expand and contract sections of information - are important in providing the reader with an enhanced cognition of small to medium amounts of information. These systems have been shown to be useful for visually disabled users surfing the World Wide Web (Web), and with the development of the Mobile Web, there has been renewed interest in their use. This is due to the similarities of reduced or constrained vision found to exist between visually impaired users and the users of mobile devices. The cognition of information fragments is key to the user experience and the reduction of 'information overload'; as such we are concerned with assisting designers of concertina browsers in providing an enhanced user experience by ascertaining user preference through a formative evaluation of concertina summaries. This aspect of browsing is important because in all concertina systems there is a distinct cognition speed/depth trade-off. Here we investigate a number of these concertina summarization techniques against each other. We describe a formative evaluation which concludes that users prefer concertina summarization of Web documents starting from 6.25% slices of both the top and bottom and expanding from the top in 2% steps to a target maximum of 18.50% (being 12.25% from the top and 6.25% from the bottom). These preferences were found to be representative of documents of less than 600 words of content, and included the preference to not fragment an individual sentence even if that meant slightly increasing the target: Starting, maximum, and step percentage slices.
Conference Paper
Although a lot has been done to help visually impaired people access information with Braille screens, Braille keyboards, Braille PDAs and text-to-speech interfaces, very little has been done to reduce the amount of information they have to deal with. In this paper, we propose an automatic summarization system to ease Web browsing for visually impaired people on PDAs.
Conference Paper
Every day we search for new information on the web, and we find many documents whose pages contain a great amount of information. There is a big demand for automatic summarization that is both rapid and precise. Many methods have been used in automatic extraction, but most of them do not take into account the hierarchical structure of the documents. A novel method using the structure of the document was introduced by Yang and Wang in 2004. It is based on a fractal view method for controlling the information displayed. We explain its drawbacks and solve them using the new concept of the fractal dimension of a text document to achieve a better diversification of the extracted sentences, improving the performance of the method.
Article
Content in numerous Web data sources, designed primarily for human consumption, is not directly amenable to machine processing. Automated semantic analysis of such content facilitates its transformation into machine-processable and richly structured semantically annotated data. This paper describes a learning-based technique for semantic analysis of schematic data, which is characterized by being template-generated from backend databases. Starting with a seed set of hand-labeled instances of semantic concepts in a set of Web pages, the technique learns statistical models of these concepts using lightweight content features. These models direct the annotation of diverse Web pages possessing similar content semantics. The principles behind the technique find application in information retrieval and extraction problems. Focused Web browsing activities require only selective fragments of particular Web pages but are often performed using bookmarks which fetch the contents of the entire page. This results in information overload for users of constrained-interaction-modality devices such as small-screen handheld devices. Fine-grained information extraction from Web pages, typically performed using page-specific syntactic expressions known as wrappers, suffers from lack of scalability and robustness. We report on the application of our technique in developing semantic bookmarks for retrieving targeted browsing content and semantic wrappers for robust and scalable information extraction from Web pages sharing a semantic domain.
Article
Due to the omnipresence of hand-held mobile devices, people nowadays run many applications on such devices to fulfill their daily-life requirements. However, given the limited energy (battery power) of mobile hand-held devices, the energy consumption of such applications determines their feasibility for long-term use on mobile devices. One such important application is the summarization of text information. Although almost all existing summarization approaches to date are designed to run on high-end servers or cloud platforms, aiming to optimize only summary quality, many applications nowadays, e.g., summarizing data in crisis scenarios or summarizing privacy-sensitive data, demand computationally light-weight on-device text summarization that generates effective summaries while respecting the device's limited battery power. This paper is the first of its kind in which we design energy-efficient summarization algorithms for mobile devices. First, we provide a methodology to systematically measure the energy-consumption characteristics of various classical extractive summarization techniques at a modular level by analyzing their algorithmic constructs. Through this process, energy-hungry modules are identified under different configurations and environmental parameters, such as input data type, dataset size, and device type. Next, based on these observations, we develop four classes of energy-efficient hybrid summarization approaches. Extensive experiments show that the hybrid approaches, when applied to various datasets of varying size and type, can save up to 90% energy, with 5–40% degradation in summary quality relative to the high-performing base summarization approaches.
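One way to picture a hybrid, energy-aware pipeline of the kind described above is a summarizer that swaps an expensive sentence scorer for a cheap one when the device reports low battery. The scorers, the battery threshold, and the function names below are illustrative assumptions, not the paper's actual modules.

```python
# Hypothetical sketch of an energy-aware hybrid extractive summarizer:
# under low battery, fall back from a frequency-weighted scorer to a
# trivially cheap length-based scorer.

def cheap_score(sentence):
    # Very low-cost proxy: longer sentences assumed more informative.
    return len(sentence.split())

def expensive_score(sentence, corpus_freq):
    # Costlier tf-idf-like scoring: rare words contribute more.
    return sum(1.0 / corpus_freq.get(w, 1) for w in sentence.split())

def summarize_on_device(sentences, k, battery_pct, corpus_freq):
    if battery_pct < 20:                         # energy-saving mode (assumed threshold)
        score = cheap_score
    else:
        score = lambda s: expensive_score(s, corpus_freq)
    return sorted(sentences, key=score, reverse=True)[:k]
```

The design point is that the module boundary (the scorer) is where the energy/quality trade-off is made, mirroring the modular profiling methodology the abstract describes.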
Article
Browsing on a mobile device is tedious because only a few hits can be presented at a time on its limited display area, and users must scroll extensively to see search hits on the screen. Traditionally, search engines present initial search results as text snippets. In this paper, we present a syntactic sentence compression method that eliminates less important and redundant constituents from text snippets, making room to display more search results in a given limited display area and facilitating information access on limited-display devices.
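A toy version of constituent-dropping compression can be sketched with surface patterns: remove parentheticals and non-restrictive clauses, then truncate at a word boundary if the snippet still exceeds the screen budget. The patterns and the length budget are assumptions for illustration; the paper's method is syntactic, not regex-based.

```python
import re

# Hypothetical heuristic compression of a search-result snippet so more
# hits fit on a small screen. A real system would use a parser; this
# sketch only approximates constituent removal with surface patterns.

def compress_snippet(snippet, max_len=60):
    s = re.sub(r"\s*\([^)]*\)", "", snippet)         # drop parentheticals
    s = re.sub(r",\s+which[^.,]*(,)?", "", s)        # drop non-restrictive clauses
    s = re.sub(r"\s+", " ", s).strip()
    if len(s) > max_len:                             # word-boundary truncation fallback
        s = s[:max_len].rsplit(" ", 1)[0] + "…"
    return s
```

Even this crude approximation shows the payoff: shorter snippets per hit translate directly into more hits per screen.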
Conference Paper
In this paper, we propose an automatic summarization system to ease web browsing for visually impaired people on handheld devices. In particular, we propose a new architecture for summarizing Semantic Textual Units (2) based on efficient algorithms for linguistic treatment (3)(6), which allow real-time processing and deeper linguistic analysis of web pages, thus enabling quality content visualization. Moreover, we present a text-to-speech interface to ease the understanding of web page content. To our knowledge, this is the first attempt to use both statistical and linguistic techniques for text summarization for browsing on mobile devices.
Article
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They cover the extraction of important sentences from documents, compression of sentences to their essential or relevant content, and detection of redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in this competition. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.
Article
The World Wide Web is the largest database on earth — a huge amount of data and information primarily intended for human users. Unfortunately, data on the web requires intelligent interpretation and cannot be easily used by programs. It requires advanced data extraction and information integration techniques, drawing on algorithms from artificial intelligence, to automatically process data. Lixto technology addresses these issues and enables developers to interactively turn web pages into mobile services.
Conference Paper
Full-text available
The importance of text summarization grows rapidly as the amount of information increases exponentially. In this paper, we present a new method for Persian text summarization based on fractal theory. The main goal of this method is to use the hierarchical structure of the document and adapt it for the Persian language. The results show that our method improves the performance of extractive summarization.
Article
Full-text available
Systems providing personalized services to users need to build and maintain a User Model (UM). However, at the onset of providing services, such a system has no prior knowledge about a user and may benefit from information imported from external sources. Due to a lack of standards in the representation of UMs, commercial competition, and privacy issues, distinct personalized service-providing systems build their own specific models and store their information in incompatible manners. Thus, although much data on a specific user might exist in other systems, it is typically unavailable for use in the initial phase of a given system. This work puts forward the design of a user-model mediation idea. This is demonstrated in an initial implementation in a specific system (a museum visitors' guide system) under the PIL project, where the user is modelled by a "bag of words" vector and the initial information is imported from a case-based modelled user (in an external trip-planning system).
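The "bag of words" mediation idea can be sketched very simply: project an external system's term-weight vector onto the target system's vocabulary, then compare profiles with cosine similarity. The vocabularies, weights, and function names are illustrative assumptions, not the PIL project's actual representation.

```python
import math

# Hypothetical sketch of user-model mediation: bootstrap a new system's
# user model from an external system's term->weight "bag of words",
# keeping only terms the target vocabulary knows.

def mediate(external_um, target_vocab):
    """Project an external term->weight model onto the target vocabulary."""
    return {t: w for t, w in external_um.items() if t in target_vocab}

def cosine(a, b):
    """Cosine similarity between two sparse term->weight vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Terms the target system cannot interpret are simply dropped, which is the price of mediating between incompatible model representations.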
Article
M-commerce services are rapidly emerging with recent advances in mobile devices and wireless communications. Internet shopping mall managers provide mobile phone users with mobile shopping malls for m-commerce. However, these mobile shopping malls usually have their own sites, which are not compatible with the corresponding Internet shopping malls, resulting in inefficiency in managing the two kinds of shopping malls. In this paper, we develop an m-commerce content provider system that effectively brings Internet shopping malls to mobile phones by extracting only the items of concern to users from product documents. We then apply and evaluate the system on a specific Internet shopping mall site. The results show that the system considerably reduces the amount of data transferred to users' mobile phones compared with using general web browsers.
Article
The World Wide Web (WWW) is a huge information network within which searching for relevant quality content remains an open question. The ambiguity of natural language is traditionally one of the main reasons that prevent search engines from retrieving information according to users' needs. However, globalized access to the WWW via weblogs and social networks has highlighted new problems. Web documents tend to be subjective, they mainly refer to current events to the detriment of past events, and their ever-growing number contributes to the well-known problem of information overload. In this thesis, we present our contributions to digesting information in real-world heterogeneous text environments (i.e., the Web), thus reducing the effort users spend finding relevant quality information. However, most work related to information digestion deals with the English language, fostered by freely available linguistic tools and resources, and as such cannot be directly replicated for other languages. To overcome this drawback, two directions may be followed: on the one hand, building resources and tools for a given language; on the other, proposing language-independent approaches. Within the context of this report, we focus on presenting language-independent unsupervised methodologies to (1) extract implicit knowledge about the language and (2) understand the explicit information conveyed by real-world texts, thus enabling multilingual information digestion.
Article
Document summarization is playing an important role in coping with information overload on the Web. Many summarization models have been proposed recently, but few try to adjust the summary length and sentence order according to application scenarios. With the popularity of handheld devices, presenting key information first in summaries of flexible length is of great convenience in terms of faster reading and decision-making and network consumption reduction. Targeting this problem, we introduce a novel task of generating summaries of incremental length. In particular, we require that the summaries should have the ability to automatically adjust the coverage of general-detailed information when the summary length varies. We propose a novel summarization model that incrementally maximizes topic coverage based on the document's hierarchical topic model. In addition to the standard Rouge-1 measure, we define a new evaluation metric based on the similarity of the summaries' topic coverage distribution in order to account for sentence order and summary length. Extensive experiments on Wikipedia pages, DUC 2007, and general noninverted writing style documents from multiple sources show the effectiveness of our proposed approach. Moreover, we carry out a user study on a mobile application scenario to show the usability of the produced summary in terms of improving judgment accuracy and speed, as well as reducing the reading burden and network traffic.
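The incremental-length property described above can be sketched as a greedy ordering: pick sentences by marginal gain in weighted topic coverage, so that every length-k prefix of the same ordering is itself a reasonable summary and a client can truncate to any budget. The topic assignments and weights below are illustrative stand-ins for the paper's hierarchical topic model.

```python
# Hypothetical sketch of incremental-length summarization: order sentences
# by marginal topic-coverage gain, then truncate the fixed ordering to any
# length budget. General topics (high weight) surface first; details follow.

def incremental_order(sentences, topics_of, topic_weight):
    """Return sentences ordered by marginal gain in weighted topic coverage."""
    remaining = list(sentences)
    covered = set()
    ordered = []
    while remaining:
        def gain(s):
            # Weight of topics this sentence adds beyond what is covered.
            return sum(topic_weight.get(t, 0.0) for t in topics_of[s] - covered)
        best = max(remaining, key=gain)
        ordered.append(best)
        covered |= topics_of[best]
        remaining.remove(best)
    return ordered
```

Because the ordering is computed once, a summary of length k is always a prefix of the summary of length k + 1, which is what lets the coverage adjust automatically as the length budget varies.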
Conference Paper
Full-text available
This paper proposes a Chinese news summarization method for message services that deliver news briefs over cell phones. In this method, important sentences are first identified based on the news content. They are matched against the news title to determine a suitable position for combining with the title to form candidates. These candidates are then ranked by their length and fitness for manual selection. In our evaluation of 40 news stories, over 62.5% had their top-ranked candidates judged good in quality. If all candidates were considered, over 75% yielded acceptable summaries without manual editing. Since these summaries are concatenated snippets from the news texts, they can easily be integrated and synchronized with other streamed media, such as speech or video from the same story.
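The title-anchored candidate step can be pictured as: splice each salient sentence onto the news title, then rank the candidates by how well they fit a message-length budget. The splicing format, the budget, and the fitness weights are illustrative assumptions, not the paper's actual scoring.

```python
# Hypothetical sketch of title-anchored news-brief candidates: combine the
# title with each salient sentence and rank by fit to an SMS-like budget.

def make_candidates(title, sentences):
    return [f"{title}: {s}" for s in sentences]

def rank_candidates(candidates, max_len=70):
    def fitness(c):
        over = max(0, len(c) - max_len)      # penalize exceeding the budget
        return len(c) - 5 * over             # prefer longer texts within budget
    return sorted(candidates, key=fitness, reverse=True)
```

Because candidates are verbatim splices of title and body text, the top-ranked brief stays synchronizable with the source story's audio or video, as the abstract notes.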