Publish Subscribe Matching Users' Interests for Quality Web Content

Abstract

A publish/subscribe system whose database is updated by a search engine that classifies the topics of web articles and assigns each article a rating for its quality. Users subscribed to the publish/subscribe system can receive feeds on subjects that match their interest in quality content, i.e., content that has been marked as high quality. In this paper, I discuss a formula that can be used to evaluate the quality of a text and to tag that text in the search engine database with its quality rating, so that it can be sent to interested users.
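To make the matching step concrete, here is a minimal sketch of such a broker, assuming a hypothetical in-memory design: the search engine publishes articles tagged with a topic and a quality rating, and the broker delivers each article only to subscribers whose topic matches and whose quality threshold the rating meets. The names used here (Broker, Article, the 0.8 threshold) are illustrative assumptions, not definitions from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Article:
    title: str
    topic: str
    quality: float  # rating in [0, 1] assigned by the search engine

@dataclass
class Broker:
    # topic -> list of (minimum quality, callback) subscriptions
    subs: dict = field(default_factory=dict)

    def subscribe(self, topic: str, min_quality: float,
                  callback: Callable[[Article], None]) -> None:
        # Register interest in a topic, with a quality threshold.
        self.subs.setdefault(topic, []).append((min_quality, callback))

    def publish(self, article: Article) -> None:
        # Deliver only to subscribers interested in this topic whose
        # quality threshold the article's rating meets.
        for min_quality, callback in self.subs.get(article.topic, []):
            if article.quality >= min_quality:
                callback(article)

broker = Broker()
broker.subscribe("security", 0.8, lambda a: print("feed:", a.title))
broker.publish(Article("TLS 1.3 explained", "security", 0.92))   # delivered
broker.publish(Article("Clickbait listicle", "security", 0.31))  # filtered out
```

The key design point this sketch illustrates is that quality filtering happens at delivery time in the broker, so the same rated article in the search engine database can serve subscribers with different quality thresholds.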