Article

A distributed architecture for large scale news and social media processing

Abstract

When designing a data processing and analytics pipeline for data streams, it is important to provide for the data load and to balance it successfully over the available resources. This is easier to achieve when small processing modules, each requiring limited resources, replace large monolithic processing software. In this work, we present the case of a social media and news analytics platform, called PaloAnalytics, which performs a series of content aggregation, information extraction (e.g., NER and sentiment tagging) and visualisation tasks on a large amount of data on a daily basis. We present the architecture of the platform, which relies on micro-modules and message-oriented middleware to deliver distributed content processing. Early results show that the proposed architecture can easily withstand the increased content load that occasionally occurs in social media (e.g., when a major event takes place) and quickly release unused resources when the content load returns to its normal flow.
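The idea of micro-modules connected by message-oriented middleware can be illustrated with a minimal sketch. It is not the PaloAnalytics implementation: in-process queues from the Python standard library stand in for the middleware, the two handlers are toy stand-ins for NER and sentiment tagging, and all names (`run_stage`, `workers_needed`, `process`, the word lists and the scaling thresholds) are assumptions made for illustration only.

```python
import queue
import threading

STOP = object()  # sentinel message that tells a worker to exit

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"terrible", "bad", "hate"}


def extract_entities(doc):
    """Toy stand-in for an NER micro-module: capitalised tokens count as entities."""
    doc = dict(doc)
    doc["entities"] = [w for w in doc["text"].split() if w[:1].isupper()]
    return doc


def tag_sentiment(doc):
    """Toy stand-in for a sentiment micro-module based on word counts."""
    doc = dict(doc)
    words = {w.strip(".,!?").lower() for w in doc["text"].split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    doc["sentiment"] = (
        "positive" if score > 0 else "negative" if score < 0 else "neutral"
    )
    return doc


def run_stage(handler, inbox, outbox, workers):
    """Start `workers` threads that read from `inbox`, apply `handler`, write to `outbox`."""
    def loop():
        while True:
            msg = inbox.get()
            if msg is STOP:
                break
            outbox.put(handler(msg))

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def workers_needed(backlog, per_worker=100, max_workers=8):
    """Assumed scaling rule: one worker per `per_worker` queued messages, between 1 and the cap."""
    return max(1, min(max_workers, -(-backlog // per_worker)))  # ceiling division


def process(texts):
    """Push texts through the NER stage, then the sentiment stage, and collect results."""
    raw, with_entities, done = queue.Queue(), queue.Queue(), queue.Queue()
    for i, text in enumerate(texts):
        raw.put({"id": i, "text": text})

    n = workers_needed(raw.qsize())
    ner_workers = run_stage(extract_entities, raw, with_entities, n)
    sentiment_workers = run_stage(tag_sentiment, with_entities, done, n)

    # Shut down each stage in order: one STOP per worker, queued behind the real messages.
    for _ in ner_workers:
        raw.put(STOP)
    for t in ner_workers:
        t.join()
    for _ in sentiment_workers:
        with_entities.put(STOP)
    for t in sentiment_workers:
        t.join()

    results = []
    while not done.empty():
        results.append(done.get())
    return sorted(results, key=lambda d: d["id"])
```

Because each stage only sees its input and output queues, the number of workers per stage can be chosen from the current backlog, which mirrors the scale-up and release behaviour described above. A real deployment would replace the in-process queues with a message broker and would size each stage independently.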

Article
With the increased use of microblogs and social media platforms as forms of online communication, we now have a huge amount of opinionated data reflecting people’s opinions and attitudes in the form of reviews, forum discussions, blogs and tweets. This has recently brought great interest to the field of sentiment analysis and opinion mining, which analyzes people’s feelings and attitudes from written language. Most existing approaches to sentiment analysis rely mainly on the presence of affect words that explicitly reflect sentiment. However, these approaches are semantically weak, that is, they do not take into account the semantics of words when detecting their sentiment in text. Only recently have a few approaches (e.g. sentic computing) started to investigate this direction. Following this trend, this paper investigates the role of semantics in sentiment analysis of social media. To this end, frame semantics and lexical resources such as BabelNet are employed to extract semantic features from social media that lead to more accurate sentiment analysis models. Experiments are conducted with different types of semantic information by assessing their impact on four social media datasets, which incorporate tweets, blogs and movie reviews. A tenfold cross-validation shows that the F1 measure increases significantly when semantics are used in sentiment analysis of social media. Results show that the proposed approach, which considers word semantics for sentiment analysis, surpasses non-semantic approaches on the considered datasets.