Conference Paper

A Generic Architecture for a Social Network Monitoring and Analysis System

Authors: A. Semenov, J. Veijalainen, A. Boukhanovsky

Abstract

This paper describes the architecture and a partial implementation of a system designed for the monitoring and analysis of communities at social media sites. The main contribution of the paper is a novel system architecture that facilitates long-term monitoring of diverse social networks existing and emerging at various social media sites. It consists of three main modules: the crawler, the repository and the analyzer. The crawler can be adapted to crawl different sites based on an ontology describing the structure of the site. The repository stores the crawled and analyzed persistent data using efficient data structures. It can be implemented using special-purpose graph databases and/or an object-relational database. The analyzer hosts modules that can be used for various graph and multimedia content analysis tasks. The analysis results can in turn be stored in the repository and used in further analysis rounds. All modules can be run concurrently.
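The three-module layout described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and function names, the `relation_field` key standing in for a site ontology, and the in-memory repository are hypothetical and are not taken from the paper, which envisions an ontology-driven crawler and a graph or object-relational database.

```python
import threading
from collections import defaultdict


class Repository:
    """In-memory stand-in for the repository (hypothetical; a real system
    would use a graph or object-relational database)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.edges = defaultdict(set)  # user -> set of connected users
        self.results = {}              # analysis name -> stored result

    def add_edge(self, a, b):
        with self._lock:
            self.edges[a].add(b)
            self.edges[b].add(a)

    def snapshot(self):
        with self._lock:
            return {user: set(peers) for user, peers in self.edges.items()}

    def store_result(self, name, value):
        with self._lock:
            self.results[name] = value


def crawl(site_ontology, pages, repo):
    """Crawler: the (assumed) ontology names the field that holds relations,
    so the same code can be pointed at differently structured sites."""
    relation = site_ontology["relation_field"]
    for page in pages:
        for other in page.get(relation, []):
            repo.add_edge(page["user"], other)


def analyze(repo):
    """Analyzer: compute node degrees and write them back to the repository."""
    graph = repo.snapshot()
    repo.store_result("degree", {user: len(peers) for user, peers in graph.items()})


def run_demo():
    repo = Repository()
    # Two made-up sites with different field names for the same kind of relation.
    site_a = ({"relation_field": "friends"}, [{"user": "ann", "friends": ["bob", "eve"]}])
    site_b = ({"relation_field": "follows"}, [{"user": "bob", "follows": ["eve"]}])
    crawlers = [
        threading.Thread(target=crawl, args=(ontology, pages, repo))
        for ontology, pages in (site_a, site_b)
    ]
    for thread in crawlers:
        thread.start()  # crawlers for different sites run concurrently
    for thread in crawlers:
        thread.join()
    analyze(repo)
    return repo.results["degree"]
```

In this sketch the crawlers share one lock-protected repository, so the analyzer could also run while crawling is still in progress; the demo joins the crawler threads first only to keep the result deterministic.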
The article
Semenov, A., Veijalainen, J., Boukhanovsky, A., 2011. A Generic Architecture for a Social Network Monitoring and Analysis System. In Barolli, L., Xhafa, F., Takizawa, M. (Eds.), The 14th International Conference on Network-Based Information Systems. Los Alamitos, CA, USA: IEEE Computer Society, pp. 178-185. DOI: 10.1109/NBiS.2011.52
is included in Semenov's Ph.D. thesis, which is available at
https://jyx.jyu.fi/dspace/bitstream/handle/123456789/41559/978-951-39-5225-9.pdf
... providing financial support or spreading propaganda, and eminently violent ones, e.g. the planning of terrorist attacks [3,13,23]. This would be consistent with recent statements by the Minister of the Interior, according to which 80% of the people recruited by jihadists in our country are recruited through the Internet and social networks, whereas only three years ago that figure referred to places of worship and prisons. Likewise, the recruitment and indoctrination period keeps getting shorter, in some cases falling to as little as two months. ...
... On the one hand, the terrorist training of these individuals tends to be self-taught, using the Internet to, for example, access websites on how to make explosives. They also tend to give hints of their intentions (if not announce them outright) in blogs, forums, etc., all of which is information that could be monitored using specialized NLP and opinion mining (OM) tools in order to detect markers (indications) in the posts that denote dangerous behavior or emotional states of the author conducive to antisocial behavior, filtering and ranking suspects according to their degree of radicalization or threat [5,20,22,23]. ...
Conference Paper
Opinion Mining is the discipline that deals with the automatic processing of the opinions contained in a text. It makes it possible, for example, to determine whether or not a text expresses an opinion, or whether the polarity or sentiment expressed in it is positive, negative or mixed. It also enables automatic feature extraction, which makes it possible to learn how authors perceive specific aspects of a given topic. After an introduction to the field, this work presents our own approach to it, which stands out for using syntactic information and for being specially adapted to one of the most difficult working contexts, Twitter. This technology is readily applicable to intelligence tasks.
... Semenov et al. [11] propose a data capture approach and architecture for monitoring and analysis of the structure of Social Media communities. Our proposal stems from the basic ideas outlined in this system, widening the approach to the analysis of general contents and considering a broader range of functionality and requirements in several application domains. ...
... In addition, a change in an entity's data schema must not result in data loss. According to studies reported in the literature [11], [17] and lessons learned from our own experiences, a data capture tool, and thus its underlying architecture, must guarantee the following properties: Scalability: ability to handle a growing amount of data; Configurability: ability to customize the system behavior according to configuration parameters (such as refresh time or maximum number of parallel crawlers); High Performance: ability to run different crawler instances (possibly on different machines); High data variance: ability to store and manage different content types: text, image, and video; Resilience: ability to overcome issues such as absence of Internet connection or unavailable response; Adaptability: ability to exploit some features possibly present only on specific Social Media (e.g. Facebook search API). ...
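Two of the properties listed above, configurability and resilience, can be sketched briefly. This is an illustration under stated assumptions, not the cited tool's design: `CrawlerConfig`, its field names and defaults, and `fetch_with_retry` are hypothetical, and exponential backoff is one common retry policy chosen here for the example.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerConfig:
    """Hypothetical configuration parameters; names and defaults are illustrative."""
    refresh_seconds: float = 3600.0   # how often a source is revisited
    max_parallel_crawlers: int = 4    # upper bound on concurrent crawler instances
    max_retries: int = 3              # retries after the first failed attempt
    backoff_seconds: float = 1.0      # base delay, doubled after each failure


def fetch_with_retry(fetch, url, config, sleep=time.sleep):
    """Call fetch(url), retrying on ConnectionError with exponential backoff.

    `fetch` is any callable supplied by the caller; `sleep` is injectable so
    the retry schedule can be observed without actually waiting.
    """
    for attempt in range(config.max_retries + 1):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == config.max_retries:
                raise  # give up once the retry budget is exhausted
            sleep(config.backoff_seconds * 2 ** attempt)
```

Keeping the limits in one immutable configuration object means behavior such as the refresh time or the number of parallel crawlers can be changed without touching the crawling code.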
... Relational sociology studies have tended to examine and retrieve information from text data, whereas the importance of the implications of face-to-face interactions when analyzing network information has largely been ignored. A. Semenov et al. [54] proposed three modules for long-term monitoring of different social networks: the crawler, the repository, and the analyzer. By crawling, storing, and analyzing different sites, longitudinal data from social media sites can be examined. ...
Article
The goal of this study was to conduct a literature review of current approaches and techniques for identifying, understanding, and predicting human behaviors through mining a variety of sources of textual data with a focus on enabling classification of psychological behaviors regarding emotion, cognition, and social empathy. This review was performed using keyword searches in ISI Web of Science, Engineering Village Compendex, ProQuest Dissertations, and Google Scholar. Our findings show that, despite recent advancements in predicting human behaviors based on unstructured textual data, significant developments in data analytics systems for identification, determination of interrelationships, and prediction of human cognitive, emotional and social behaviors remain lacking.
... Using semi-public data (data collected with a user account) to retrieve data for Facebook group and page content analysis research with a computational approach such as the one presented in this paper, i.e. a tool using Facebook's own application programming interfaces (API) to gather all available communication activity data from the platform's pages and groups and organizing it into a warehouse, is still very rare in social sciences. Semenov (2013) discusses many aspects of social media data analysis, implementation and repository designed for monitoring communities on social media sites (see also Semenov, Veijalainen, and Boukhanovsky 2011). In their working paper, Zlatanov and Koleva (2014) use a software application called Opinion Crawler designed to extract data from open Facebook groups and use it for data analysis through people centric models and text network analysis connected to online originated protests. ...
Article
Digital and social media and large available data-sets generate various new possibilities and challenges for conducting research focused on perpetually developing online news ecosystems. This paper presents a novel computational technique for gathering and processing large quantities of data from Facebook. We demonstrate how to use this technique for detecting and analysing issue-attention cycles and news flows in Facebook groups and pages. Although the paper concentrates on a Finnish Facebook group as a case study, the demonstrated method can be used for gathering and analysing large sets of data from various social network sites and national contexts. The paper also discusses Facebook platform regulations concerning data gathering and ethical issues in conducting online research.
... At the first stage, they identified the list of relevant communities using a list of predefined keywords. Based on the created list of 14,777 communities, a dataset of 19,430,445 wall posts and 62,193,711 comments was collected using the social media monitoring software presented in Semenov and Veijalainen [105] and Semenov et al. [106]. To classify texts into positive and negative classes, the authors applied a rule-based approach with a vocabulary of 8,863 positive and 24,299 negative words in both Russian and Ukrainian. ...
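A vocabulary-based rule of the kind mentioned above can be sketched as follows. This is only an assumed, simplified form: the excerpt does not give the actual rules, so the hit-counting score, the `classify` function, the tiny word sets, and the extra "neutral" label for ties are all illustrative choices.

```python
import re


def classify(text, positive, negative):
    """Label a text by counting vocabulary hits (assumed rule, not the cited one).

    Texts with equal positive and negative counts, including no hits at all,
    are labelled 'neutral'; that third label is an addition for this sketch.
    """
    # \w+ is Unicode-aware for str patterns, so Cyrillic words are tokenized too.
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Tiny made-up vocabularies; the cited study used thousands of words.
POSITIVE = {"хороший", "good"}
NEGATIVE = {"плохой", "bad"}
```

A plain word-count rule like this ignores negation and word inflection, which matters for morphologically rich languages such as Russian and Ukrainian; a practical system would at least lemmatize tokens before the lookup.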
Article
Sentiment analysis has become a powerful tool in processing and analysing expressed opinions on a large scale. While the application of sentiment analysis to English-language content has been widely examined, its application to Russian-language content remains less well studied. In this survey, we comprehensively reviewed the applications of sentiment analysis of Russian-language content and identified current challenges and future research directions. In contrast with previous surveys, we targeted the applications of sentiment analysis rather than existing sentiment analysis approaches and their classification quality. We synthesised and systematically characterised existing applied sentiment analysis studies by their source of analysed data, purpose, employed sentiment analysis approach, and primary outcomes and limitations. We presented a research agenda to improve the quality of the applied sentiment analysis studies and to expand the existing research base to new directions. Additionally, to help scholars select an appropriate training dataset, we performed an additional literature review and identified publicly available sentiment datasets of Russian-language texts.
... Finally, a graph database is simulated in [28]. An architectural design is first introduced aimed at storing and querying data extracted from general purpose social networks such as Facebook, LiveJournal and Twitter. ...
Conference Paper
More and more businesses use social media to advertise their services. Such businesses typically maintain online social network accounts and regularly update their pages with advertisement messages describing new products and promotions. One recent trend in such businesses' activity is to offer incentives to individual users for re-posting the advertisement messages to their own profiles, thus making them visible to more and more users. A common type of incentive puts all the re-posting users into a random draw for a valuable gift. Understanding the dynamics of user engagement in the re-posting activity can shed light on social influence mechanisms and help determine the optimal incentive value to achieve a large viral cascade of advertisement. We have collected approximately 1800 advertisement messages from the social media site VK.com and all the subsequent reposts of those messages, together with all the immediate friends of the reposting users. In addition, approximately 150000 non-advertisement messages with their reposts were collected, amounting to approximately 6.5 million reposts in total. This paper presents the results of the analysis based on these data. We then discuss the problem of maximizing a repost cascade size under a given budget.
Article
Surprisingly cruel mass murders and attacks have been witnessed in the educational institutions of the Western world since the 1970s. These are often referred to as 'school shootings'. There have been over 300 known incidents around the world and the number is growing. Social network sites (SNSs) have enabled the perpetrators to express their views and intentions. Our result is that since about 2005, all major school shooters have had a presence in SNSs, and some have left traces that would have made it possible to evaluate their intentions to carry out a rampage. A further hypothesis is that future school shooters will behave in a similar manner and would thus be traceable in the digital sphere. In this paper, we try to take advantage of this tendency and study the presence of school shooting related information in various SNSs and its relation to past and perhaps future cases.
Conference Paper
Understanding the dynamics of network evolution rests in part on the representation chosen to characterize the evolutionary process. We offer a simple, three-parameter representation based on subgraphs that capture three important properties of social networks: leadership, team alignment or bonding among members, and diversity of expertise. When plotted on this representation, the evolution of a typical small group such as a start-up or a street gang has a spiral trajectory, moving toward a tentative fixed point as membership increases to two dozen or so. We show that a simple probabilistic model for recruitment and bonding cannot explain these observations, and suggest that strategic moves among group members may come into play. Keywords: social networks, small groups, dynamics, evolution, models.
Book
Terrorism informatics has been defined as the application of advanced methodologies, information fusion and analysis techniques to acquire, integrate, process, analyze, and manage the diversity of terrorism-related information for international and homeland security-related applications. The wide variety of methods used in terrorism informatics are derived from Computer Science, Informatics, Statistics, Mathematics, Linguistics, Social Sciences, and Public Policy, and these methods are involved in the collection of huge amounts of information from varied and multiple sources and of many types in numerous languages. Information fusion and information technology analysis techniques, which include data mining, data integration, language translation technologies, and image and video processing, play central roles in the prevention, detection, and remediation of terrorism. Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security will provide an interdisciplinary and comprehensive survey of the state-of-the-art in the terrorism informatics domain along three basic dimensions: methodological issues in terrorism research; information infusion techniques to support terrorism prevention, detection, and response; and legal, social, privacy, and data confidentiality challenges and approaches. Featuring contributions by leading researchers and practitioners, illustrative case studies, and applications of terrorism informatics techniques, the book will be an essential resource for scientists, security professionals, counterterrorism experts, and policy makers.
Book
Social network analysis applications have experienced tremendous advances within the last few years due in part to increasing trends towards users interacting with each other on the internet. Social networks are organized as graphs, and the data on social networks takes on the form of massive streams, which are mined for a variety of purposes. Social Network Data Analytics covers an important niche in the social network analytics field. This edited volume, contributed by prominent researchers in this field, presents a wide selection of topics on social network data mining such as Structural Properties of Social Networks, Algorithms for Structural Discovery of Social Networks and Content Analysis in Social Networks. This book is also unique in focussing on the data analytical aspects of social networks in the internet scenario, rather than the traditional sociology-driven emphasis prevalent in the existing books, which do not focus on the unique data-intensive characteristics of online social networks. Emphasis is placed on simplifying the content so that students and practitioners benefit from this book. This book targets advanced level students and researchers concentrating on computer science as a secondary text or reference book. Data mining, database, information security, electronic commerce and machine learning professionals will find this book a valuable asset, as well as primary associations such as ACM, IEEE and Management Science.
Article
The rapid development of blogs has brought on some serious problems such as disclosure of sensitive information, spread of unhealthy information, etc., so it is very important for supervisors to detect them. The common methods based on search engines have drawbacks such as lower efficiency and lower precision, because they need to retrieve and update blog pages frequently and to analyze all downloaded blog pages. In this paper, we present a new method for detecting the contents of blogs based on community discovery. It adopts a hierarchical clustering algorithm based on multi-attribute metrics. With this method, we can monitor the whole blog community by detecting the contents of eigenvalue nodes. Our experimental results show that the proposed algorithm is highly effective.
Article
In this paper we will demonstrate the potential of processing and visualising the dynamics of computer-mediated communities by means of Social Network Analysis. Because computer-mediated community systems are also manifested as structured data, we use data structures like e-mail, discussion boards, and bibliography sources for an automatic transformation into social network data formats. The paper will demonstrate a 3-dimensional visualisation of two cases: the first presents an author community based on bibliography data converted into GraphML. Based on this dataset we visualise publication networks with a tool called Weaver, which is developed in our research group. Following Lothar Krempel's algorithm, Weaver uses the first two dimensions to embed the network structure within a common solution space. The third dimension is used for representing the time axis and thus the dynamics of co-authorship relations. The second case describes recent research in open source communities and highlights how our visualisation approach can be used as a complement to more traditional approaches, such as content analysis and statistics based on specific SNA indices.
Article
This paper presents a model and system architecture for an early warning system to detect terrorist threats. The paper discusses the shortcomings of state-of-the-art systems and outlines the functional requirements that must be met by an ideal system working in the counterterrorism domain. The concept of generating early warnings to predict terrorist threats is presented. The model relies on data collection from open data sources, information retrieval, information extraction for preparing structured workable data sets from available unstructured data, and finally detailed investigation. The investigation includes social network analysis, investigative data mining, and heuristic rules for the study of complex covert networks for terrorist threat indication. The presented model and system architecture can be used as a core framework for an early warning system.