Conference Paper

Cooperative, Dynamic Twitter Parsing and Visualization for Dark Network Analysis


Abstract

Developing a network from Twitter data for social network analysis (SNA) is a common task in most academic domains. The need for real-time analysis is less prevalent, because researchers typically analyze Twitter information after a major event or as part of a broader statistical or sociological study of general Twitter users. Dark network analysis is a specialized field that focuses on criminal networks, terrorist networks, and networks of persons of interest, in which evaluating information quickly and making decisions from it is crucial. We propose a platform and visualization called Dynamic Twitter Network Analysis (DTNA) that incorporates real-time information from Twitter, the resulting network topology, geographical placement of geotagged tweets on a Google Map, and storage for long-term analysis. The platform provides an SNA visualization that lets the user interpret results and change the search criteria quickly, based on visual aesthetic properties built from key dark network utilities, through a user interface that is dynamic, up to date for time-critical decisions, and geographically specific.
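The step of turning tweets into a network topology can be illustrated with a minimal sketch. It is not the DTNA implementation: the `(author, text)` record format, the function names, and the sample data are all assumptions made for illustration, and edges are drawn only from @mentions.

```python
import re
from collections import defaultdict

# Hypothetical input format: (author, text) pairs. Not taken from the paper.
MENTION = re.compile(r"@(\w+)")

def build_mention_graph(tweets):
    """Return {author: {mentioned_user: count}} as a directed, weighted graph."""
    graph = defaultdict(lambda: defaultdict(int))
    for author, text in tweets:
        for target in MENTION.findall(text):
            # Skip self-mentions so they do not create self-loops.
            if target.lower() != author.lower():
                graph[author.lower()][target.lower()] += 1
    return graph

def in_degree(graph):
    """Count how many distinct users mention each user."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)

sample = [
    ("alice", "meeting @bob and @carol at noon"),
    ("bob", "@carol confirmed"),
    ("carol", "ok @alice"),
]
graph = build_mention_graph(sample)
degrees = in_degree(graph)
print(degrees)
```

In this toy data, `carol` is mentioned by two distinct users and would be the first candidate for closer inspection in a visualization that sizes nodes by in-degree.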

... g instead useful in guiding further investigation. The analysis of the Enron dataset (which we discuss further in Section 4.1.3) presented does show some utility, but it is worth noting that the analyst's interpretation of results seems likely to be informed by previous knowledge of the dataset's context. A blinded study would mitigate such issues. [Dudas 2013] makes use of Twitter data and geolocation for building a social network based on ongoing terrorist events, and then provides a modifiable visualisation to aid interpretation. Several areas for ongoing development are highlighted, including incorporation of temporal and sentiment dimensions into the visualisation tool. [Barbian 2011] the ...
... tion environment (searching for red balloons), and how experiences in the challenge may relate generally to intelligence-gathering, particularly with regard to false-reporting. Their overview is high-level and rather specific to their challenge, but includes reference to a number of techniques and technologies not otherwise captured by this review. [Dudas 2013] focuses on the detection and analysis of 'dark networks', with specific focus on visualisation tools for handling networks parsed from Twitter and placed by geolocation. No formal evaluation is provided, but the paper discusses trial usage on real networks of interest. [Johnson et al. 2012] look at finding relationships between unstruct ...
Article
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.
... The main component of the interface is the temporal network area (part b in Figure 1) in which a snapshot Gt of the multivariate dynamic network is depicted to achieve overview and zoom/filter functions. Although other network layouts exist, such as matrix notation [GGK*11] or 3D networks [GZD*15], node-link layout [Dud13, MH15] is chosen due to its simplicity and understandability. A temporal snapshot of the network allows the users to focus on the dynamics of the current time step. ...
Conference Paper
Visualizing multivariate dynamic networks is a challenging task. The evolution of the dynamic network within the temporal axis must be depicted in conjunction with the associated multivariate attributes. In this paper, an exploratory visual analytics tool is proposed to display multivariate dynamic networks with spatial attributes. The proposed tool displays the distribution of multivariate temporal domain and network attributes in scattered views. Moreover, in order to expose the evolution of a single or a group of nodes in the dynamic network along the temporal axis, an egocentric approach is applied in which a node is represented with its neighborhood as an ego-network. This approach allows users to observe a node’s surrounding environment along the temporal axis. On top of the traditional ego-network visualization methods, such as timelines, the proposed tool encodes ego-networks as feature vectors consisting of the domain and network attributes and projects them onto 2D views. As a result, the distance between projected ego-networks represents the dissimilarity across the temporal axis in a single view. The proposed tool is demonstrated with a real-world use case scenario on merchant networks obtained from a one-year-long credit card transactions.
... Besides the idea of achieving a comparable accuracy, a new perspective should give weight to such a public mood detection study. Visualizing the geo-localized mood is an appealing idea that a minority of the related works have applied in various ways [8, 13, 14, 15]. This study is distinctive in visually projecting a country-wide public mood represented by real-time data provided from Twitter. ...
Conference Paper
Beyond connectivity between individuals, social media offers researchers meaningful data about members' opinions and tendencies regarding subjects or products. As a microblogging site, Twitter provides a compressed expression of one's opinions, which until recently could only be gathered through public surveys. To extract the sentiment distribution of Turkey, we compared the outputs of several classification algorithms. Among these, we chose the SMO algorithm as the classifier, since it appeared to offer the best balance between accuracy and computational cost. We used the Google Maps API to visualize the output on a publicly available web page, enabling visitors to observe the real-time sentiment distribution of Turkey.
Article
Timely identification of terrorist networks within civilian populations could assist security and intelligence personnel to disrupt and dismantle potential terrorist activities. Finding “small” and “good” communities in multi-layer terrorist networks, where each layer represents a particular type of relationship between network actors, is a vital step in such disruption efforts. We propose a community detection algorithm that draws on the principles of discrete-time random walks to find such “small” and “good” communities in a multi-layer terrorist network. Our algorithm uses several parallel walkers that take short independent random walks towards hubs on a multi-layer network to capture its structure. We first evaluate the correlation between nodes using the extracted walks. Then, we apply an agglomerative clustering procedure to maximize the asymptotical Surprise, which allows us to go beyond the resolution limit and find small and less sparse communities in multi-layer networks. This process affords us a focused investigation on the more important seeds over random actors within the network. We tested our algorithm on three real-world multi-layer dark networks and compared the results against those found by applying two existing approaches – Louvain and InfoMap – to the same networks. The comparative analysis shows that our algorithm outperforms the existing approaches in differentiating “small” and “good” communities.
Conference Paper
We often want to trace back through conversation logs, such as meetings or group chats that users joined late, and timelines of social network services. It is often difficult to find important remarks if the logs are very long. Microblogs (e.g., Twitter) are a typical example. We often miss important remarks if we have a large number of followers. In addition, many topics are often mixed in a single timeline, which can also make important remarks difficult to find. This paper presents a visualization tool that briefly displays the flow of conversations. The tool animates the flow of topics and speakers/writers by applying a force-directed visualization technique.
Article
With the development of big data technology, information visualization plays an increasingly important role in the social network, especially the microblog network. The path of the tweet diffusion shows the relationships among microblog users and constructs a directed network among them. The microblog data mining project and the visualization process of the tweet propagation are described in detail. The visualization results of the tweet propagation indicate that there are two typical transmission path patterns: the dandelion pattern and the double-star pattern. The visualization research may easily discover the key retweeting nodes in the tweet diffusion network.
Article
Gephi is an open source software for graph and network analysis. It uses a 3D render engine to display large networks in real-time and to speed up the exploration. A flexible and multi-task architecture brings new possibilities to work with complex data sets and produce valuable visual results. We present several key features of Gephi in the context of interactive exploration and interpretation of networks. It provides easy and broad access to network data and allows for spatializing, filtering, navigating, manipulating and clustering. Finally, by presenting dynamic features of Gephi, we highlight key aspects of dynamic network visualization.
Article
A large number of visualization tools have been created to help decision makers understand increasingly rich databases of product, customer, sales force, and other types of marketing information. This article presents a framework for thinking about how visual representations are likely to affect the decision processes or tasks that marketing managers and consumers commonly face, particularly those that involve the analysis or synthesis of substantial amounts of data. From this framework, the authors derive a set of testable propositions that serve as an agenda for further research. Although visual representations are likely to improve marketing manager efficiency, offer new insights, and increase customer satisfaction and loyalty, they may also bias decisions by focusing attention on a limited set of alternatives, increasing the salience and evaluability of less diagnostic information, and encouraging inaccurate comparisons. Given this, marketing managers are advised to subject insights from visual representations to more formal analysis.
Article
The principle of homophily says that people associate with other groups of people who are mostly like themselves. Many online communities are structured around groups of socially similar individuals. On Twitter, however, people are exposed to multiple, diverse points of view through the public timeline. The authors captured 30,000 tweets about the shooting of George Tiller, a late-term abortion doctor, and the subsequent conversations among pro-life and pro-choice advocates. They found that replies between like-minded individuals strengthen group identity, whereas replies between different-minded individuals reinforce in-group and out-group affiliation. Their results show that people are exposed to broader viewpoints than they were before but are limited in their ability to engage in meaningful discussion. They conclude with implications for different kinds of social participation on Twitter more generally.
Article
This article details the networked production and dissemination of news on Twitter during snapshots of the 2011 Tunisian and Egyptian Revolutions as seen through information flows—sets of near-duplicate tweets—across activists, bloggers, journalists, mainstream media outlets, and other engaged participants. We differentiate between these user types and analyze patterns of sourcing and routing information among them. We describe the symbiotic relationship between media outlets and individuals and the distinct roles particular user types appear to play. Using this analysis, we discuss how Twitter plays a key role in amplifying and spreading timely information across the globe.
Article
Our goal in this paper is to explore two generic approaches to disrupting dark networks: kinetic and non-kinetic. The kinetic approach involves aggressive and offensive measures to eliminate or capture network members and their supporters, while the non-kinetic approach involves the use of subtle, non-coercive means for combating dark networks. Two strategies derive from the kinetic approach: Targeting and Capacity-building. Four strategies derive from the non-kinetic approach: Institution-Building, Psychological Operations, Information Operations and Rehabilitation. We use network data from Noordin Top's South East Asian terror network to illustrate how both kinetic and non-kinetic strategies could be pursued depending on a commander's intent. Using this strategic framework as a backdrop, we strongly advise the use of SNA metrics in developing alternative counter-terrorism strategies that are context-dependent rather than letting SNA metrics define and drive a particular strategy.
Conference Paper
Although both statistical methods and visualizations have been used by network analysts, exploratory data analysis remains a challenge. We propose that a tight integration of these technologies in an interactive exploratory tool could dramatically speed insight development. To test the power of this integrated approach, we created a novel social network analysis tool, SocialAction, and conducted four long-term case studies with domain experts, each working on unique data sets with unique problems. The structured replicated case studies show that the integrated approach in SocialAction led to significant discoveries by a political analyst, a bibliometrician, a healthcare consultant, and a counter-terrorism researcher. Our contributions demonstrate that the tight integration of statistics and visualizations improves exploratory data analysis, and that our evaluation methodology for long-term case studies captures the research strategies of data analysts.
Conference Paper
Web-based social media networks have an increasing frequency of health-related information, resources, and networks (both support and professional). Although we are aware of the presence of these health networks, we do not yet know their ability to (1) influence the flow of health-related behaviors, attitudes, and information and (2) what resources have the most influence in shaping particular health outcomes. Lastly, the health research community lacks easy-to-use data gathering tools to conduct applied research using data from social media websites. In this position paper we discuss and sketch our current work on addressing fundamental questions about information flow in cancer-related social media networks by visualizing and understanding authority, trust, and cohesion. We discuss the development of methods to visualize these networks and information flow on them using real-time data from the social media website Twitter and how these networks influence health outcomes by examining responses to specific health messages.
Conference Paper
Directed links in social media could represent anything from intimate friendships to common interests, or even a passion for breaking news or celebrity gossip. Such directed links determine the flow of information and hence indicate a user's influence on others, a concept that is crucial in sociology and viral marketing. In this paper, using a large amount of data collected from Twitter, we present an in-depth comparison of three measures of influence: indegree, retweets, and mentions. Based on these measures, we investigate the dynamics of user influence across topics and time. We make several interesting observations. First, popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Second, most influential users can hold significant influence over a variety of topics. Third, influence is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. We believe that these findings provide new insights for viral marketing and suggest that topological measures such as indegree alone reveal very little about the influence of a user.
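The three influence measures named in this abstract can be sketched on toy data as follows. The record layout (`follows` pairs and tweet dictionaries with `retweet_of` and `mentions` fields) and all sample values are hypothetical and chosen only to show how the three counts can disagree; they are not drawn from the study.

```python
from collections import Counter

# Hypothetical data: (follower, followed) pairs and simplified tweet records.
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
tweets = [
    {"author": "bob", "retweet_of": None, "mentions": ["carol"]},
    {"author": "carol", "retweet_of": "bob", "mentions": []},
    {"author": "dave", "retweet_of": "bob", "mentions": ["bob"]},
    {"author": "alice", "retweet_of": None, "mentions": ["bob"]},
]

# Indegree: number of followers a user has.
indegree = Counter(followed for _, followed in follows)
# Retweets: number of times a user's tweets were retweeted.
retweets = Counter(t["retweet_of"] for t in tweets if t["retweet_of"])
# Mentions: number of times a user was mentioned by others.
mentions = Counter(user for t in tweets for user in t["mentions"])

def top_user(counter):
    """Return the user with the highest count."""
    return counter.most_common(1)[0][0]

print(top_user(indegree), top_user(retweets), top_user(mentions))
```

Here the most-followed user (`alice`) is not the most retweeted or most mentioned user (`bob`), which is the kind of divergence the abstract describes.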
Article
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Article
Sentiment analysis of microblogs such as Twitter has recently gained a fair amount of attention. One of the simplest sentiment analysis approaches compares the words of a posting against a labeled word list in which each word has been scored for valence (a 'sentiment lexicon' or 'affective word list'). Several affective word lists exist, e.g., ANEW (Affective Norms for English Words), developed before the advent of microblogging and sentiment analysis. I wanted to examine how well ANEW and other word lists perform for the detection of sentiment strength in microblog posts in comparison with a new word list specifically constructed for microblogs. I used manually labeled postings from Twitter scored for sentiment. Using simple word matching, I show that the new word list may perform better than ANEW, though not as well as the more elaborate approach found in SentiStrength.
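The word-matching approach described above can be sketched in a few lines. The tiny lexicon and its valence scores below are invented for illustration and are not taken from ANEW or from the word list the abstract refers to; the tokenizer is also a simplifying assumption.

```python
import re

# Hypothetical valence scores, for illustration only.
LEXICON = {"good": 3, "great": 3, "happy": 2, "sad": -2, "bad": -3, "terrible": -3}

def sentiment_score(text):
    """Sum the valence of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

print(sentiment_score("Great day, so happy"))
print(sentiment_score("terrible and sad news"))
```

Words missing from the lexicon contribute zero, so a posting with no matches is scored as neutral; this is one reason a list built for microblog vocabulary may behave differently from an older general-purpose one.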
Article
Networks and networking are viewed as ways of dealing with complex problems that beset both the state and society. Homelessness, health care, and crime are all viewed as problems that networks can manage better than single organizations can. This article views these problems as networks that must be confronted if Western democracies wish to deal with terrorism, drug smuggling, and the manifold pathologies that confront failed states. In this article we adopt the perspective of networks as problems. The majority of the literature on networks and collaboration is quite positive. Collaborative networks are seen as appropriate devices to tackle public management problems and successfully coordinate political, social, and economic action. From the level of global governance, European integration, sectoral policy networks at the national level, and service implementation networks at the local level, these devices are all viewed as ways of solving governance problems in a complex and differentiated world. The research proposed here intends to develop a more holistic view of this phenomenon by looking at dark networks. The article tries to evaluate how network structures and governance are used for criminal or immoral ends. Because the judgment of ends is inherently normative, we propose to talk about overt and legal versus covert and illegal networks. We then analyze where the similarities and differences between the two sets are and what we might be able to learn regarding both forms if we mirror them against each other. The article develops a set of propositions drawn from selected cases of drug-trafficking networks, the diamond and weapons trade, and the Al Qaeda terrorist network.
Article
We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
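The modularity score that this method optimizes can be computed for a given partition with a short sketch. This shows only the quality measure for an undirected, unweighted graph without self-loops, not the heuristic optimization itself; the function name and the two-triangle example are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q = sum over communities of L_c/m - (d_c / 2m)^2,
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridge edge (hypothetical example).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
merged = {node: "all" for node in split}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(q_split, q_merged)
```

Splitting the graph at the bridge gives a positive score, while putting every node in one community gives zero, so an optimizer that maximizes this quantity would prefer the split.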
Article
Twitter is now well established as the world's second most important social media platform, after Facebook. Its 140-character updates are designed for brief messaging, and its network structures are kept relatively flat and simple: messages from users are either public and visible to all (even to unregistered visitors using the Twitter website), or private and visible only to approved ‘followers’ of the sender; there are no more complex definitions of degrees of connection (family, friends, friends of friends) as they are available in other social networks. Over time, Twitter users have developed simple, but effective mechanisms for working around these limitations: ‘#hashtags’, which enable the manual or automatic collation of all tweets containing the same #hashtag, as well as allowing users to subscribe to content feeds that contain only those tweets which feature specific #hashtags; and ‘@replies’, which allow senders to direct public messages even to users whom they do not already follow. This paper documents a methodology for extracting public Twitter activity data around specific #hashtags, and for processing these data in order to analyse and visualize the @reply networks existing between participating users, both overall, as a static network, and over time, to highlight the dynamic structure of @reply conversations. Such visualizations enable us to highlight the shifting roles played by individual participants, as well as the response of the overall #hashtag community to new stimuli, such as the entry of new participants or the availability of new information. Over longer timeframes, it is also possible to identify different phases in the overall discussion, or the formation of distinct clusters of preferentially interacting participants.
Article
In an influential paper, Freeman (1979) identified three aspects of centrality: betweenness, nearness, and degree. Perhaps because they are designed to apply to networks in which relations are binary valued (they exist or they do not), these types of centrality have not been used in interlocking directorate research, which has almost exclusively used formula (2) below to compute centrality. Conceptually, this measure, of which c(α, β) is a generalization, is closest to being a nearness measure when β is positive. In any case, there is no discrepancy between the measures for the four networks whose analysis forms the heart of this paper. The rank orderings by the
Article
The study of terrorism informatics utilizing the Twitter microblogging service has not been given apt attention in the past few years. Twitter has been identified as both a potential facilitator and also a powerful deterrent to terrorism. Based on observations of Twitter’s role in civilian response during the recent 2009 Jakarta and Mumbai terrorist attacks, we propose a structured framework to harvest civilian sentiment and response on Twitter during terrorism scenarios. Coupled with intelligent data mining, visualization, and filtering methods, this data can be collated into a knowledge base that would be of great utility to decision-makers and the authorities for rapid response and monitoring during such scenarios. Using synthetic experimental data, we demonstrate that the proposed framework yields meaningful graphical visualizations of information that reveal potential responses to terrorist threats. The novelty of this study is that microblogging has never been studied in the domain of terrorism informatics. This paper also contributes to the understanding of the capability of conjoint structured data and unstructured content mining in extracting deep knowledge from noisy Twitter messages, through our proposed structured framework.
Book
This monograph gives a tutorial treatment of new approaches to self-organization, adaptation, learning and memory. It is based on recent research results, both mathematical and computer simulations, and lends itself to graduate and postgraduate courses in the natural sciences. The book presents new formalisms of pattern processing: orthogonal projectors, optimal associative mappings, novelty filters, subspace methods, feature-sensitive units, and self-organization of topological maps, with all their computable algorithms. The main objective is to provide an understanding of the properties of information representations from a general point of view and of their use in pattern information processing, as well as an understanding of many functions of the brain. In the second edition two new chapters on neural computing and optical associative memories have been added.
Conference Paper
Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real-world political ties and coalitions. An analysis of the tweets' political sentiment demonstrates close correspondence to the parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.
Conference Paper
Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks (28). In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.
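Ranking users by PageRank on a follower graph, as mentioned in this abstract, can be sketched with a plain power iteration. The damping factor of 0.85, the iteration count, the function name, and the toy `follows` list are assumptions for illustration and do not reflect the study's actual setup.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Power-iteration PageRank over (follower, followed) edges.
    Rank flows from a follower to the accounts they follow."""
    nodes = sorted({user for edge in follows for user in edge})
    out_links = {node: [] for node in nodes}
    for follower, followed in follows:
        out_links[follower].append(followed)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank

# Hypothetical follower graph: b, c, and d follow a; a follows b.
follows = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]
ranks = pagerank(follows)
top = max(ranks, key=ranks.get)
print(top, round(sum(ranks.values()), 6))
```

In this small example the PageRank leader is also the user with the most followers, matching the abstract's observation that the two rankings tend to be similar; a retweet-based ranking would need tweet data that this sketch does not model.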
Article
Data-Driven Documents (D3) is a novel representation-transparent approach to visualization for the web. Rather than hide the underlying scenegraph within a toolkit-specific abstraction, D3 enables direct inspection and manipulation of a native representation: the standard document object model (DOM). With D3, designers selectively bind input data to arbitrary document elements, applying dynamic transforms to both generate and modify content. We show how representational transparency improves expressiveness and better integrates with developer tools than prior approaches, while offering comparable notational efficiency and retaining powerful declarative components. Immediate evaluation of operators further simplifies debugging and allows iterative development. Additionally, we demonstrate how D3 transforms naturally enable animation and interaction with dramatic performance improvements over intermediate representations.
Book
This book offers a new approach to self-organization, adaptation, learning, and memory, suitable for postgraduate courses in information science, computer science, psychology, theoretical biology, and physics.