Conference Paper

Lariat: A Visual Analytics Tool for Social Media Researchers to Explore Twitter Datasets

... Meanwhile, the field of natural language processing (NLP) has as its mission the categorization and analysis of textual data. Indeed, there has been substantial research in the social sciences and the human computer interaction (HCI) community suggesting that NLP and machine learning (ML) techniques have the potential to assist certain types of analyses of qualitative data [1,10,11,13,17,18,27,36,39,45]. However, scholars from both fields note the paucity of research aimed at understanding the practices and needs of qualitative researchers, and as a result, existing tools for qualitative coding do not necessarily meet user needs. ...
... While not evaluating the performance of the two topic modeling techniques, Bakharia et al. report that participants were able to exploit interactivity and improve auto-generated topics. More recently, Chen et al. [10] and Paredes et al. [36] built interactive tools for exploring large-scale social datasets, finding that qualitative researchers could use the tools to discover trends and other analytic insights in the data. Chen et al. [11] also call for approaches that are interactive, easy to interpret and visualize, and importantly, based on an understanding of the practices and needs of qualitative researchers. ...
... While prior work has focused on data exploration [10,17,18,27,36,39] and evidence gathering [1,45], automation can help qualitative researchers in a variety of other ways. We can exploit the structure in interviews to suggest and annotate demographic and protocol-based codes. ...
Conference Paper
Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.
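The abstract above compares system-generated coding with human coders on inter-rater reliability measures but does not name the measure. As a minimal sketch, assuming Cohen's kappa (a common choice for two coders, not necessarily the one the authors used), agreement between a human coder and a tool could be computed as follows. The label lists and the function name `cohens_kappa` are hypothetical illustrations.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: fraction of items that received the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders applied one identical label to every item
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six interview excerpts.
human = ["barrier", "barrier", "benefit", "other", "benefit", "barrier"]
tool = ["barrier", "benefit", "benefit", "other", "benefit", "barrier"]
kappa = cohens_kappa(human, tool)
print(f"kappa = {kappa:.3f}")
```

For these made-up labels the coders agree on five of six excerpts, and kappa works out to 17/23, about 0.74, once chance agreement is discounted.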
Chapter
With the advancement of technologies and the spread of the internet, online social networks have become increasingly popular. Associated with this growth, the volume of digital data made available by these media has significantly increased. In this context, there is an interest on the part of researchers to increasingly use online social networks as a source of data to obtain knowledge and develop important research in all scientific areas. Thus, this work aimed to investigate, through an exploratory and qualitative study, the difficulties and needs of researchers related to the data collection and the other steps of the text mining process in online social networks. The Underlying Discourse Unveiling Method was used to collect and analyze the data. The results show that users are dissatisfied with the existing tools that support them in this process. In addition, the results also highlight the needs of data researchers in order to create a tool that provides them a better user experience during text mining on online social networks (OSN), making this process more effective.
Book
This book is about how different communicators - whether crisis managers, first responders, journalists, or private citizens and disaster victims - have used social media to communicate about risks and crises. It is also about how these very different actors can play a crucial role in mitigating or preventing crises. How can they use social media to strengthen their own and the public's awareness and understanding of crises when they unfold? How can they use social media to promote resilience during crises and the ability to deal with the after-effects? Chapters address such questions by presenting new research-based knowledge on social media use during different crises: the terrorist attacks in Norway on 22 July 2011; the central European floods in Austria in 2013; and the West African Ebola outbreak in 2014. The collection also presents research on the development of a tool for gathering social media information, based on a user-centered design.
Article
Visualization of sentiments and opinions extracted from or annotated in texts has become a prominent topic of research over the last decade. From basic pie and bar charts used to illustrate customer reviews to extensive visual analytics systems involving novel representations, sentiment visualization techniques have evolved to deal with complex multidimensional data sets, including temporal, relational and geospatial aspects. This contribution presents a survey of sentiment visualization techniques based on a detailed categorization. We describe the background of sentiment analysis, introduce a categorization for sentiment visualization techniques that includes 7 groups with 35 categories in total, and discuss 132 techniques from peer-reviewed publications together with an interactive web-based survey browser. Finally, we discuss insights and opportunities for further research in sentiment visualization. We expect this survey to be useful for visualization researchers whose interests include sentiment or other aspects of text data as well as researchers and practitioners from other disciplines in search of efficient visualization techniques applicable to their tasks and data.
Conference Paper
Data about migration flows are largely inconsistent across countries, typically outdated, and often nonexistent. Despite the importance of migration as a driver of demographic change, there is limited availability of migration statistics. Generally, researchers rely on census data to indirectly estimate flows. However, little can be inferred for specific years between censuses and for recent trends. The increasing availability of geolocated data from online sources has opened up new opportunities to track recent trends in migration patterns and to improve our understanding of the relationships between internal and international migration. In this paper, we use geolocated data for about 500,000 users of the social network website "Twitter". The data are for users in OECD countries during the period May 2011 to April 2013. We evaluated, for the subsample of users who have posted geolocated tweets regularly, the geographic movements within and between countries for independent periods of four months. Since Twitter users are not representative of the OECD population, we cannot infer migration rates at a single point in time. However, we proposed a difference-in-differences approach to reduce selection bias when we infer trends in out-migration rates for single countries. Our results indicate that our approach is relevant to address two longstanding questions in the migration literature. First, our methods can be used to predict turning points in migration trends, which are particularly relevant for migration forecasting. Second, geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration. Our analysis relies uniquely on publicly available data that could be potentially available in real time and that could be used to monitor migration trends. The Web Science community is well-positioned to address, in future work, a number of methodological and substantive questions that we discuss in this article.
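The abstract above mentions a difference-in-differences approach for inferring trends in out-migration rates but does not give the estimator. The sketch below shows only the generic form of the idea, under the assumption that one country's change in rate between two periods is compared with the change in a reference group over the same periods. All counts, and the names `out_migration_rate` and `difference_in_differences`, are hypothetical and are not taken from the paper.

```python
def out_migration_rate(movers, users):
    """Share of observed users who left the country during a period."""
    if users <= 0:
        raise ValueError("users must be positive")
    return movers / users


def difference_in_differences(country_before, country_after,
                              reference_before, reference_after):
    """Change in the country's rate minus the change in the reference rate."""
    return (country_after - country_before) - (reference_after - reference_before)


# Hypothetical counts for two four-month periods (not real data).
country_p1 = out_migration_rate(movers=40, users=2000)       # 0.020
country_p2 = out_migration_rate(movers=66, users=2200)       # 0.030
reference_p1 = out_migration_rate(movers=300, users=20000)   # 0.015
reference_p2 = out_migration_rate(movers=378, users=21000)   # 0.018

did = difference_in_differences(country_p1, country_p2,
                                reference_p1, reference_p2)
print(f"difference-in-differences estimate: {did:.4f}")
```

Subtracting the reference group's change is meant to cancel shifts that affect all Twitter users alike (for example, a platform-wide change in geotagging behavior), so the remaining difference is read as a trend rather than as a migration rate in its own right.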
Conference Paper
This study offers an in-depth analysis of four rumors that spread through Twitter after the 2013 Boston Marathon Bombings. Through qualitative and visual analysis, we describe each rumor's origins, changes over time, and relationships between different types of rumoring behavior. We identify several quantitative measures (including temporal progression, domain diversity, lexical diversity and geolocation features) that constitute a multi-dimensional signature for each rumor, and provide evidence supporting the existence of different rumor types. Ultimately these signatures enhance our understanding of how different kinds of rumors propagate online during crisis events. In constructing these signatures, this research demonstrates and documents an emerging method for deeply and recursively integrating qualitative and quantitative methods for analysis of social media trace data.
Conference Paper
Researchers across many fields are increasingly using data from social media sites to address questions about individual and group social behaviors. However, the size and complexity of these data sets challenge traditional research methods; many new tools and techniques have been developed to support research in this area. In this paper, we present our experience designing and evaluating Agave, a collaborative visual analysis system for exploring events and sentiment over time in large tweet data sets. We offer findings from evaluating Agave with researchers experienced with social media data, focusing on how users interpreted sentiment labels shown in the interface and on the value of collaboration for stimulating exploratory analysis.
Article
While intelligence analysis has been a primary target domain for visual analytics system development, relatively little user and task analysis has been conducted within this area. Our research community's understanding of the work processes and practices of intelligence analysts is not deep enough to adequately address their needs. Without a better understanding of the analysts and their problems, we cannot build visual analytics systems that integrate well with their work processes and truly provide benefit to them. In order to close this knowledge gap, we conducted a longitudinal, observational field study of intelligence analysts in training within the intelligence program at Mercyhurst College. We observed three teams of analysts, each working on an intelligence problem for a 10-week period. Based on the findings of the study, we describe and characterize processes and methods of intelligence analysis that we observed, make clarifications regarding the processes and practices, and suggest design implications for visual analytics systems for intelligence analysis.
Article
Organizations rely on data analysts to model customer engagement, streamline operations, improve production, inform business decisions, and combat fraud. Though numerous analysis and visualization tools have been built to improve the scale and efficiency at which analysts can work, there has been little research on how analysis takes place within the social and organizational context of companies. To better understand the enterprise analysts' ecosystem, we conducted semi-structured interviews with 35 data analysts from 25 organizations across a variety of sectors, including healthcare, retail, marketing and finance. Based on our interview data, we characterize the process of industrial data analysis and document how organizational features of an enterprise impact it. We describe recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools. Finally, we discuss design implications and opportunities for visual analysis research.
Article
We present a methodological approach, called Group Informatics, for understanding the social connections that are created between members of technologically mediated groups. Our methodological approach supports focused thinking about how online groups differ from each other, and diverge from their face-to-face counterparts. Group Informatics is grounded in 5 years of empirical studies of technologically mediated groups in online learning, software engineering, online political discourse, crisis informatics, and other domains. We describe the Group Informatics model and the related, 2-phase methodological approach in detail. Phase 1 of the methodological approach centers on a set of guiding research questions aimed at directing the application of Group Informatics to new corpora of integrated electronic trace data and qualitative research data. Phase 2 of the methodological approach is a systematic set of steps for transforming electronic trace data into weighted social networks.
Article
A perennial criticism regarding the use of social media in social science research is the lack of demographic information associated with naturally occurring mediated data such as that produced by Twitter. However, the fact that demographic information is not explicit does not mean that it is not implicitly present. Utilising the Cardiff Online Social Media ObServatory (COSMOS), this paper suggests various techniques for establishing or estimating demographic data from a sample of more than 113 million Twitter users collected during July 2012. We discuss in detail the methods that can be used for identifying gender and language and illustrate that the proportion of males and females using Twitter in the UK reflects the gender balance observed in the 2011 Census. We also expand on the three types of geographical information that can be derived from Tweets, either directly or by proxy, and how spatial information can be used to link social media with official curated data. Whilst we make no grand claims about the representative nature of Twitter users in relation to the wider UK population, the derivation of demographic data demonstrates the potential of new social media (NSM) for the social sciences. We consider this paper a clarion call and hope that other researchers test the methods we suggest and develop them further.