Mining Twitter: A source for psychological wisdom of the crowds
Ulf-Dietrich Reips & Pablo Garaizar
© Psychonomic Society, Inc. 2011
Abstract Over the last few years, microblogging has gained
prominence as a form of personal broadcasting media where
information and opinion are mixed together without an
established order, usually tightly linked with current reality.
Location awareness and promptness provide researchers using
the Internet with the opportunity to create “psychological
landscapes”; that is, to detect differences and changes in
voiced (twittered) emotions, cognitions, and behaviors. In our
article, we present iScience Maps, a free Web service for
researchers, available from http://maps.iscience.deusto.es/ and
http://tweetminer.eu/. Technologically, the service is based on
Twitter’s streaming and search application programming
interfaces (APIs), accessed through several PHP libraries, and
a JavaScript frontend. This service allows researchers to assess
via Twitter the effect of specific events in different places as
they are happening and to make comparisons between
cities, regions, or countries regarding psychological states and
their evolution in the course of an event. In a step-by-step
example, it is shown how to replicate a study on affective
and personality characteristics inferred from first names
(Mehrabian & Piercy, Personality and Social Psychology
Bulletin, 19, 755–758, 1993) by mining Twitter data with
iScience Maps. Results from the original study are replicated
in both world regions we tested (the western U.S. and the
U.K./Ireland); we also discover base rate of names to be a
confound that needs to be controlled for in future research.
Keywords Twitter · Geolocation · iScience Maps ·
Microblogging · Internet science · Text mining · Tweet
Introduction
iScience Maps for Twitter is a set of Web applications
designed to help researchers interested in social media
analysis, specifically in mining the billions of tweets
(brief written messages) on Twitter that are written every
month, for scientific research. The Web service is
available from http://maps.iscience.deusto.es/ and
http://tweetminer.eu/.
Social media, a category of Web services that have
recently attracted millions of Internet users, have
become interesting resources for social-behavioral research.
From the traces of information created by the
behavior of the masses, the “wisdom of the crowds”
emerges. For example, David Crandall and colleagues
from Cornell University created maps of world regions
from ca. 35 million geotagged photos that had been
uploaded to Flickr, a social media platform for the
exchange of pictures and attached tags (Barras, 2009).
These maps show relative interest in motifs and places and
may lead to applications in tourism, city planning,
ecology, and economics (Reips, in press). City planners
may trace such behaviorally driven location maps over
long periods and, thus, identify areas to be made
accessible via public transportation. In a similar vein,
the “wisdom of the crowds of researchers” has been
used to identify “hot topics” in psychological research.
Reips (2007, in press) reported such analyses from titles
and topics of studies on the Web experiment list and the
Web survey list, two free Web services for researchers
that help in the recruitment of participants and in the
archiving of studies (see http://wexlist.net; Reips &
Lengler, 2005).
Twitter is a Web application where users can post text-based
messages of up to 140 characters, called tweets. Apart from
this microblogging service, Twitter also works as a social
network, allowing its users to follow other users, group
them in lists, forward other users’ messages (“retweet,” in
Twitter terminology), or send private messages. Figure 1
shows the Twitter Web site.

U.-D. Reips (*) · P. Garaizar
University of Deusto,
Bilbao, Spain
e-mail: u.reips@ikerbasque.org
P. Garaizar
e-mail: garaizar@deusto.es
U.-D. Reips
IKERBASQUE, Basque Foundation for Science,
Bilbao, Spain
Behav Res
DOI 10.3758/s13428-011-0116-6

Twitter has more than 145 million registered users (Van
Grove, 2010) and produces a large amount of information:
155 million tweets per day (Garrett, 2011). Thus,
it is nearly impossible to capture all this information, due to
limitations of bandwidth, storage, and rate. The percentage
of tweets that contain information about the sender’s
location is increasing but still very small (0.23% in January
2010, 0.6% in June 2010). However, rough location can
often be inferred from a user’s profile. An analysis of this
information by Semiocast (2010) shows large differences of
Twitter use between world regions and countries. More than
25% of tweets are generated in the U.S., followed by Japan
(18%), Indonesia (12%), Brazil (11%), and the U.K. (6%).
About 37% of tweeting happens in Asia, 31% in North
America, 15% in South America, 14% in Europe, and about
1.5% each in Africa and Oceania.
Simple Twitter search is available in many browsers and
online applications. Such search services are available from
Twitter directly or via the Twitter application programming
interfaces (APIs).¹ For instance, Fig. 2 shows a Twitter
service that is integrated with a Web browser interface. It
automatically searches Twitter space for tweets related to
the content of the Website currently on display (in this
example, Google search results for “SCiP”) and general
trends in Twitter space. Simple Twitter search in the form of
monitoring certain terms has been used in research on
elections (Mislove, Lehmann, Ahn, Lazer, Lin, Onnela, &
Rosenquist, 2010; see http://election.ccs.neu.edu/).
Using iScience Maps
We developed iScience Maps mainly to implement com-
parative searches of Twitter space. In iScience Maps, it is
possible to combine terms, using Boolean operators, and to
compare searches for different locations. The results from
some types of searches are visualized on maps; hence the
name “iScience Maps.” The advances implemented in
iScience Maps further include the option to download
results in several formats. Generally, the tool is targeted at
behavioral researchers, while almost all other available
tools are designed for personal Twitter users, marketing
purposes, or the simple search described above.
Upon arriving at the site, the “Home” tab is displayed
(Fig. 3). Here, the visitor finds a description of the site and
brief instructions on how to use it. Using tabs, the visitor can
move to the two main types of searches, global search and
local search, and to an “About” tab. In the present section, we
explain how to use the global and local search features
available in iScience Maps, using step-by-step examples.
¹ From a technological point of view, there are three different Twitter
APIs: REST API, Search API, and Streaming API. The first two (REST
and Search API) are separated for historical reasons (Twitter
acquired Summize Inc. and rebranded it as Twitter Search) but work in
a similar way. The third one (Streaming API) was designed to
provide large amounts of data to third-party applications focused on
Twitter content analysis and works in a very different way. The Twitter
REST API methods allow developers to interact with the full range of
features regarding a specific Twitter account, using Representational
State Transfer (REST).
Fig. 1 The Twitter Website
Fig. 2 Twitter service that is integrated with a Web browser
interface. It automatically searches Twitter space for tweets
related to the content of the Website currently on display (in
this example, Google search results for “SCiP”) and general
trends in Twitter space
Fig. 3 iScience Maps site at http://maps.iscience.deusto.es/
and http://tweetminer.eu/
Global search
Figure 4 shows the screen visible to the site’s visitor in
“Global Search,” in this case after a search for the term
“wikileaks” for a date range from October 30 to December
30 of 2010. Any dot in the resulting world map can be
clicked to summon a pop-up window with further information
about tweets and location. There is also an option for
selecting the number of intervals within the selected date
range that is used by iScience Maps to play a movie to the
user that shows the development over time. In the given
example, which the reader may execute and verify at the
iScience Maps Website, the frequency of tweets changed
dramatically when WikiLeaks suddenly made the headlines
all over the world. Figure 4 shows the seventh interval out
of ten, as indicated by the large dot located above the movie
controls in the lower part of the screen.
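The interval option can be illustrated with a short sketch. The Python code below is not taken from iScience Maps; it assumes that the date range is divided into consecutive, nearly equal blocks of whole days and that the tweets falling into each block are counted to produce one frame of the movie. The function names split_into_intervals and count_per_interval are hypothetical.

```python
from datetime import date, timedelta


def split_into_intervals(start, end, n_intervals):
    """Split the inclusive range start..end into n_intervals blocks of whole days.

    Assumption: blocks are consecutive and as equal in length as possible.
    The real service may divide the range differently.
    """
    total_days = (end - start).days + 1
    if n_intervals < 1 or n_intervals > total_days:
        raise ValueError("n_intervals must be between 1 and the number of days")
    intervals = []
    for i in range(n_intervals):
        first = start + timedelta(days=(i * total_days) // n_intervals)
        last = start + timedelta(days=((i + 1) * total_days) // n_intervals - 1)
        intervals.append((first, last))
    return intervals


def count_per_interval(tweet_dates, intervals):
    """Count how many tweet dates fall into each (first_day, last_day) block."""
    counts = [0] * len(intervals)
    for tweet_date in tweet_dates:
        for i, (first, last) in enumerate(intervals):
            if first <= tweet_date <= last:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Date range and number of intervals from the wikileaks example in the text.
    frames = split_into_intervals(date(2010, 10, 30), date(2010, 12, 30), 10)
    for number, (first, last) in enumerate(frames, start=1):
        print(number, first.isoformat(), last.isoformat())
```

Plotting the counts returned by count_per_interval against the interval number gives the same kind of time course that the animated map conveys.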
Via “Advanced Search” and using the “+” button,
searches can be expanded to any number of terms that are
combined by Boolean search operators. For instance, in
the example above, one could add terms like “traitor” or
“hero” to “wikileaks” and combine them with Boolean
operators like “and” or “and not” to find out about
relative proportions of positive and negative responses to
WikiLeaks.
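How such Boolean combinations translate into relative proportions can be sketched in a few lines of Python. The sketch assumes case-insensitive whole-word matching, which may differ from how Twitter or iScience Maps actually match terms; the functions matches and proportion and the example tweets are hypothetical and serve only as an illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of words (case-insensitive matching)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(text, first, operator=None, second=None):
    """Evaluate 'first', or 'first <operator> second', against one tweet.

    Supported operators: "and", "or", "and not".
    """
    words = tokenize(text)
    has_first = first.lower() in words
    if operator is None:
        return has_first
    has_second = second.lower() in words
    if operator == "and":
        return has_first and has_second
    if operator == "or":
        return has_first or has_second
    if operator == "and not":
        return has_first and not has_second
    raise ValueError(f"unknown operator: {operator!r}")


def proportion(tweets, first, operator, second):
    """Share of the tweets containing 'first' that also satisfy the Boolean query."""
    base = [tweet for tweet in tweets if matches(tweet, first)]
    if not base:
        return 0.0
    hits = [tweet for tweet in base if matches(tweet, first, operator, second)]
    return len(hits) / len(base)


if __name__ == "__main__":
    # Invented example tweets, not real data.
    tweets = [
        "WikiLeaks founder called a hero by supporters",
        "Is #wikileaks run by a traitor?",
        "New WikiLeaks cables released today",
        "Nothing to see here",
    ]
    print(proportion(tweets, "wikileaks", "and", "hero"))
    print(proportion(tweets, "wikileaks", "and", "traitor"))
```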
The layout in Global Search is divided into three panels:
one query panel at the top, one dynamic map in the middle,
and one results panel at the bottom. The results panel
displays the date range, searches, number of intervals
chosen for the movie, and the number of tweets found.
Results can be copied to the clipboard or downloaded in
Comma Separated Values (CSV) or Excel (XLS) format for
statistical analysis.
Local search
iScience Maps’ “Local Search” provides two query panels
grouped side by side to easily compare searches between
places, date ranges, and search terms. Both panels work
the same way and can be used simultaneously. Search
terms from one panel can be copied to the other by one
click. Three parameters have to be defined to perform a
query:
• Where? Location can be defined using the map to set
the area range. The area boundaries will be defined by
the zoom used in the map widget. The text field above
the map can also be used to search for a location that is
then displayed in the map widget.
• When? A horizontal scale with two sliders allows users
to define a date range. Twitter provides data only for the
last 3–7 days.
• What? Pressing the “+” button initiates a Boolean
search for two search terms. Three operators (“and,” “or,”
and “not”) can be defined. Each text field should be used
to define exactly one term. The number of terms is not
limited, and the search is case insensitive.
Fig. 4 Example search for “wikileaks” in the iScience Maps
Global Search tab. The panel at the top shows configuration
options for date range, terms (optional: more terms and
Boolean operators), number of intervals in resulting animation,
and number of tweets found. Any dot in the resulting world
map can be clicked to summon a pop-up window (right side) with
further information about tweets and location. Buttons at the
bottom control the animation
There are two APIs to be queried: The Twitter API that is
used in Local Search provides more results than its
alternative, but has a limit of 1,500 results. This API is
slow if there are too many results (15 seconds per query),
the date range is limited, and there is a 2,500 km maximum
range. The iScience API that is used in Global Search has
no result limits, no date range limits, and no distance range
limits but shows geotagged tweets only (i.e., there is no
profile-based location inference). It draws on a random
sample of 1%–10% of all tweets.
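One practical consequence of the 1,500-result limit is that a count which reaches the limit is only a lower bound on the true number of tweets. The following Python sketch shows one way to keep track of this when counts from several searches are added up; it is an illustration under that assumption, not code from iScience Maps, and the names combine_counts and format_count are hypothetical.

```python
TWITTER_RESULT_CAP = 1500  # per-search limit of the Twitter API reported in the text


def combine_counts(counts, cap=TWITTER_RESULT_CAP):
    """Add up per-search result counts.

    Returns (total, is_lower_bound). The total is only a lower bound if at
    least one search hit the cap, because that search may have had more
    matching tweets than were returned.
    """
    total = sum(counts)
    is_lower_bound = any(count >= cap for count in counts)
    return total, is_lower_bound


def format_count(total, is_lower_bound):
    """Render a count for reporting, marking lower bounds explicitly."""
    return f"at least {total:,}" if is_lower_bound else f"{total:,}"


if __name__ == "__main__":
    # Invented per-search counts, for illustration only.
    print(format_count(*combine_counts([1500, 320])))
    print(format_count(*combine_counts([355, 12])))
```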
Results panel
All queries’ results will be stored in a dynamic table, which
can be reordered by clicking the column headers. There are
three buttons at the top of the panel to export the results to
the clipboard or download them in CSV or Excel format.
A step-by-step example using local search
In this section, we describe how to run a partial replication of
a study on affective and personality characteristics inferred
from first names, published in 1993 by Mehrabian and
Piercy. From Table 2 in their article, we take the first six male
names; for three of these (Alexander, Charles, Kenneth), the
connotation of the dimension “successful” was strong, and
for three (Otis, Tyrone, Wilbur), it was weak. “Successful”
meant “ambitious,” “intelligent,” and “creative.” If these
names really carry the connotation of a personality characteristic,
this likely should be apparent when Twitter is
mined, because attributions to persons, such as “Charles is an
intelligent guy,” frequently appear in text-based message
services like Twitter.
Method
To avoid reaching the maximum threshold for number of
tweets per search imposed by Twitter (1,500), we search for
one name only at a time, and only for a 3-day period. For
example, having selected the “Local search” tab, we take
the following steps:
1. Define locations in the two map areas: the western
U.S. in the left map, the U.K. and Ireland in the right
map. Any circular geographical area can be defined
precisely by clicking on “Text-based area definition”
just below a map, then entering the geographical
coordinates of a point and a radius. For example, the
western U.S. can be defined approximately using the
coordinates 39.53393 (latitude) and -118.75542 (longitude)
and a radius of 1,120 km.
2. Define the date range using the slider “When?” Use the
last 3 days.
3. To find and later adjust for the base rate, we first do a
simple search for each name. Type “Otis” in the
“What?” text field and press “Get Results.” Repeat this
step for all of the names.
4. Then we search for each name in combination with an
attribute. Figure 5 shows a search for “Charles” in
combination with “intelligent,” comparing the U.S.
west coast with the U.K. and Ireland.
5. Scrolling down reveals the results for direct viewing
and download. Clicking on the first item in each row
(“Twitter”) connects to Twitter via the Web and shows
the actual tweets.
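The circular area definition in Step 1 can be made concrete with a small Python sketch. It assumes that membership in the area is decided by great-circle (haversine) distance on a spherical Earth; how Twitter or iScience Maps compute the distance internally is not stated in the text. The function names and the test cities are illustrative only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed

# Center and radius of the western U.S. area given in Step 1.
WESTERN_US = (39.53393, -118.75542, 1120.0)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def in_area(lat, lon, area):
    """Return True if the point lies within the circular area (lat, lon, radius_km)."""
    center_lat, center_lon, radius_km = area
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km


if __name__ == "__main__":
    print(in_area(34.05, -118.24, WESTERN_US))    # Los Angeles
    print(in_area(51.5074, -0.1278, WESTERN_US))  # London
```

Note the negative sign of the longitude: western longitudes are negative in decimal degrees, so omitting the sign would place the center of the circle in Asia.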
Results
Supporting the original findings for male names in the U.S.,
we did not find a single combination of the low-connotation
names with any of the terms “successful,” “ambitious,”
“intelligent,” and “creative.” All the high-connotation
names did indeed appear in the same tweets with some of
the aforementioned terms; for example, Alexander appeared
6 times with either “creative” or “successful” (out of a base
rate of 5,478 appearances overall). Kenneth was tweeted 15
times in combination with “successful” (base rate: 2,005),
and Charles 38 times with “creative,” “intelligent,” or
“successful” (base rate > 16,760).²
These findings replicate for tweets from the U.K.: no
tweets for combinations of the four personality characteristics
with the low-connotation names, but some combinations for
two of the three high-connotation names. Charles appeared 15
times with either “creative” or “intelligent” (base rate: 1,621),
and Kenneth 5 times in combination with either “successful”
or “intelligent” (base rate: 323). Alexander appeared 1,215
times without any of the terms.
Critically, and this can be derived directly from our
Twitter study, the base rate of high-connotation versus
low-connotation names (Otis, 1,296; Tyrone, 1,324; Wilbur,
355) appears to be a confounding factor and may also
explain findings in the original study, because less frequent
names may cognitively be less associated with any
personality characteristics. Thus, to control for base rate
effects, the study would need to be complemented by
searches for combinations of names with opposite connotations,
for example, name plus “unsuccessful.” We
encourage the reader to use iScience Maps in doing so to
further explore and expand on the example.
² The Twitter API limits the number of results to a maximum of 1,500
per search, so we can provide only a minimum value for this
combined result.
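To see how strongly the base rate matters, the reported co-occurrence counts can be expressed relative to each name's base rate. The Python sketch below does this per 1,000 appearances, using only the counts given in the Results section; the per-thousand normalization itself is an illustration added here, not an analysis reported in the study. Because the base rate for Charles in the western U.S. is a minimum value, the rate computed for it is an upper bound.

```python
# (co-occurrences with the "successful" terms, base rate) as reported in the text.
OBSERVATIONS = {
    ("western U.S.", "Alexander"): (6, 5478),
    ("western U.S.", "Kenneth"): (15, 2005),
    ("western U.S.", "Charles"): (38, 16760),  # base rate is a lower bound
    ("U.K./Ireland", "Charles"): (15, 1621),
    ("U.K./Ireland", "Kenneth"): (5, 323),
}


def rate_per_thousand(co_occurrences, base_rate):
    """Co-occurrences per 1,000 appearances of the name."""
    if base_rate <= 0:
        raise ValueError("base rate must be positive")
    return 1000 * co_occurrences / base_rate


if __name__ == "__main__":
    for (region, name), (hits, base) in OBSERVATIONS.items():
        print(f"{region:13} {name:10} {rate_per_thousand(hits, base):6.2f}")
```

Expressed this way, the names differ far less in raw counts than in rates, which is why searches for opposite connotations are needed before the differences can be interpreted.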
Features
iScience Maps is targeted to researchers interested in
mining Twitter. It provides temporal and geospatial content
analysis and a rich set of features for comparative search
options. Trends within a date interval can be detected via
the Global Search panel and can be visualized as an
animated movie using Scalable Vector Graphics (SVG)-based
animations of the world map. Local Search provides
two query forms and maps to do comparative searches. In
both clients (global and local), if the Boolean content
search field is empty, all Twitter statuses matching the
location and date range will be retrieved. Hence, it is
possible to calculate relative proportions of search term
combinations in the Twitter space for a given
geolocation.
Depending on the research question, a researcher may wish
to combine the location information with aggregated data
available via zip code, for example from the U.S. Census
Website at http://www.census.gov/epcd/www/zipstats.html.
These data can help determine the extent to which tweeting
on a particular topic is concentrated in, for example, affluent
communities across the U.S.
Each time a researcher receives a result using a query
form, a new row is added to the table of results. This
table is a dynamic widget; thus, all its content can be
easily rearranged by just clicking on the header cells. It
also lets one export its content to the clipboard or to
CSV or Excel (XLS) format. The tweets can be accessed
as well.
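For readers who want to reproduce the reordering and CSV export of such a results table in their own scripts, a minimal Python sketch follows. The column names region, query, and tweets are hypothetical and do not necessarily match the columns of the iScience Maps export; the two rows reuse the base rates for Kenneth reported above.

```python
import csv
import io


def results_to_csv(rows, fieldnames, sort_by=None):
    """Return the result rows as CSV text, optionally sorted by one column."""
    if sort_by is not None:
        rows = sorted(rows, key=lambda row: row[sort_by])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"region": "western U.S.", "query": "Kenneth", "tweets": 2005},
        {"region": "U.K./Ireland", "query": "Kenneth", "tweets": 323},
    ]
    print(results_to_csv(rows, ["region", "query", "tweets"], sort_by="tweets"))
```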
Fig. 5 Comparison interface, directly available from http://maps.iscience.deusto.es/local/. The example shows a comparison for the Boolean
search “Charles” AND “intelligent” in the western U.S. and the U.K./Ireland
The extraordinary success of Twitter has much to do with
its APIs. APIs enable third-party services to successfully use a
platform without dealing with its implementation details,
showing it as a “black box” full of features. All of the Web
applications that are part of iScience Maps use Twitter APIs
intensively but also provide their own public API with refined
results. In this way, researchers can combine raw results from
official Twitter APIs with refined results taken from the
iScience Maps API and can cross-check trends or proportional
ratios. Another interesting feature derived from iScience
Maps’ modular architecture is the possibility of linking
third-party clients that use the iScience Maps API in richer ways.
In the next section, we will compare the iScience Maps
platform and its main features with other existing services
for searching Twitter. Depending on the requirements of a
research project, some of these services may complement
iScience Maps.
Comparison of iScience Maps with other Twitter search services
There are a number of Web services that were developed
around the Twitter APIs. Not all of these services were
developed specifically for scientific research, but they may be
useful in performing certain tasks that may be needed in
research projects. For example, during the initial stage of
developing a research question, one may want to use Monitter
(http://www.monitter.com), which lets one monitor the Twitter
world in real time for a set of keywords and watch what
Twitter users are writing. In Table 1, we provide an overview
of third-party services for searches on Twitter, and we
compare their features with those provided by iScience Maps.
A second category of Web service providers do not
provide search options but provide APIs that focus on
Twitter content. Some of them offer interesting packages
like GNIP’s “Premium Twitter feeds” (http://gnip.com/twitter):
(1) Twitter Halfhose (~50% of all Twitter content,
delivered in realtime), (2) Twitter Decahose (~10% of all
Twitter content, similar to Twitter’s Gardenhose), (3)
Twitter Link Stream (all Twitter statuses containing URLs,
delivered in realtime), and (4) Twitter User Mention Stream
(all Twitter statuses that mention any user). The main
drawback of GNIP’s services is their steep price, which is
generally not suitable for low-budget research initiatives.
Semiocast (http://semiocast.com/) is another company
specializing in Twitter content analysis that provides
semantic analysis services through a public API. This API
can be used to analyze, filter, and prepare Twitter statuses in
terms of their language or location. The Semiocast API allows
up to 1,024 API calls per day for free. 140kit (http://140kit.com;
see Gaffney, Pearce, Darham, & Nanis, 2010) is a free
Web service that enables complete data pulls for a set of
users or terms on Twitter, with searches running continuously
through the Twitter streaming API on 140kit’s
servers. Those data pulls can be downloaded and processed
locally, combined with other users’ data pulls, and analyzed
online, generating basic visualizations. Most of these
features can also be used through the 140kit public API.
Table 1 Comparison of iScience Maps with other Twitter search applications
| Service name | URL | Focus | Near-real-time content | Arbitrary searches | Boolean search | Quantitative analysis | Date range | Geolocation | Public API |
|---|---|---|---|---|---|---|---|---|---|
| Monitter | http://www.monitter.com | Real-time monitoring | Yes | Yes | No | No | No | Yes | No |
| TwitterLocal | http://www.twitterlocal.net | Local business | No | Yes | No | No | No | Yes | No |
| LocalTweeps | http://www.localtweeps.com | Twitter users | No | No | No | No | No | USA, Canada, UK only | No |
| Twitspy | http://twitspy.com | Google Maps + Twitter mashup | Yes | No | No | No | No | Yes | No |
| MyTweetMap | http://www.mytweetmap.com | Twitter client + geolocation | Yes | No | No | No | No | Yes | No |
| TweetMeme | http://tweetmeme.com | Digg + Twitter mashup | Yes | Yes | No | Yes | 7 days | No | No |
| TweetStats | http://tweetstats.com | Per-user statistics | No | No | No | Yes | No | No | No |
| Twitris | http://twitris.knoesis.org | Semantic Twitter analysis | No | No | No | Yes, and semantic | Yes | Yes | No |
| Trendistic | http://trendistic.com | Trends analysis | Yes | Yes | No | Yes | 180 days | No | No |
| iScience Maps | http://maps.iscience.deusto.es | Scientific research | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
We developed iScience Maps to work independently,
without using third-party services, but in the future we may
consider contracting some processes from third parties, if
the third-party services become more powerful and less
expensive. Like iScience Maps, GNIP provides filtered
Twitter’s streaming API content. Although iScience Maps
compares favorably with GNIP on the cost dimension, the
gathering process of iScience Maps could be outsourced to
GNIP if they offered a “Twitter Geotagged Stream,”
providing a filtered version of Twitter Firehose with all
geotagged statuses. In a similar way, Semiocast services
could be added to iScience Maps to filter content more
deeply with its location and semantic filters. Since the
Semiocast API is a paid service (for more than 1,024
requests per day), iScience Maps’ current version does not
use this service for filtering. 140kit is a useful service for
performing searches on Twitter, but only for the brief time
window of one week. 140kit only provides on-demand
Twitter data pulls; thus, a researcher would have to act
quickly and ask for a 1-week data pull on this platform.
iScience Maps’ Global Search works in a more sustainable
way, since all gathered Twitter content can be queried at any
moment. Another important difference between 140kit and
iScience Maps is that the latter provides location-based
filtering, in addition to content-based filtering.
Discussion and outlook
By mining Twitter content using iScience Maps, we
replicated the findings of research inferring affective and
personality characteristics from first names (Mehrabian &
Piercy, 1993). Findings were replicated in two different
English-speaking areas of the world, the western U.S. and
the U.K./Ireland. Furthermore, we measured the base rates
of first names appearing in the same samples of tweets (in
only a matter of a few minutes, using iScience Maps). This
revealed that a crucial factor, the base rates of first names,
appears to have confounded the results in the original study.
Base rate neglect is a common cognitive phenomenon
(Kahneman & Tversky, 1972; Reips & Waldmann, 2008),
and the present results indicate that researchers are not
exempt from its effects. Thus, we tacitly conclude that our
tool has merits for conducting psychological research.
The iScience Maps Twitter tool will continue to be
developed. Currently, we are seeking contact with the
developers at Twitter. We are proposing to them a “researcher
API” that would make Twitter’s information about tweets
much more accessible to researchers. In comparison with
Twitter’s current APIs, the one we propose has many
benefits, including (1) very little programming work for
Twitter, (2) a reduced number of accesses to the current APIs,
and (3) helping immensely the community of researchers
who would like to use the Twitter stream in their work.
Author Note This research was first presented at the 40th Annual
Meeting of the Society for Computers in Psychology (SCiP), St. Louis,
November 18, 2010. It was partially supported by grant IT363-10
from Departamento de Educación, Universidades e Investigación of
the Basque Government. We thank Unai Goikoetxeta for technical
help in setting up the servers for our local Twitter API, Ted Cascio for
copyediting, and Marc Brysbaert, Laura Buffardi, and an anonymous
reviewer for valuable feedback.
References
Barras, G. (2009). Gallery: Flickr users make accidental maps. New
Scientist. Retrieved April 27, 2009 from
http://www.newscientist.com/article/dn17017-gallery-flickr-user-traces-make-accidental-maps.html
Gaffney, D., Pearce, I., Darham, M., & Nanis, M. (2010). Presenting
140Kit: An open, extensible research platform for Twitter. Retrieved
from http://www.webecologyproject.org/2010/07/presenting-140kit/
Garrett, S. (2011). Twitter / Sean Garrett: Oh and - not a Q1 stat - but
noticed that we're now at 155 million Tweets per day, up from 55
million at this time last year. Retrieved April 6, 2011 from
http://twitter.com/#!/twitterglobalpr/status/55779434350907392
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment
of representativeness. Cognitive Psychology, 3, 430–454.
Mehrabian, A., & Piercy, M. (1993). Affective and personality
characteristics inferred from length of first names. Personality
and Social Psychology Bulletin, 19, 755–758.
doi:10.1177/0146167293196011
Mislove, A., Lehmann, S., Ahn, Y.-Y., Lazer, D., Lin, Y., Onnela, J.-P.,
& Rosenquist, J. N. (2010). Mapping the conversation: Political
topics and geography on Twitter. Retrieved from
http://election.ccs.neu.edu/
Reips, U.-D. (2007). The methodology of Internet-based experiments.
In A. Joinson, K. McKenna, T. Postmes, & U.-D. Reips (Eds.),
The Oxford handbook of Internet psychology (pp. 373–390).
Oxford: Oxford University Press.
Reips, U.-D. (in press). Using the Internet to collect data. In H.
Cooper, P. Camic, R. Gonzalez, D. Long, & A. Panter (Eds.), APA
handbook of research methods in psychology. Washington, DC:
American Psychological Association.
Reips, U.-D., & Lengler, R. (2005). The Web experiment list: A Web
service for the recruitment of participants and archiving of
Internet-based experiments. Behavior Research Methods, 37,
287–292.
Reips, U.-D., & Waldmann, M. (2008). When learning order affects
sensitivity to base rates: Challenges for theories of causal
learning. Experimental Psychology, 55, 9–22.
Semiocast (2010, June). Retrieved April 6, 2011 from
http://semiocast.com/pr/20100701/Asia_first_Twitter_region
Van Grove, J. (2010, September 3). Twitter surpasses 145 million
registered users. Mashable. Retrieved April 6, 2011 from
http://mashable.com/2010/09/03/twitter-registered-users-2/
Behav Res
... Internet-based studies are roughly systematized in four categories (Reips 2006, see Fig. 2): Internetbased experiments (Reips 2002), web surveys and questionnaires (Dillman and Bowker 2001;Dillman et al. 2009), Internet-based assessments (Buchanan 2001;Buffardi and Campbell 2008), and nonreactive data collection on the Internet (Reips and Garaizar 2011). Within psychology, most Internet-based research is conducted in the fields of social psychology and cognition (Musch and Reips 2000;Reips and Lengler 2005). ...
... Recently, the advent of social media, that is, highly interactive Internet-based communication platforms, has spurred the interest of researchers. Social networking services like Facebook, Tuenti, Orkut, LinkedIn, Twitter, and Student VZ are seen as vast resources for detailed descriptions of human behavior (Reips and Garaizar 2011), and thus open source social network systems for use by researchers have been developed (Garaizar and Reips 2019). Providers of large commercial platforms even offer interfaces (so-called APIs or interactive websites) for researchers to study big datasets that come from social media sites or other services like Google search or Google Ngrama service that provides word frequencies from a massive corpus of 6% of all books that were published since 1800 (Michel et al. 2011) and has recently caught much attention as a tool in research (Younes and Reips 2019). ...
... Internet-based studies are roughly systematized in four categories (Reips, 2006, see Fig. 2): Internet-based experiments (Reips, 2002), web surveys and questionnaires (Dillman & Bowker, 2001;Dillman, Smyth, & Christian, 2009), Internet-based assessments (Buchanan, 2001;Buffardi & Campbell, 2008), and nonreactive data collection on the Internet (Reips & Garaizar, 2011). Within psychology, most ...
... Recently, the advent of social media, that is, highly interactive Internet-based communication platforms, has spurred the interest of researchers. Social networking services like Facebook, Tuenti, Orkut, LinkedIn, Twitter, and Student VZ are seen as vast resources for detailed descriptions of human behavior (Reips & Garaizar, 2011). ...
... Reips & Birnbaum, 2011;Reips & Buffardi, 2012). 474 Tools available for such research include: FactorWiz and SurveyWiz (Birnbaum, 2000); iScience Maps (Reips & Garaizar, 2011); innovative social location-aware services for mobile phones like MUGGES (Klein & Reips, this volume), Scientific LogAnalyzer (Reips & Stieger, 2004); the Web experiment list (Reips & Lengler, 2005); the Web Experimental Psychology Lab (Reips, 2001); WEXTOR, a Web experiment generator (Reips & Neuhaus, 2002); ReCal OIR (Freelon, 2013); VAS Generator (Reips & Funke, 2008); Dynamic Interviewing Program and User Action Tracer (Stieger & Reips, 2008, 2010; among many others. The number of studies conducted via the Internet with such tools has grown almost exponentially since 1995 (Reips & Krantz, 2010). ...
... Large collections of entries or traces from human behavior on the Internet have become an accessible source for research. Examples include the definition of points of interest via data mining in uploaded pictures (Barras, 2009) prediction of influenza outbreaks from searches (Ginsberg et al., 2009), and our own work on attributions of personality characteristics to first names accessed via Twitter mining (Reips & Garaizar, 2011). Upon the big success of its search engine that became available on the web in 1997, Google has created freely available interfaces to their search data. ...
Article
The present article reviews web-based research in psychology. It captures principles, learnings, and trends in several types of web-based research that show similar developments related to web technology and its major shifts (e.g., appearance of search engines, browser wars, deep web, commercialization, web services, HTML5…) as well as distinct challenges. The types of web-based research discussed are web surveys and questionnaire research, web-based tests, web experiments, Mobile Experience Sampling, and non-reactive web research, including big data. A number of web-based methods are presented and discussed that turned out to become important in research methodology. These are one-item-one-screen design, seriousness check, instruction manipulation and other attention checks, multiple site entry technique, subsampling technique, warm-up technique, and web-based measurement. Pitfalls and best practices are described then, especially regarding dropout and other non-response, recruitment of participants, and interaction between technology and psychological factors. The review concludes with a discussion of important concepts that have developed over 25 years and an outlook on future developments in web-based research.
... WEXTOR at http://wextor.org ; see Reips & Neuhaus, 2002), and portals that link to related services (e.g. the iScience Server at http://iscience.eu). Web services for researchers can be used in teaching as well, e.g. to introduce the concept of and create visual analogue scales in measurement (Reips & Funke, 2008) or for exercising data mining in tweets, the short messages exchanged in the social network Twitter (Reips & Garaizar, 2011). Statistical web services became available for intercoder reliability calculation (Freelon, 2010) or for the calculation of effect size in a Student t test (Soper, 2012). ...
... These characteristics have aroused the interest of many researchers who seek to understand, for example, social relationships and behavior [36]-[40]; large-scale contagion processes [41], [42]; tracking preferences and/or large audiences [43]; social behaviors and attitudes [44]; collective experiences based on a timely event [45], [46]; the collection of large amounts of data on hard-to-reach populations [47]; mapping mood swings and other feelings [39], [48]; and research that points to the potential of social networks for the production of intelligence in the city is linked to the organization of social movements and the internet [49]. ...
Article
In the era of data deluge, the world is experiencing an intensive growth of Big data with complex structures. While processing of these data is a complex and labor-intensive process, a proper analysis of Big data leads to greater knowledge extraction. In this paper, Big data is used to predict high-risk factors of Diabetes Mellitus using a new integrated framework with four Hadoop clusters, which are developed to classify the data based on Multi-level MapReduce Fuzzy Classifier (MMR-FC) and MapReduce-Modified Density-Based Spatial Clustering of Applications with Noise (MR-MDBSCAN) algorithm. Big data concerning people’s food habits and physical activity are extracted from social media using the APIs provided. The MMR-FC takes place at three levels of index (Glycemic Index, Physical activity Index, Sleeping Pattern) values. The fuzzy rules are generated by the MMR-FC algorithm to predict the risk of Diabetes Mellitus using the data extracted. The result from MMR-FC is used as an input to the semantic location prediction framework to predict the high-risk zones of Diabetes Mellitus using the MR-MDBSCAN algorithm. The analysis shows that more than 55% of people are in a high-risk group with positive sentiments on the data extracted. More than 70% of food with a high Glycemic Index is usually consumed during Night and Early Evenings, which reveals that people consume food that has a high Glycemic Index during their sedentary slot and have irregular sleep practices. Around 70% of the unhealthiest dietary patterns are retrieved from urban hotspots such as Delhi, Cochin, Kolkata, and Chennai. From the results, it is evident that 55% of the younger generation who use social networking sites have a high likelihood of Type II Diabetes Mellitus.
Article
The identification of human behavior can provide useful information across multiple job spectra. Recent advances in applying data-based approaches to social sciences have increased the feasibility of modeling human behavior. In particular, studying human behavior by analyzing unstructured textual data has recently received considerable attention because of the abundance of textual data. The main objective of the present study was to discuss the primary methods for identifying and predicting human behavior through the mining of unstructured textual data. Of the 823 articles analyzed, 87 met the predefined inclusion criteria and were included in the literature review. Our results show that the included articles could be symmetrically classified into two groups. The first group of articles attempted to identify the leading indicators of human behavior in unstructured textual data. In this group, the data-based approaches had three main components: (1) collecting self-reported survey data, (2) collecting data from social media and extracting data features, and (3) applying correlation analysis to evaluate the relationship between two sets of data. In contrast, the second group focused on the accuracy of data-based approaches for predicting human behavior. In this group, the data-based approaches could be categorized into (1) approaches based on labeled unstructured textual data and (2) approaches based on unlabeled unstructured textual data. The review provides a comprehensive insight into unstructured textual data mining to identify and predict human behavior and personality traits.
Article
This article discusses methods and techniques, procedures and tools that have been found to be necessary or useful in Internet-based experimenting. While the focus is on experiments, many of the methods apply to other types of Internet-based research as well. The article is structured in a step-by-step fashion, guiding the reader through the various stages of setting up and conducting a web experiment. Apart from general issues, the relevant steps begin with planning, generating, and pre-testing an experiment. They continue with recruitment and monitoring, then analysis and archiving.
Article
The Web Experiment List (http://genpsylab-wexlist.unizh.ch/), a free Web-based service for the recruitment of participants in Internet-based experiments, is presented. The Web Experiment List also serves as a searchable archive for the research community. It lists more than 250 links to and descriptions of current and past Web experiments. Searches can be conducted by area of research, language, type of study, date, and status (active vs. archived). Data from log file analyses reveal an increasing use of the Web Experiment List and provide a picture of the distribution of the use of the Web experiment method across disciplines. On a general theoretical note, Web services are discussed as a viable software alternative to the traditional program format.
Article
In three experiments we investigated whether two procedures of acquiring knowledge about the same causal structure, predictive learning (from causes to effects) versus diagnostic learning (from effects to causes), would lead to different base-rate use in diagnostic judgments. Results showed that learners are capable of incorporating base-rate information in their judgments regardless of the direction in which the causal structure is learned. However, this only holds true for relatively simple scenarios. When complexity was increased, base rates were only used after diagnostic learning, but were largely neglected after predictive learning. It could be shown that this asymmetry is not due to a failure of encoding base rates in predictive learning because participants in all conditions were fairly good at reporting them. The findings present challenges for all theories of causal learning.
Article
The Name Connotation Profile was used to investigate judgments made about another based on that person's given name only. Nicknames were excluded to avoid confounding of name length with given name versus nickname effects. Longer names, because of their greater substance or "mass," were expected to convey characteristics associated with a high social position (successful, moral). Shorter names, with their ease of use and greater informality, were expected to convey approachable qualities (popular, cheerful, warm). Except for the length/warmth hypothesis, all hypotheses were supported for men. Also as hypothesized, subjects inferred greater masculinity (less femininity) for men with shorter names, a result corroborated by an additional finding showing that male names were shorter than female names. Only one of the preceding hypotheses was supported for women's names: shorter names connoted greater warmth.
Article
This paper explores a heuristic, representativeness, according to which the subjective probability of an event, or a sample, is determined by the degree to which it: (i) is similar in essential characteristics to its parent population; and (ii) reflects the salient features of the process by which it is generated. This heuristic is explicated in a series of empirical examples demonstrating predictable and systematic errors in the evaluation of uncertain events. In particular, since sample size does not represent any property of the population, it is expected to have little or no effect on judgment of likelihood. This prediction is confirmed in studies showing that subjective sampling distributions and posterior probability judgments are determined by the most salient characteristic of the sample (e.g., proportion, mean) without regard to the size of the sample. The present heuristic approach is contrasted with the normative (Bayesian) approach to the analysis of the judgment of uncertainty.
Semiocast (2010, June). Retrieved April 6, 2001 from http://semiocast.com/pr/20100701/Asia_first_Twitter_region
Van Grove, J. (2010, September 3). Twitter surpasses 145 million registered users. Mashable. Retrieved April 6, 2001 from http://mashable.com/2010/09/03/twitter-registered-users-2/
Presenting 140Kit: An open, extensible research platform for Twitter
  • D Gaffney
  • I Pearce
  • M Darham
  • M Nanis
Gallery: Flickr users make accidental maps
  • G Barras