Virtual labs 1
Running head: VIRTUAL LABS
Turning virtual public spaces into laboratories: Thoughts on conducting online field studies
using social network sites
Ilka H. Gleibs
London School of Economics and Political Science, U.K.
This research was supported by a grant from the Engineering and Physical Science Research
Council (EP/J005053/1). Thanks to Saadi Lahlou, Janelle Jones, Nils Metternich, Jessica
Salvatore, Neil Wilson, the students from PS950 at the LSE Department of Social
This article deals with the topic of ethics in large-scale online studies on social
network sites. ‘Big data’ and large-scale online field studies are a relatively new
phenomenon and clear ethical guidelines for social science research are still lacking. In this
paper I focus on the ethical question of getting informed consent when we collect data from
Social Network Sites (SNS). I argue that data from SNS are not per se public and research
based on these data should not be exempt from the ethical standard that informed consent
must be obtained from participants. Based on the concept of privacy in context (Nissenbaum,
2010), I further propose that this is because the norms of distribution and appropriateness are
violated when researchers manipulate online contexts and collect data without consent.
Finally, I make suggestions for existing and possible future practices for large-scale online studies.
Turning virtual public spaces into laboratories: Thoughts on conducting online field studies
using social network sites
Social scientists, including social psychologists, are increasingly interested in
conducting research using the content created on social network sites. Some propose that the
internet is one huge field setting that generates new and previously unavailable opportunities
for studying social interaction (Kramer, 2013), and claim that ‘big data will
revolutionise (social) science’ (Mayer-Schonberger & Cukier, 2013; Watts, 2007). The
internet, and especially social networking sites (SNS) such as Facebook, Twitter and the
Chinese Renren, is seen as a research field that offers the chance to observe ‘real behaviour’1,
making it possible to (unobtrusively) examine human experience (Buchanan & Zimmer,
2012). Consequently the number of studies harvesting information from SNS is growing.
SNS are particularly rich and attractive sources of data because they allow social
scientists to observe large-scale social behaviour at low cost, and without intrusive laboratory
procedures or surveys, which are often retrospective. Consequently, they provide us with data
that represent individual and group behaviour on a previously unseen scale. In this regard,
data from social network sites are often subsumed under the phenomenon of ‘big data’. ‘Big
data’ is defined by its “capacity to search, aggregate, and cross-reference large data-sets”
(boyd & Crawford, 2012, p. 663). In particular, there is increasing access to massive
quantities of information about people and their interactions, generated by growing
sources of digital traces and data deposits such as Twitter, Google, Facebook and Wikipedia.
These social interactions, or online imprints, are increasingly aggregated into large databases
and are of immense interest for social scientists, corporations, politicians, journalists, and
governments (Tufekci, 2013). Thus, not just the technical properties but also the volume,
variety and velocity of information made available through new information technologies are part
of what is described by ‘big data’ (Davis, 2012). In broader terms, ‘big data’ can be defined
as a cultural, technological and social phenomenon of which data generated via SNS is one
element and the focus of this article (see also boyd & Crawford, 2012 for details).
Thus far, most social science research using SNS data has comprised social network
analyses focusing on questions of social influence and mobilisation (see Ackland, 2008; Aral
& Walker, 2012; 2013; Lewis, Kaufman, Gonzalez, Wimmer, & Christakis, 2008). These
studies are based on computational approaches (Giles, 2012; Lazer, Pentland, Adamic, Aral,
Barabasi, Brewer, Christakis, ... van Alstyne, 2009) that borrow from physics and biology.
However, in contrast to cells and atoms, humans are active agents in their social environment
and might protest or object when their social life is explored and exposed by social scientists
(Lazer et al., 2009; Tufekci, 2013). This article focuses on the ethical implications that arise
when social scientists use SNS data and treat these online settings as a massive field laboratory.
This online field research risks blurring the boundaries between users and
research participants: individuals unwittingly become ‘human subjects’ or participants2. This is important because data
collected from human subjects unavoidably raises privacy and agency issues. When
researchers are engaged in human subject research in the digital world, they are required to
follow ethical guidelines that were developed to protect the interests of participants involved
in experiments in the physical world (see for example Baumrind, 1964; Warwick, 1973).
However, with the increase of data from SNS and the emerging field of computational social
science, little is known about the potential ethical implications of this research and the
translation of human subject research guidelines from an offline to an online environment.
New challenges and questions arise when we deal with data from SNS; for example, who is
responsible for making sure individuals and communities are not harmed by the research? What
does it mean for SNS users to provide behavioral data without their knowledge and to what
extent are their rights violated? On a practical level, what does consent (if needed) look like
when using data from SNS?
As the challenges and implications of using population-scale networked data
are diverse and would go beyond the scope of this paper (see for example Hayden, 2012;
Shapiro & Ossorio, 2013; Waskul & Douglass, 1996), I focus on one
important ethical question raised by data collection in SNS settings: what is the appropriate
role of informed consent in this new era of large-scale online studies? And further, how can
we develop a ‘best practice’ for researchers in obtaining consent from SNS users?
Online field experiments: a current example
Large-scale online studies seem to be an attractive research method as they provide us
with large amounts of data and give us access to relatively uncontaminated and ‘real’
observed behaviour (Aral & Walker, 2012; but see Tufekci, 2013). In particular, online
experiments go beyond observational research (e.g., Kossinet & Watts, 2006) by making use
of experimental designs that manipulate or intervene in the online environment. The benefits
of these methods are exemplified by a 61-million-person experiment investigating social
influence published recently in Nature (Bond, Fariss, Jones, Kramer, Marlow, Settle, &
Fowler, 2012). Data were collected from all US-based users over 18 years of age who
accessed the Facebook website on 2 November 2010 (the day of the midterm elections).
Users in this experiment were randomly assigned to one of three conditions. Those in the
social message group (n=60,055,176) were shown a counter indicating how many other users
had already clicked an ‘I voted’ button. They were also shown up to six small profile pictures
of ‘Facebook friends’ who had already indicated that they voted. Another message
encouraged these users to vote and provided a link to find the nearest polling station, as well as
an ‘I voted’ button. Those in the informational message group (n=611,044) were provided
with the same information as the social message group but were not shown the faces of their
friends. Those in the control group (n=613,096) received no information about the election or
votes on their newsfeed. The researchers assessed whether users clicked on the voting button
and the link to the polling station. In addition, the researchers assessed actual voting
behaviour, and matched the Facebook data with 6.3 million publicly available voter records.
The goal was to link the validated voting records with the Facebook data in order to assess
relationships between variables across data sets. It is important to note that the researchers
merged group-level data but not individual-level data to protect the privacy of participants.
This was done via a complex mathematical matching procedure called ‘Yahtzee’ (for details
see Jones, Bond, Fariss, Settle, Kramer, Marlow, & Fowler, 2013). The Yahtzee procedure
ensures that user information cannot be decoded given knowledge of individuals in a specific
dataset. This procedure was developed even though individual-level matching would
have been possible and permitted within the scope of the study’s Institutional Review Board approvals.
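The privacy rationale behind group-level matching can be illustrated with a small sketch. This is not the actual Yahtzee algorithm (see Jones et al., 2013 for that), only a hedged toy example of the general idea: records from two sources are aggregated into groups before merging, so no merged row corresponds to a single identifiable user. All names and fields below are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data; NOT the real Bond et al. data or the Yahtzee procedure.
facebook = [  # (user_id, county, clicked_i_voted)
    ("u1", "Alameda", True), ("u2", "Alameda", False),
    ("u3", "Kings", True), ("u4", "Kings", True),
]
voter_records = [  # (user_id, county, actually_voted)
    ("u1", "Alameda", True), ("u2", "Alameda", True),
    ("u3", "Kings", False), ("u4", "Kings", True),
]

def aggregate(rows):
    """Collapse individual rows into per-county [positives, total] counts."""
    counts = defaultdict(lambda: [0, 0])
    for _, county, flag in rows:
        counts[county][0] += int(flag)
        counts[county][1] += 1
    return dict(counts)

# Merge only at the group (county) level: no individual row survives the join,
# so knowing one dataset does not reveal any single user's record in the other.
fb = aggregate(facebook)
vr = aggregate(voter_records)
merged = {c: {"clicked": fb[c], "voted": vr[c]} for c in fb.keys() & vr.keys()}
print(merged["Alameda"])  # {'clicked': [1, 2], 'voted': [2, 2]}
```

The design choice is that the join key is a group attribute (the county), never the user identifier, which is what allows researchers to study relationships across data sets without holding an individually linked record.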
Effects of the message on the dependent variables were statistically small but
significant. Participants who received the social message were 2.08% more likely to click the
‘I voted’ button compared to the informational message group (the control group was not
included in this analysis as they were not provided with an ‘I voted’ button) and 0.26% more
likely to click the link to the polling station information. In addition, those participants who
saw the social message were 0.39% more likely to vote compared to the control group and
the informational message group and the authors were able to validate 282,000 additional
votes cast by people receiving the social message. Thus, even though the effect sizes of the
results were small, the absolute differences in terms of votes could be large enough to change
an election, which is sometimes decided by a fraction of a percentage point.
These results are interesting in their own right, in line with earlier research showing
that signalling socially desirable behaviour to others increases voting behaviour (e.g.,
Greenwald, Carnot, Beach, & Young, 1987), and important to consider in terms of social
influence on political mobilization and actual voting behaviour. However, it is worthwhile to
consider how the researchers in this study dealt with the question of informed consent as it
can be informative for understanding the challenges of following ethics regulations in SNS settings.
Bond et al. did not obtain any informed consent from their participants and did not
debrief participants or give them the chance to withdraw their data from the analysis. The
authors report in the supplementary information accompanying the Nature paper that they
received ethical approval from the University of California San Diego Human Protections
Program, and that informed consent was waived. The original ethics application, provided by
one author of the article (J. Fowler, personal communication, February 20, 2013), states that
informed consent was waived for three reasons: (a) the research is non-intrusive and bears
minimal risk to participants; (b) the research does not affect the rights and welfare of
participants; and (c) the research could not practicably be carried out without the waiver.
Whereas I do not question that the authors of this specific article have thought
extensively about the ramifications of their research, sought approval from their institution and
their collaborators at Facebook, and devised a new method to match data sets, I would
like to consider this kind of research in a wider context and ask whether it is defensible that
researchers are exempt from asking for informed consent in the context of large-scale online
studies. With this I also would like to highlight that the Bond et al. paper is only one example
of large-scale online research and therefore the backdrop of the current discussion but not the
sole focus (see for example Broockman & Green, 2013; Lewis, Reiley, & Schreiner, 2012).
Informed consent in online research
One of the cornerstones of conducting ethically sound social science research
involves the informed consent of participants, obtained through advising them about the
study in which they are invited to partake, its possible risks but also its benefits, and the
study’s projected outcomes. The use of informed consent is important because it allows
participants to make a choice and signals their willing participation. As researchers we show
respect for the individuals’ autonomy, which is a fundamental ethical principle. Moreover,
the Nuremberg Code (1947) states that informed consent is not only essential for safety,
protection and respect for participants, but also for the integrity of the research itself. Hence,
in order to conduct research in an ethical manner, it is important to think about accountability
– both to our discipline as well as to the people we engage with in research. We are still
bound by our professional standards when working with people in digital domains, and we
have to make sure that the accessibility of large-scale online data, through the use of
technologically savvy research methods, does not come at the cost of exploiting those
engaged with such domains. In other words, the exciting availability of population-scale
networked data does not excuse us from the responsibility of dealing with questions of ethics
(see boyd & Crawford, 2012).
When discussing the use of large-scale data from SNS, social scientists often provide
two arguments for waiving informed consent. First, it is highlighted that the online
environment (for example messages on the home page like the newsfeed in Facebook) is
constantly altered and changed for marketing and web-development reasons (Broockman &
Green, 2013). Thus, altering the online environment for social science research is similar to
the regular changes in the online environment and does not warrant the specific scrutiny applied
to human subject research (see ESOMAR guidelines on social media research, 2011,
especially point 3)3. What is more, users explicitly agree to the Terms of Service of SNS
when they sign up, which explicitly define data as public and allow advertising.
Second, it is argued that information that is shared on Facebook or Twitter is already
‘out there’ and therefore publicly available (Zimmer, 2010). Thus, individuals in SNS are
understood as users who socially interact in a metaphorical ‘park’ in which researchers are
free to observe their behaviour. Consequently, the question of ethics and consent when
conducting Internet research is often discussed in terms of whether we observe
behaviour in a public or a private space (U.S. Office for Human Research Protection, 2010).
Generally, the distinction between private and public behaviour has implications for what
kind of data we are allowed to collect and use without informed consent. For example,
observations in public places or the analysis of publicly available information (texts,
documents, etc.) can be done without consent of participants (Gallup, Hale, Sumpter, Garnier,
Kacelnik, Krebs, & Couzin, 2012; Goldstein, Cialdini, & Griskevicius, 2008; Milgram,
Bickman, & Berkowitz, 1969). Applying this to the online world, researchers may be exempt
from obtaining consent and from informing users/participants about their research when the
data they collect is from publicly available online fora or social media networks such as
Twitter (see for example the American Psychological Association Ethics Code (2010), 8.05; 45
C.F.R. § 46.102(f) 2009; but see Krotoski, 2012).
Consequently, in most studies using SNS the data were perceived as publicly
available, and it was assumed that further changes in the online environment would not harm the
participants’/users’ experience. The researchers therefore stated that their data collection did not
affect user welfare or raise privacy concerns, and that further permission for data collection from
participants was not necessary (see Bond et al., 2012; Broockman & Green, 2013; Kaufman,
2008; Lewis et al., 2008). Importantly, this is in line with the Data Use Policy of Facebook
(Facebook, nd), which states that most information is treated as public information. Even
though some types of information can be restricted from public view (for example, profiles
that can only be seen by friends, private wall-to-wall communication, private chats or
emails), they are still available to Facebook and its collaborators for purposes explained in
the Data Use Policy. This clearly states that data can be used for internal operations including
troubleshooting, data analysis, research, and service improvement. Taken together, in legal
terms and by accepting the terms of service, users agree to make their data available and to
partake in research. From this perspective and if we define SNS as a public space, current
ethical guidelines allow us to dispense with informed consent and allow researchers to use
SNS data without the explicit consent (or opt-in) of unwitting participants.
Yet, the perception that the data are publicly available and therefore unrestrictedly
usable by researchers becomes more complicated when we take into account users’
expectations, especially when parts of the researched online environment are restricted from
public view (e.g., Facebook profile pages or newsfeeds). Thus, the publicness of the
context (i.e., Facebook, Twitter) does not in itself preclude the emergence of private
interaction or the users’ expectations of privacy; for example, if some form of registration is
required, users are likely to experience the space as private (Martin, 2012; Rooke, 2013).
More specifically, the behaviour users display and the data they produce might have been
created in a context-sensitive space, and it is possible that users would not give permission
for their data to be used elsewhere (e.g., in a publication or being matched with other data-
sets). Even though the data in themselves might be public (or semi-public4), we have to
question whether this simply equates with permission for all uses or whether there are
differences in what users perceive as public and private in different contexts (Waskul &
Douglass, 1996; Whitty, 2004). Therefore researchers should acknowledge that there are
different places and audiences online that have to be taken into account when making ethical
decisions about the use of online information (Griffith & Whitty, 2010). That is, the use of
data that are ‘out there’ in one context (e.g., a Facebook profile or newsfeed shared with
‘friends’ or a specific network) might become problematic when they are ‘out there’ in
another environment (e.g., a data-set used for research). In this sense, Facebook is
transformed from a public space to a behavioural laboratory.
How the use of data that are ‘out there’ can become problematic and raise questions
about the vulnerability of privacy and the control of online data is exemplified by the T3 project
(Tastes, Ties, and Time; Lewis et al., 2008). This Harvard-based research project gathered
information on 1,700 college-based Facebook users. The researchers wanted to study how
their interests and friendships changed over time (Lewis et al., 2008). Additionally, the data-
set was made publicly available for further analysis. However, it was quickly discovered
that it was possible to de-anonymize parts of the dataset, which clearly compromised the
privacy and control of the students. Because these data were perceived as ‘purely’ observational
and as involving little risk, the participants were not aware that their data were collected and
used for research. The researchers did not obtain informed consent and were not asked to do
so by the Harvard’s IRB (Parry, 2011; Zimmer, 2010). However, after it was recognized that
data could be de-anonymized and were collected without consent, the project was stopped
and the data-set is no longer publicly available. The case was widely publicised (Parry,
2011) and Zimmer (2010) argues that the T3 researchers failed to uphold their duty to engage
in ethically-based research by underestimating the sensitivity of data and by failing to
recognise that users might have strong expectations that information that is shared on
Facebook is meant to stay on Facebook.
In addition, Narayanan and Shmatikov (2008, 2009) showed how the use of cross-
referenced large-scale data-sets can be problematic. They argued that anonymity, which is a
prerequisite of privacy and autonomy in research, is rarely sufficiently protected when
dealing with SNS or other large-scale data sets. For example, they demonstrated that
relatively little cross-referencing information (from two data-sources; such as Twitter and
Flickr, or Netflix and IMDb) can be used to de-anonymize data from large-scale data sets,
which compromised the identity and privacy of users and might have implications for their sense
of autonomy and control over their information flow.
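Narayanan and Shmatikov’s point can be illustrated with a minimal linkage-attack sketch. The data and attribute names below are hypothetical, and real attacks exploit far richer overlap (e.g., movie ratings or social-graph structure), but the mechanism is the same: two releases that share quasi-identifiers can be joined to re-identify ‘anonymized’ records.

```python
# Hypothetical toy example of a linkage (cross-referencing) attack.
# Dataset A is "anonymized" (no names); dataset B is public and named.
anonymized = [  # (record_id, zip_code, birth_year, sensitive_value)
    ("r1", "10001", 1985, "voted: no"),
    ("r2", "10001", 1990, "voted: yes"),
    ("r3", "94110", 1985, "voted: yes"),
]
public = [  # (name, zip_code, birth_year)
    ("Alice", "10001", 1985),
    ("Bob", "94110", 1985),
]

def link(anon, known):
    """Re-identify anonymized records whose quasi-identifiers
    (zip code, birth year) match exactly one public record."""
    reidentified = {}
    for rec_id, zip_a, year_a, value in anon:
        matches = [name for name, zip_b, year_b in known
                   if (zip_a, year_a) == (zip_b, year_b)]
        if len(matches) == 1:  # unique match -> identity recovered
            reidentified[matches[0]] = value
    return reidentified

print(link(anonymized, public))  # {'Alice': 'voted: no', 'Bob': 'voted: yes'}
```

Note that removing names from dataset A did not protect the sensitive values: uniqueness of the quasi-identifier combination is enough, which is why anonymity is rarely sufficiently protected in large cross-referenced data sets.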
These examples highlight that rather than adhere to the logic of data accessibility and
whether this information is publicly available or not, researchers should take into account
users’ expectations about the nature of their social interactions online, how they anticipate
that their data will be used and whether they perceive their social interaction to be private
(e.g., Trinidad, Fullerton, Ludman, Jarvik, Larson, & Burke, 2011). In this sense, we move
from a legal contract that neatly defines private and public and the right to use data to a more
complex, psychological contract that takes into account control perception and expectations.
Thus, the one-size-fits-all privacy policies that make a clear distinction between private and
public information are too short-sighted to deal with the complexity of online social
behaviour and how it is observed. To further elaborate on this point, I focus on how control
and privacy are context-specific concepts and how this thinking influences questions of informed consent.
Control and privacy in context
The key principles of ethics, namely control and autonomy of participants, are directly
linked to the notion of privacy. Privacy can be defined as the ability to determine
how we engage with and control a specific context (e.g., the offline or online world; Stadler,
2011; see also Marturano, 2011). This linkage is based upon the thinking of Westin who
emphasised the element of control and stated that privacy is “the claim of individuals, groups
and institutions to determine for themselves when, how, and to what extent information about
them is communicated to others” (Westin, 1967, p. 7). In this sense, privacy is about
control over information and assumes that autonomous agents should be able to control
information in a meaningful way.
Individual expectations about privacy in online environments are ambiguous,
contested and changing (Acquisti & Gross, 2006; Hugl, 2011). Individuals who communicate
on SNS may de facto operate in a public sphere but may maintain a strong perception or
expectation of privacy that governs their behaviour because they communicate with friends or
close others (Markham & Buchanan, 2012). Whether information and online social behaviour
is perceived as public or private might be highly subjective and depends on the user’s
perspective; thus there is a psychological difference between being in public and being
public, which is seldom acknowledged by social scientists using SNS data (boyd &
Crawford, 2012, p. 673). SNS users who share information with their online friends on
password protected SNS may expect that their information is only shared with this specific
online community and not others, who are not part of their circle of friends (Martin, 2012;
Rooke, 2013). Thus, they might have specific expectations about how and with whom
information is transmitted, which might not necessarily align with the SNS’ Terms of Service
and the ‘reality’ of data usage.
To account for the relative nature of privacy and consent, it is useful to consider
Nissenbaum’s (2004, 2010, 2011) concept of contextual integrity. Contextual integrity refers
to a theory of privacy in which adequate protection of personal information is tied to the norms
of information flow in a specific context. It demands that the collection of information and its
dissemination should be appropriate to the context (Nissenbaum, 2004). For example, in a
health-care context, patients expect to share personal information on their health and they
most likely accept that this information is shared with a specialist. Their expectations are
violated, however, if they learn that the information is sold to a marketing company
(Nissenbaum, 2011). Thus, Nissenbaum proposes that the flow of private information from
one agent to another is context-dependent and that informational norms prescribe which
patterns of information flow are acceptable in one context but not in another one (see also
Zimmer, 2008). Thus, communication can be disrupted because of shifting recipients, types
of information and constraints under which information flows (Nissenbaum, 2011).
When communicating on Facebook with known and familiar parties (our friends and
acquaintances) we might have expectations of how information flows (sharing a picture with
a friend). However, this norm could be violated when our friend posts this picture on her
homepage and third parties get access. In such a situation, a third party that was not originally
part of the interaction becomes an actor and changes the norms governing the original exchange.
The idea of contextual integrity challenges the perception of the traditional
dichotomies of public versus private spaces or between public and private information. A key
point in the theory is the idea that online information-sharing takes place in a “plurality of
distinct realms” (Nissenbaum, 2004, p. 137) and that norms of appropriateness and
transmission principles guide what is suitable in a given context. The first norm determines
whether a given type of information is either appropriate or inappropriate to disclose in a
context. The second one restricts the flow of information within and between contexts. When
either of these norms is breached, a violation of privacy occurs (Grodzinsky & Tavani, 2010; Zimmer, 2008).
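Nissenbaum’s two norms can be made concrete with a small sketch. This is only an illustrative formalisation, not part of her theory’s own apparatus: a context lists which information types are appropriate to disclose (norms of appropriateness) and which sender-recipient flows are permitted (transmission principles), and a flow that breaches either norm counts as a privacy violation. All names and fields are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """A hypothetical, simplified model of a Nissenbaum-style context."""
    name: str
    appropriate_info: set = field(default_factory=set)  # norms of appropriateness
    allowed_flows: set = field(default_factory=set)     # transmission principles: (sender, recipient)

def violates_contextual_integrity(ctx, info_type, sender, recipient):
    """A flow violates privacy if either norm is breached."""
    inappropriate = info_type not in ctx.appropriate_info
    bad_flow = (sender, recipient) not in ctx.allowed_flows
    return inappropriate or bad_flow

# The health-care example from the text: sharing with a specialist is expected,
# selling the same information to a marketing company is not.
healthcare = Context(
    name="healthcare",
    appropriate_info={"symptoms", "medication"},
    allowed_flows={("patient", "doctor"), ("doctor", "specialist")},
)

print(violates_contextual_integrity(healthcare, "symptoms", "doctor", "specialist"))    # False
print(violates_contextual_integrity(healthcare, "symptoms", "doctor", "marketing_co"))  # True
```

The sketch makes the key point mechanical: the same piece of information is unproblematic in one flow and a violation in another, so privacy cannot be read off the information itself.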
Rather than prescribing universal rules for what is public and what is private,
contextual integrity builds from within the normative bounds of a given context and
illustrates why we must attend to the context of information flows and their use, not the nature
of the information itself, when thinking about research ethics (Grodzinsky & Tavani, 2010;
Zimmer, 2008). Thus, when thinking about the Bond et al. study, we should not have
assumed that the shared content (e.g., the information on voting behaviour, actual voter
records) was publicly available and that Facebook users automatically gave consent for data
use. On the contrary, SNS users might have perceived that they only shared information with
their friends and they might have been especially uncomfortable with the fact that their data
was used by researchers and matched with voter records. Thus, it is this difference in how the
information flow is perceived by researchers (as publicly available) and by users (as private)
that creates the ethical tension and which should be taken into account when we make ethical
decisions on the use of SNS data.
Contextual integrity, informed consent and online experiments
In SNS, privacy policies are unilaterally set by the companies owning the site, which
gives them the right to gather, use and distribute personal information that users generate
(Nissenbaum, 2010). This means that users have little control or agency over the flow of
information, and they have only partial knowledge of when, how and where their information is
used. Users are often aware of this, and privacy expectations vary between different
technologies and their features (with lower privacy expectations when
posting on Facebook compared to email exchanges; Martin, 2012). However, users’ privacy
settings often do not match their expectations (Liu, Gummadi, Krishnamurthy, & Mislove,
2011). People who share and use information displayed on Facebook’s newsfeed and
their profile pages (as in the Bond et al., 2012 study) might expect to act in one
specific (relatively private) context (e.g., sharing information with known people), yet their
data are largely public and available for research. Thus, when social scientists use these data
in a specific research context without users’ knowledge or consent (e.g., their data are
recorded, matched with other sources and published), users might experience this particular
transmission of information as a transgression of context-relative informational norms.
Hence, when tying this back to the Bond et al. (2012) study, I argue that the reasons
for waiving informed consent might be seen differently under the scrutiny of contextual
integrity. For example, one reason for waiving informed consent was that alterations in the
newsfeed would not affect the rights and welfare of the subjects. Two things suggest that it
might have been otherwise. For one, individuals were presented with and shared information
on their newsfeed, which is only public if the users set their privacy settings accordingly.
Thus, when these data were used without permission, participants could have felt that their
right to withhold information from research, and therefore their right to privacy and
autonomy, had been violated. Furthermore, and more importantly, the study explored
voting behaviour and merged data from two different sources. Even though the researchers
went to great lengths to de-individualise the matched data-sets, the users’ norms of
transmission and appropriateness might have been violated when their Facebook data were
matched with public voting records (on the problems of using data in different contexts see for
example Drabiak-Syed, 2010; Hayden, 2012). Whereas users or participants expected to
share information with their known social circle (i.e., similar to telling a friend on a park
bench whom I voted for), the flow of information was altered: the information was shared
(‘friends’ were informed about who voted), recorded, merged with other data (the voter
records) and finally widely published (without consent).
Thus, in the ‘faceless’ context of an online experiment, the users became ‘human
subjects’ and Facebook an experimental field. Whether this transgression is morally or ethically
problematic and caused harm is a matter of debate. One may say that the availability of such
information is simply in the nature of SNS technologies, and that SNS users who are surprised are naïve.
Further, one may argue that the scientific knowledge underpinning social progress justifies the
relatively minor risk of violating privacy and autonomy rights, and that the benefits of the
Bond study (e.g., increasing voter turn-out) outweigh the costs for participating individuals5.
Yet, I argue that changing and using information on the newsfeed or personal profiles
for research purposes geared towards behavioural change (e.g., voting behaviour)
impacted on the autonomy and freedom of participants, which is potentially troublesome
because it harms the perception of control and autonomy and threatens the trust between the
community of social scientists and participants (Hayden, 2012; Parry, 2011).
Such a discussion on norms of appropriateness is a longstanding issue in social
science and not easily solved. It surfaced, for example, in some of the responses to the so-
called Tearoom Trade study by Laud Humphreys (1970). Humphreys studied the social
background of men who engaged in homosexual behaviour in public facilities.
For his study he observed men entering a public bathroom to engage in sexual activities. He
recorded licence plate numbers of cars driven by these men. One year later, he visited the
homes of the men and interviewed them as part of a larger study on social issues. He decided
that informed consent would be impossible to obtain and recorded and published results
without the consent. The study and its ethical implications triggered a debate on the aims of
social science and the costs and benefits of covered research. Warwick (1973; but see Lenza,
2004) used Humphreys study to illustrated how ethical decisions in research have to be seen
in the context of the right to know (i.e., interest of the researchers and larger society) and the
right to privacy (i.e., for the participants). He further argues that privacy is related to personal
freedom, which involves that individuals have the freedom to not reveal and discuss certain
beliefs and behaviours with everyone. He states that “to the extent that social scientists
engage in covert observation, however noble in cause, this freedom will be reduced”
(Warwick, 1973, p. 35).
Hence, the control of information and what is done with it seems crucial for managing
privacy and autonomy concerns and for the ethical handling of research in SNS,
and it has to be discussed in light of the overall values of the context. As with all research,
guidelines for experiments in SNS need to balance the interests of society (e.g., is the
research worth doing; do we gain knowledge; do the benefits outweigh the risks), of
prospective participants (e.g., consent, privacy, understanding the Terms of Service), and
potentially of researchers (e.g., understanding a phenomenon, relatively low costs,
high-impact publications). Yet the ethical standards of large-scale online experiments, as well
as of other forms of ‘big data’, remain ambiguous (Shapiro & Ossorio, 2013; Buchanan &
Zimmer, 2012), and in the U.S. and Europe academic researchers and institutional review
boards (IRBs) have received no guidance from the U.S. Office for Human Research Protection
(OHRP) or other official bodies (such as the Economic and Social Research Council
in the U.K., or the German Science Foundation) on how to apply human subject regulations
to SNS research. In the near future, with the rapid development of online research, we
will need better global and multidisciplinary guidelines to ensure common practice in online
research that go beyond the question of private and public information.
However, even in the absence of general guidelines, individual researchers have to
think through the ‘relations between principles and particularities’ (McNamee, 2001, p. 314,
in Whiteman, 2012; Whiteman, 2010), which should involve examining embedded, contingent
and context-sensitive research ethics. Just as the concept of contextual integrity is useful for
rethinking the distinction between the private and the public domain, the analysis of contextual
integrity is a benchmark for understanding under which conditions online field experiments
violate privacy and how to address this in the ethics application process. Thus, one possible
solution when making decisions on ethics is to go beyond simplistic public/private and
observational/non-observational dichotomies and to include decision heuristics based on
contextual integrity (see Nissenbaum, 2010, p. 182).
Thinking about different ways of asking for consent
If online experiments using SNS data constitute human subject research and transform
SNS users into participants, we should more proactively respect their sense of privacy and
autonomy. Furthermore, consent should be understood as a process and not a one-off event:
it requires renegotiation over time and is not covered by accepting the Terms of
Service when signing up to an SNS, not least because users often fail to read the Terms of
Service and blindly accept the terms and conditions (Böhme & Köpsell, 2010). To this end, I propose
that ethical online research requires consideration of the contextual integrity of the research and
some degree of informed consent or debriefing. Thus, for the responsible handling of
research with networked data sets, we need to consider ways of implementing consent and/or
possibilities to withdraw information.
However, when thinking about the practicalities of conducting large-scale studies
with millions of users, the challenges of asking for informed consent are manifold. Probably
the most challenging aspect of dealing with ‘big data’ in social science research is that
it seems difficult to acquire informed consent from millions of users.
To understand how we can address the problem of violating the privacy and trust
of users who become human subjects, while at the same time acknowledging the practical
implications of asking for informed consent as an opt-in tool (Wiles, Heath, Crow, & Charles,
2005), we could turn to other disciplines, research traditions, or current initiatives. These
might help us think creatively about new ways of implementing consent, although none
of them is without its own problems and limitations. Thus, the following short- and long-
term solutions for implementing informed consent or debriefing when
conducting large-scale online experiments are not a complete or sufficient list, but
are presented to spark a discussion on how we can practically deal with the new challenges arising.
For example, Aral and Walker (2012; see also Liu et al., 2011) used an application
request when conducting a randomised experiment on Facebook. Participation in this study
was only possible via the use of an add-on application to the regular Facebook functionality.
Thus, when users installed the application, they had to give opt-in permission through an
authentication dialog window. In this way, users extended the consent they gave to Facebook's
Terms of Service to the use of their data via the application. On the one hand,
consent through a new application can be useful for researchers to renegotiate what users
consent to and what they choose to withhold. On the other hand, by being embedded in an
application that users have to install, and therefore being interwoven with Facebook's
commercial interests, such an approach might not be appropriate for all kinds of research.
Moreover, it will make the use of ‘deception’ or unobtrusive observation more difficult.
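As a rough illustration of such an opt-in flow, the sketch below gates data collection on a recorded consent decision. All names (ConsentRegistry, collect_post) are hypothetical stand-ins for whatever an application's authentication dialog would write to its backend; this is a sketch of the principle, not Facebook's actual API.

```python
from datetime import datetime, timezone

class ConsentRegistry:
    """Records which users opted in via the application dialog, and when,
    so that consent can later be audited or renegotiated."""

    def __init__(self):
        self._consents = {}  # user_id -> UTC timestamp of opt-in

    def record_opt_in(self, user_id):
        self._consents[user_id] = datetime.now(timezone.utc)

    def has_consented(self, user_id):
        return user_id in self._consents

def collect_post(registry, user_id, post):
    """Include a user's post in the research data set only if that user
    explicitly opted in; otherwise the post never enters the data set."""
    if not registry.has_consented(user_id):
        return None
    return {"user": user_id, "post": post}

registry = ConsentRegistry()
registry.record_opt_in("alice")  # alice accepted the dialog; bob did not
```

The timestamp matters: because consent is a process rather than a one-off event, a registry of this kind would let researchers re-approach users whose opt-in has grown stale.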
Another possibility could be based on what we know from ethnographic studies.
Ethnographic researchers routinely do not provide information to all study participants before
the data are collected (Mulhall, 2003; Punch, 1998). However, after the data have been
collected, researchers provide debriefing information about the nature of the study, its use
and its dissemination, and individuals are given the option of withdrawing from the
research and opting out. Although this practice of ‘opting out’ is not itself free from criticism
(e.g., Hayden, 2012; Speer & Stokoe, in press), it seems technically and practically more
suitable for the context of ethical research in SNS and might be a pragmatic
compromise for the use of experimental designs in SNS. For example, Facebook has
ways of communicating with all its users and could inform and debrief users upon the
completion of data gathering. Similar to the opt-in/opt-out permission that is granted for
applications, users could decide whether they want to withdraw their data from the particular
research in question. Although this might reduce the number of participants and thus the
representativeness of the data (and the number of data points), it might be one way of ensuring
that users become research participants of their own free will.
More challenging, and not immediately applicable but in the long run perhaps more
suitable for conducting ethical research, is to rethink the data management policies of SNS.
Rather than assuming that the existing Terms of Service cover the use of all data and
constitute consent to partake in any research, SNS providers that engage in social science
research may want to enhance users' involvement in data generation and use. One
example of such an endeavour, proposed by Aïmeur, Gambs, and Ho (2010), is the
Privacy-enhanced Social Network Site (PSNS), which follows the principles of privacy
awareness and customization, data minimization, and sovereignty. Such systems can include
different levels of privacy that determine how much information a user would like to leave
with the provider and with people collaborating with the provider, such as researchers. As an
example, Aïmeur et al. (2010, p. 176) explain that “in the situation where the user chooses Full Privacy,
the SNS server is only trusted in storing an encrypted version of the personal information of
the user so that it can consult at any time by one of his friends but not to the point where the
SNS itself has access to this information (as the SNS server does not know the keys needed to
decrypt this information). As the purpose of Privacy Watch is to protect the user privacy, the
privacy level is set by default to Full Privacy”. Thus, when a user chooses Full Privacy
settings, she would not expect her data to be used for research. However, as part of such a
system, privacy awareness and customization could likewise allow researchers to conduct
experiments when users have set their privacy settings accordingly, for example to No
Privacy or Soft Privacy.
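To make the idea concrete, here is a minimal sketch of how such privacy levels could gate researcher access. It assumes only the level names mentioned in the text (‘No Privacy’, ‘Soft Privacy’, ‘Full Privacy’); the data structures are invented for illustration and do not reflect the actual PSNS implementation.

```python
# Hypothetical privacy levels, as named in the text; under Full Privacy the
# server holds only ciphertext it cannot read, so nothing is shared onward.
RESEARCH_VISIBLE_LEVELS = {"No Privacy", "Soft Privacy"}

def eligible_for_research(profile):
    """A user's data may enter a research sample only if their chosen
    privacy level permits sharing with the provider's collaborators."""
    return profile["privacy_level"] in RESEARCH_VISIBLE_LEVELS

def build_research_sample(profiles):
    """Filter user profiles down to those whose settings allow
    researchers to see their data."""
    return [p["id"] for p in profiles if eligible_for_research(p)]

users = [
    {"id": "u1", "privacy_level": "No Privacy"},
    {"id": "u2", "privacy_level": "Full Privacy"},   # the PSNS default
    {"id": "u3", "privacy_level": "Soft Privacy"},
]
```

The design choice worth noting is that the default (Full Privacy) excludes a user from research; inclusion requires a deliberate change of setting, which is the opposite of the opt-out logic common on current platforms.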
Another long-term solution, currently being discussed in medical science, is based on
the concept of ‘radical honesty’. An example of how this could work is provided by a project
called Consent to Research (http://weconsent.us/sharing-my-genotype/), an organisation that
aims to collect large-scale genomic and health data. This organisation promotes the idea of
‘Portable Legal Consent’, a rigorous consent process that users and participants have to go
through (and not just click ‘I accept’) before they can upload information about themselves,
such as genetic-analysis results, lab tests, or even their full medical record. Portable Legal
Consent aims at disentangling consent from privacy. It “(…) embeds the idea that we should
disclose risks about research inside the idea that data should be something that can be remixed,
to allow unexpected discoveries to emerge from the combination of earlier studies by later
scientists. If a person completes the Portable Legal Consent process, she will have an informed
consent that travels with her from one upload of data into an environment that allows many
studies – that is portable from a research perspective, and that she controls” (Consent to
Research, n.d.).
SNS users would thus be reminded of the use of their data for research and of the fact that data
created on SNS can be mixed with other sources for new discoveries. The Consent to
Research initiative argues for a proactive relationship between data donors and those who
work with their data (i.e., SNS users and researchers). In practice, researchers create the data
set together with the users and develop a data commons (similar to other commons such as
Flickr or the various wikis). In this way, the data are not ‘owned’ by a specific person or entity
but become a public good. Moreover, all data are shared voluntarily and are free for
researchers to use in their analyses (Wilbanks, 2012). This makes the data widely available to
researchers under broad guidelines, but also requires participants to go through rigorous
consent processes and demands honesty and trust from researchers and participants
or ‘data donors’ alike. In this way, participants become knowing and autonomous decision-
makers in the process of giving personal data.
Creating such a tool for social science that embeds social media data might be one
future way of making large-scale online data available for analysis without compromising
ethical guidelines and good research practice. SNS users would give permission
for their various data sets (Twitter, Facebook, Flickr, etc. accounts) to be added to the
commons and made available for use, which would allow a wide range of social
scientists to access SNS data without companies as gatekeepers (see Footnote 4) and with
the clear permission of participants.
Yet all the solutions presented have in common that requiring consent for the wide use of
data by social scientists and others might limit the sample population to those who actively
gave permission. This will inevitably increase sampling error through non-response and self-
selection bias. However, sampling biases can be tackled with an appropriate estimation
strategy and research design (e.g., Hill & Shaw, 2013; Massey & Tourangeau, 2013).
Moreover, we should be aware that large data sets from SNS are prone to gaps and errors
and as such do not constitute random, representative samples. For example, Twitter is
used by only 10% of the US population; Facebook reaches a wider population, but its use is
structured by race, class, gender, educational level, and other factors, and it is itself not a
representative sample (Tufekci, 2013). Thus, bigger data is not necessarily better data (boyd
& Crawford, 2012; Tufekci, 2013).
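As a toy illustration of such an estimation strategy, the sketch below post-stratifies an opt-in sample so that one skewed covariate matches assumed population shares. All numbers are invented, and real designs would be considerably more involved (e.g., the propensity-score approach of Hill & Shaw, 2013).

```python
from collections import Counter

def poststratified_mean(sample_strata, outcomes, population_shares):
    """Weight each stratum by (population share / sample share), then
    take the weighted mean of the stratum-level outcomes."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    mean = 0.0
    for stratum, count in counts.items():
        sample_share = count / n
        weight = population_shares[stratum] / sample_share
        mean += weight * sample_share * outcomes[stratum]
    return mean

# Invented example: an opt-in sample over-represents degree holders
# (60%), while the assumed population share is 30%.
sample = ["degree"] * 60 + ["no_degree"] * 40
outcomes = {"degree": 0.8, "no_degree": 0.5}       # e.g. observed turnout
population = {"degree": 0.30, "no_degree": 0.70}   # assumed census shares
```

Here the naïve sample mean (0.68) overstates the population value because degree holders both opted in more often and turn out more; the reweighted estimate (0.59) corrects for that self-selection, provided the stratum shares are known.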
The purpose of this article was to highlight some of the emerging challenges of
conducting online research utilising social network sites. In the age of ‘big data’, hopes are
high for gaining new insights by observing social interactions between millions of people (Bond
et al., 2012; Kramer, 2013). However, it is our responsibility as scholars to ensure that our
research methods and processes remain rooted in long-standing ethical practices. Issues of
consent, privacy and anonymity do not disappear simply because individuals participate in
online social networks and our data sets are large; rather, they become even more pressing and
have to be discussed in their context (Zimmer, 2010).
In this article I focused on a particular ethical challenge in conducting online
experiments in social networks, and I argued that informed consent is vital for conducting
large-scale experiments in order to protect the privacy, autonomy, and control of users and,
ultimately, our participants. A lack of ethical research practice can hinder academic progress,
damage our standing as a community, and undermine our trustworthiness. We ultimately need
an earnest, innovative and creative discussion in the field on how to implement ethical
guidelines that first and foremost protect participants but also allow researchers to conduct
sound research. I propose that we start to reconsider the conceptions of risk, benefit and harm
to potential participants (e.g., SNS users) and treat participants as stakeholders in research
rather than as passive objects we observe (Trinidad et al., 2011). Researchers, IRBs, and
funders must reconsider current approaches to consent to live up to the challenges posed by
large-scale online experiments. Shapiro and Ossorio (2013) warned that the private sector is
charging ahead and creating de facto standards for data use that provide broad (I would argue
overly broad) access to personal information and behaviour. As a field we should make sure
that our work has social value that goes beyond selling products, and that we are on the front
line of setting standards for accessing and working with people's online information that are
in line with our ethical consciousness and research practice.
References
Ackland, R. (2008, June). Using Facebook as a data source and platform for e-researching
social networks. Refereed paper presented at the Fourth International Conference
on e-Social Science (pp. 18-20).
Acquisti, A., & Gross, R. (2006). Imagined communities: Awareness, information sharing,
and privacy on the Facebook. In Privacy enhancing technologies (pp. 36-58). Berlin: Springer.
Aïmeur, E., Gambs, S., & Ho, A. (2010, February). Towards a privacy-enhanced social
networking site. In Proceedings of the International Conference on Availability,
Reliability, and Security (ARES '10) (pp. 172-179). IEEE.
Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social
networks. Science, 337, 337-341.
Aral, S., & Walker, D. (2013). Tie strength, embeddedness and social influence: Evidence
from a large-scale networked experiment. (January 8, 2013). Retrieved from SSRN:
American Psychological Association. (2010). American Psychological Association ethical
principles of psychologists and code of conduct. Retrieved April 17th, 2013, from
Baumrind, D. (1964). Some thoughts on ethics of research: After reading Milgram’s
‘Behavioural study on obedience’. American Psychologist, 19, 421-423.
Böhme, R., & Köpsell, S. (2010, April). Trained to accept? A field experiment on consent
dialogs. In Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems (pp. 2403-2406). ACM.
Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., & Fowler, J.
H. (2012). A 61-million-person experiment in social influence and political
mobilization. Nature, 489, 295-298.
boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural,
technological, and scholarly phenomenon. Information, Communication & Society,
Broockman, D.E., & Green, D.P. (2013). Do online advertisements increase political
candidates’ name recognition or favourability? Evidence from randomised field
experiments. Political Behaviour, online first. Retrieved 23rd July 2013 from
Buchanan, E. & Zimmer, M. (2012). Internet research ethics. In Stanford Encyclopaedia of
Philosophy, Retrieved from http://plato.stanford.edu/entries/ethics-internet-research.
Consent to Research (n.d.). Data donation FAQ. Why is it called Portable Legal Consent?
Retrieved 16th April 2013 from Consent to Research http://weconsent.us/donate-your-
Davis, K. (2012). Ethics of big data. Sebastopol: O’Reilly Media.
Drabiak-Syed, K. (2010). Lessons from Havasupai Tribe v. Arizona State University Board
of Regents: Recognizing group, cultural, and dignitary harms as legitimate risks
warranting integration into research practice. Journal of Health & Biomedical Law,
European Society for Opinion and Market Research (2011). ESOMAR Guidelines on social
media research. Retrieved 15th July 2013 from
Facebook (nd). Data Use Policy. Retrieved 29th April 2013 from
Gallup, A. C., Hale, J. J., Sumpter, D. J., Garnier, S., Kacelnik, A., Krebs, J. R., & Couzin, I.
D. (2012). Visual attention and the acquisition of information in human crowds.
Proceedings of the National Academy of Sciences, 109, 7245-7250.
Giles, J. (2012). Computational social science: Making the links. Nature, 488, 448-450.
Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using
social norms to motivate environmental conservation in hotels. Journal of Consumer
Research, 35, 472-482.
Greenwald, A.G., Carnot, C.G., Beach, R., & Young, B. (1987). Increasing voting behaviour
by asking people if they expect to vote. Journal of Applied Psychology, 72, 315-318.
Griffiths, M., & Whitty, M.T. (2010). Online behavioural tracking in Internet gambling
research: Ethical and methodological issues. International Journal of Internet
Research Ethics, 3, 104-117.
Grodzinsky, F. S., & Tavani, H. T. (2008). Applying the ‘contextual integrity’ model of
privacy to personal blogs in the blogosphere. International Journal of Internet
Research Ethics, 3, 38-47.
Hayden, E.C. (2012). Informed consent: A broken contract. Nature, 486, 312-314.
Hill, B.M., & Shaw, A. (2013). The Wikipedia gender gap revisited: Characterizing survey
response bias with propensity score estimation. PLoS ONE, 8, e65782.
Hugl, U. (2011). Reviewing person's value of privacy of online social
networking. Internet Research, 21, 384-407.
Jones, J. J., Bond, R. M., Fariss, C. J., Settle, J. E., Kramer, A. D., Marlow, C., & Fowler, J.
H. (2013). Yahtzee: An anonymized group level matching procedure. PloS one,
Kaufman, J. (2008). I am the Principle Investigator… [Blog comment]. On the ‘Anonymity
of the Facebook dataset. Retrieved 29th April 2013
Kramer, A.D.I. (2013, January). The internet is one massive field study. In Croiser, B.S.
(Chair), Harvesting and distilling big data in the information age: Applications and
advances in social and personality psychology. Symposium conducted at the meeting
of the Society for Personality and Social Psychology Annual Conference. New
Krotoski, A. (2012). Data-driven research: open data opportunities for growing knowledge,
and ethical issues that arise. Insights, 25, 28-32.
Krotoski, A. (2013). Untangling the web. What the internet is doing to you.
London: Faber and Faber.
Lazer D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., Christakis, N.,
Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., &
van Alstyne, M. (2009). Computational social science. Science, 323, 721–723.
Lenza, M. (2004). Controversies surrounding Laud Humphreys’ tearoom trade: An
unsettling example of politics and power in methodological critiques. International
Journal of Sociology and Social Policy, 24, 20-31.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and
time: A new social network dataset using Facebook.com. Social Networks, 30, 330-
Lewis, R.A., Reiley, D.H., & Schreiner, T.A. (2012). Ad attributes and attribution: Large-
scale field experiments measure online consumer acquisitions. Working Paper,
Liu, Y., Gummadi, K. P., Krishnamurthy, B., & Mislove, A. (2011, November). Analyzing
Facebook privacy settings: User expectations vs. reality. In Proceedings of the 2011
ACM SIGCOMM conference on Internet measurement conference (pp. 61-70). ACM.
Markham, A. & Buchanan, E. (2012). Ethical decision-making and Internet research 2.0:
Recommendations from the Association of Internet Researchers Ethics working
committee. Retrieved from http://aoir.org/reports/ethics2.pdf.
Martin, K. (2012). Information technology and privacy: Conceptual muddles or privacy
vacuums? Ethics and Information Technology, 14, 267-284.
Marturano, A. (2011). The Ethics of Online Social Networks–An Introduction. International
Review of Information Ethics, 16, 3-5.
Massey, D. S., & Tourangeau, R. (2013). Introduction: New challenges to social
measurement. The Annals of the American Academy of Political and Social Science,
Mayer-Schonberger, V. & Cukier, K. (2013). Big Data: A revolution that will transform how
we live, work and think. London: John Murray.
Milgram, S., Bickman, L., & Berkowitz, L. (1969). Note on the drawing power of crowds of
different size. Journal of Personality and Social Psychology, 13, 79-82.
Mulhall, A. (2003). In the field: notes on observation in qualitative research.
Journal of Advanced Nursing, 41, 306-313.
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks.
IEEE Security and Privacy Conference. Retrieved from
Narayanan, A., & Shmatikov, V. (2009). Robust de-anonymization of large sparse datasets
(How to break anonymity of the Netflix Prize dataset). IEEE Security and Privacy
Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79, 101-
Nissenbaum, H. (2010). Privacy in Context. Technology, policy, and the integrity of social
life. Stanford (CA): Stanford University Press.
Nissenbaum, H. (2011). A contextual approach to privacy online. Daedalus, 140, 32-48.
Nuremberg Code (1947). Permissible medical experiments. In British Medical Journal,
(1996), 313, 1448.1.
Parry, M. (2011, July 10th). Harvard researchers accused of breaching students’ privacy. The
Chronicle of Higher Education. Retrieved from http://chronicle.com/article/Harvards-
Punch, M. (1998). Politics and ethics in qualitative research. In Denzin, N. & Lincoln, Y.
(Eds.), The Landscape of Qualitative Research. London: Sage.
Rooke, B. (2013). Four pillars of internet research ethics with Web 2.0. Journal of Academic
Ethics, Published online July 14th 2013. Retrieved from
Shapiro, R.B., & Ossorio, P.N. (2013). Regulation of online social network studies. Science,
Speer, S. A., & Stokoe, E. (in press). Ethics in action: Consent-gaining interactions and
implications for research practice. British Journal of Social Psychology.
Stalder, F. (2011). Autonomy beyond privacy: A rejoinder to Bennett. Surveillance & Society
Trinidad, S.B., Fullerton, S.M., Ludman, E.J., Jarvik, G.P., Larson, E.B., & Burke, W.
(2011). Research practice and participant preferences: The growing gulf. Science,
Tufekci, Z. (2013). Big Data: Pitfalls, methods and concepts for an emergent field (March 7,
2013). Available at SSRN: http://ssrn.com/abstract=2229952
U.S. Office for Human Research Protection (2010). Minutes Secretary’s Advisory Committee
on Human Research Protections July 20-21, 2010 – Arlington, VA. Retrieved from
Waskul, D., & Douglass, M. (1996). Considering the electronic participant: Some polemical
observations on the ethics of on-line research. The Information Society: An
International Journal, 12, 129-140.
Warwick, D.P. (1973). Tearoom trades: Means & ends in social research. The Hastings
Center Studies, 1, 27-38.
Watts, D. J. (2007). A twenty-first century science. Nature, 445, 489-489.
Westin, A. (1967). Privacy and freedom. New York: Atheneum.
Whiteman, N. (2010). Control and Contingency: Maintaining ethical stances in research.
International Journal of Internet Research Ethics, 3, 6-23.
Whiteman, N. (2012). Undoing Ethics. Rethinking practice in online research. London (UK):
Whitty, M.T. (2004). Cyber-flirting: An examination of men's and women's flirting behaviour
both offline and on the Internet. Behaviour Change, 21, 115-126.
Wiles, R., Heath, S., Crow, G., & Charles, V. (2005). Informed consent in social research: A
literature review. Southampton: ESRC National Centre for Research Methods.
Wilbanks, J. (2012, June). John Wilbanks: Let’s pool our medical data. [Video file].
Zimmer, M. (2008). Privacy on Planet Google: Using the theory of contextual integrity to
clarify the privacy threats of Google's Quest for the perfect search engine. Journal of
Business and Technology Law, 3, 109-126.
Zimmer, M. (2010). “But the data is already public”: on the ethics of research in Facebook.
Ethics and Information Technology, 12, 313-325.
1The social behaviour and interaction displayed on Social Network Sites are bound to
the design and algorithms underlying the platform; thus, the specific configuration of the
system shapes the nature of interaction (Nissenbaum, 2010). Therefore, what counts as ‘real’
behaviour has to be understood in this specifically designed context.
2Note the use of the word ‘participant’ rather than ‘subject’. A subject is a person who is
experimented on by others; a participant is someone who takes part or shares in the process with others.
3One problem in likening social media marketing practice to social science research is that
members of the respective communities (social media marketing vs. social science) are
accountable to different codes of conduct. Social media marketing practitioners should adhere
to ethical guidelines set by ESOMAR. Generally, though, market and public relations
practices are not subject to ethics boards or IRBs in the way social science research is.
4Big data, especially those from SNS, are often not publicly available, and access to
these data for research purposes is limited either to people who work for or with the
social media companies that own the data or to those who pay for the data. As well as creating
inequality between people with and without access (boyd & Crawford, 2012), the limitations
on access make it hard to replicate or verify research, which in itself might be an ethical problem.
5Even though the statistical effect of the manipulation in Bond's study was small,
the intervention might have had the potential to change the outcome of the Congressional
elections in 2010. One could argue that in a democracy increasing voter turn-out is a positive
and desirable research outcome; thus, how we increase voter turn-out and change voting
behaviour is an important research area in political science. Without any doubt, the
study collected data and influenced behaviour with good intentions, and the results are
interesting for political scientists and strategists alike (see http://www.civisanalytics.com).
However, these techniques could also be used to influence political protest or anti-democratic
behaviour, or be applied in countries with weak democratic traditions, which could be more
problematic. This demonstrates that what we define as the ‘right’ causes, and whose ‘right’
causes we investigate and influence, is another ethical challenge that we face when we unleash
the power of ‘big data’.