Platform Enclosure of Human Behavior and its Measurement: Using Behavioral Trace Data against Platform Episteme

Prepublication Copy
Wu, A. X., & Taneja, H. (Forthcoming). Platform Enclosure of Human Behavior and its Measurement:
Using Behavioral Trace Data against Platform Episteme.
New Media & Society.
Angela Xiao Wu
Media, Culture, and Communication
New York University
Harsh Taneja
University of Illinois at Urbana-Champaign
Digital trace data from giant platforms are gaining ground in academic inquiries into human
behavior. This trend accompanies contestations regarding representativeness, privacy, access, and
commercial origin. Complementing existing discussions and focusing on knowledge production,
we draw attention to the different measurement regimes within passively captured behavioral logs
from industries. Taking an institutional perspective on measurement as a management technology,
we compare platforms with third party audience measurement firms. Whereas the latter measure to
provide “currency” for a multi-sided advertising market, the former measure internally for their
own administrative purposes (i.e., prescribing behavior through design). We demonstrate the
platform giants’ two-fold enclosure of first the user ecology and subsequently the previously open
market for user attention. With platform trace data serving as a lifeline for scholarly research,
platform episteme extends itself to enclose knowledge production. We conclude by suggesting
ways in which academic quantitative social sciences may resist these platform enclosures.
Keywords: Audience Measurement, Platform Episteme, Computational Social Science, Digital Trace Data, Knowledge Production
The authors acknowledge the excellent feedback received at Data & Society’s “Contested Data”
Academic Workshop. We thank danah boyd and Dan Bouk for organizing this amazing event.
Thanks also goes to Jessa Lingel and Josh Greenberg for their incisive comments.
Data offered by digital platforms including Facebook, Twitter, Google, Weibo, and
WeChat have created an entire ecosystem of social research. These fine-grained logs of user
behavior (sometimes called “trace data,” a term we use interchangeably) promise to measure
complex behaviors accurately, addressing a long-acknowledged shortcoming of self-reported
survey data (Lazer et al., 2009; Mayer-Schönberger and Cukier, 2013). Even fields such as public
opinion research, considered bastions of the survey method, are beginning to notice the value in
platform logs (Public Opinion Quarterly, 2019).
The blossoming of “Big Data” social science research has spurred much skepticism (e.g.,
boyd and Crawford, 2012). Salient in the discussion are issues of data representativeness and
sampling biases, which are often linked to existing socioeconomic inequalities in the population
(e.g., Hargittai, 2020). Science and technology studies has also highlighted stages of subjective
decision-making typically hidden in the production of data by digital technology (Busch, 2016).
More recent debates have revolved around access. As the platforms become more reluctant to
provide data (Bruns, 2019) and societal concerns about privacy simmer, especially in the wake of
the Cambridge Analytica scandal, one question animates most discussion: What should social
science academics do to continue their work (King, 2011; Puschmann, 2019)? A newly proposed
“industry-academic” partnership to regulate and negotiate data and research sharing seems to
falter due to the platforms’ continued inaction (Social Science One, 2019). Meanwhile, data
brokers such as CrowdTangle (acquired by Facebook) have streamlined access to select trace
data for all interested parties.
We bring a different aspectregimes of measurementinto the ongoing conversation.
While existing discussions consider the research process from analyzing and weighting platform
data to its contested access, we take a step further back to consider how these data come about and to
what end, in order to rethink best practices for employing platform-generated data to infer
human behavior. The various pitfalls of repurposing platform data for academic social sciences,
we argue, result from the misalignment between platform datafication and the mainstream
epistemological norms of quantitative disciplines. Notably, our intervention takes a different
direction than the scholarship that foregrounds the expense of quantification in terms of
decontextualization, simplification, and the privileging of measurable phenomena. While fully
acknowledging insights and critiques along these lines, our goal here is to rethink the field of
quantitative social science through different regimes of measurement behind digital logs.
To illustrate this, we bring in traditional audience measurement firms such as Nielsen and
comScore, which also generate digital logs. Once supplying data that formed the basis of
advertising trade, these market research companies are now characterized as the less efficient
predecessors of social media platforms (McGuigan, 2019). Computational social scientists tend to
treat these two measurement regimes similarly since they both provide passively logged
behavioral traces for user analytics (Zhu et al., 2019). Even ethnographic researchers of journalism
refer to both as “audience analytics” without differentiation, even though media organizations
strategize according to the exact measurement regime in use. This equation, only superficially
reasonable, glosses over essential differences between audience measurement in traditional media
and user analytics provided by digital platforms.
Importantly, what sets predominant giant platforms apart from traditional third party firms
is not the commercialization of measurement, because both scenes are characterized by privately
owned monopolies (Napoli and Napoli, 2019). The essential difference is that, unlike long-existing
third party audience measurement firms that measure to provide “currency” for a multi-sided open
market (i.e., advertising), platform companies measure for their own administrative purposes (i.e.,
prescribing sociality with technical design). This shift entails a two-fold corporate enclosure of,
first, the user ecology itself, and second, of the open market for user attention.
In differentiating the two regimes, our mission is not to declaim one type of data as more
“objective” than the other. As van Dijck (2014) points out, to stay vigilant of the ideology of
dataism, one should refrain from invoking its usual tropes such as “precision” and “objectivity.”
Instead, we draw on the history of technology and more recently, sociology of quantification, to
investigate measurement as an institutionally constituted technology for managing events
(Espeland and Stevens, 2008; Porter, 1995a). If datafication, as per common understanding, is the
transformation of social action into quantified data (Mayer-Schönberger and Cukier, 2013), what
we examine here, to be clear, is the institutional processes that give rise and shape to datafication.
Only by taking an institutional approach to the longer history of audience measurement
can we recognize that platform datafication represents a rupture, for the institutional dynamics
of which it is a part are qualitatively different. When employing platform behavioral data,
thoughtful social science projects should confront their measurement conditions, and if possible,
go “against the grain” in research design to foreground platform power. To illustrate such research
orientations, we draw on insights from standpoint epistemology. Finally, we hope to demonstrate
that beyond what seems an outsized focus on platform data, social scientists may alternatively turn
to behavioral trace data from third party measurement to foreground structural influences and
social collectives in theorization.
Enclosure of Behavior: Datafication for Administering the User Ecosystem
To cast platform datafication in institutional light, we first regard platform behavioral data
as essentially administrative data that platforms generate to serve their own organizational goals.
Platform analytics, we argue, is ultimately a technology that draws on users’ behavioral traces
accumulated through multilayered platform management to evaluate and enhance the “product.”
Big platforms’ business strategies and technological mediations of sociality are well-
researched in qualitative research and critical data studies (as early as Gillespie, 2010). Platforms
rely on in-house user analytics to constantly alter platform architectures (e.g., search query
autocompletion, result rankings, trending algorithms, personalized recommendations, social feed
curation, just to name a few) that are meant to change what users tend to search or click, what
content they are exposed to and able to choose from, and their habitual engagement with the
platform (e.g., Ananny, 2019; Andrejevic, 2013; Eslami et al., 2016; Grind et al., 2019). Further,
platforms conduct experimental trials including A/B testing to inform their design choice
(Hindman, 2018); they aggressively deploy select data as visible metrics (Likes, Views, Retweets,
also the main type of platform behavioral data for social scientists) to induce user behavior (Gehl,
2014; Grosser, 2014; Salganik et al., 2006). These social media metrics, at the same time, are inflated
by entrepreneurial users of all kinds who strive to “game the system” (Karpf, 2012; Petre et al.,
2019). In fact, the online expansion of extremist communities, political and non-political alike, is
partially attributable to the success of these attempts by the previously marginalized niche pockets
(Ananny, 2019; Gerrard, 2018).
Crucially, the continued increase in profit drives platforms to tinker with curation to
prioritize content and products whose sales bring in more revenue, and to change recommendation
systems to value “time-on-platform” over user feedback, culminating in what Nick Seaver (2019)
calls “captivation metrics” (also see Karppi, 2018, for Facebook’s wide-ranging efforts to obstruct
user disconnection). Even in the case of real users “gaming” system metrics, research shows that
platforms only intervene when the specific methods harm their own economic benefits (Petre et al.,
2019). This also explains their demonstrated reluctance to eliminate fake accounts (for they boost
their reportable user-base) (Confessore et al., 2018). From email spam to the labyrinth of social
media discourse, “humans are not producing as much of the communicative traffic as we may
think” (Brunton, 2019, p. 16). This entire literature can be productively reframed as evidence of
platform data functioning as administrative data, which typically document the
implementation of services (Penner and Dodge, 2019). In other words, the institutional condition
for platform datafication is one in which tech giants generate and employ behavioral data to
administer platform usage to maximize corporate interests.
Indeed, “that social media platforms concomitantly measure, manipulate, and monetize
online behavior” is a “paradoxical premise” (van Dijck, 2014, p. 200). Quantitative social scientists
need to confront the fact that platform data are not behavior-as-it-is, not self-evident capture of the
spontaneous unfolding of human conduct in the face of the world’s social, political, and cultural
happenings. Instead, the behaviors that platform data really reflect are part of an iterative process
whereby platform governance and its user ecosystem co-evolve. In short, platform business is
about stealthily nudging behavior, or more precisely, the records of purported user behavior, in
self-serving ways; platform data, in turn, are the administrative records (and inputs) of the
feedback loop that makes up the platform’s design process aimed at prescribing user behavior.
By regarding platform behavioral data as administrative data, we may foreground the
under-recognized scenario where academic labor has inadvertently gone into administrative
research for platforms. In a recent article, Penner and Dodge (2019) highlight the benefits of using
administrative data for social science, with a focus on data from government agencies. These
include overcoming methodological individualism by understanding individuals in their social
contexts—that is, contexts where policies and other structural changes take place. Further, it “has
the benefit of focusing researchers’ attention on the measures salient to practitioners and policy-
makers” (p. 9), who are “positioned to make decisions about practice and policy based on
researchers’ findings.” If we replace the “social context” with platform architectures and
policymakers with platform companies, it is easy to see the tendency for social science to get
assimilated into the epistemic plane of platform datafication, and for academic research based on
platform data to be readily feedable into the “iterative process of policy (read: design)
implementation” (p. 7).
After the dot-com bust, beginning with Google, followed by Facebook, the platforms
themselves started repurposing some of their own behavioral records for ad placement, a major
development in their administrative agenda (Srnicek, 2016; Zuboff, 2019). Their shift toward an
advertising-based business model effectively translates into what we term “platform enclosure of
measurement.” In the second institutional perspective, we examine behavioral trace data in the
context of the advertising market where attention is traded.
Enclosure of Measurement: Datafication for Structuring the Advertising Market
Since the advent of advertising supported business models, commercial media systems
have tried to measure user attention. Beginning with survey research and telephone coincidentals
(Beville, 1988), the measurement of user attention, a.k.a. “audience measurement,” has since the
1940s incorporated varying degrees of “passive” (i.e., unobtrusive) measurement through electronic
meters. Audience measurement quantifies audience size and profile, of which Nielsen television
ratings are a popular traditional example. A similar example in global online audience
measurement is comScore, which provides cross-platform traffic estimates across websites and
mobile apps. Pervasive across media and markets, these measures form the basis on which
advertising real estate has been valued (Balnaves et al., 2011).
As quintessential examples of what sociologists Anand and Peterson (2000) characterized
as “market information regimes,” audience measurement systems generate reports at a predictable
frequency in a “consistent” format, following a methodology agreed upon by all stakeholders
involved. Above all, these reports are produced by a “neutral” entity—that is, an entity not party to
the transactions that such information enables. For instance, Nielsen is neither an advertiser
(buyer) nor a media owner (seller), but a third party. Owing to third party ownership, audience
measurement data are publicly available for a fee, and every subscriber gets the same information.
An advertiser wanting to advertise on NBC and ABC can use Nielsen ratings to compare the
audiences of both these networks. They can access the full schedule of who advertised where and
when. Thus they can compare their advertising with competitors’ and also correlate it with user
attention for each spot run. With these features, the measurement regime typified by Nielsen
provides market participants a common currency around which the market comes into its own as
an institutional field (Furchtgott-Roth et al., 2007; Taneja, 2013).
In the digital advertising market, much of third party measurement has given way to big
platforms that use trace data-based analytics to monetize online user attention via advertising. In
order to advertise on Facebook and Google, the same advertiser has to use both these companies’
own analytics’ dashboards, which provide only platform-specific data. Further, platform analytics
provide metrics only about the advertiser’s own campaign performance, but no general metrics
that reflect aggregate user attention on the platform. For example, even to paid clients, Twitter
gives at best vague descriptions of its trending tweets. The same goes for YouTube and its
trending videos. In sum, platform metrics, unlike their third party counterparts, do not qualify as a
common currency. They are instead produced in-house in a highly opaque, individualized manner,
and are inevitably platform-specific.
The most critical departure is that the third party companies have no administrative
investment in the metrics they produce. Their metrics serve as administrative data only for
their subscribers but not themselves. Providers such as Nielsen and comScore do not stand to gain
if people watch more television or visit particular websites. Meanwhile, the subscribers’
competing interests ensure that third party measurement stays clear of systematic manipulations.
Any attempt by comScore to elevate the New York Times’ website traffic will be checked by
advertisers who also subscribe to the same data.
Figure 1: Models for monetizing attention for advertising
By contrast, Google and Facebook, which now control the majority of the global online
advertising real estate, have enclosed major portions of the previously open and hence competitive
market mechanisms, including logging, analytics, and ad placement (see Figure 1), which gives
them enormous power over other actors (Balnaves et al., 2011; Wu and Taneja, 2019: 23). While
the data science and machine learning arms wielded by tech platforms for internal management
have blossomed, the traditional market analytics sector, which media outlets and advertisers
commission to examine third party data, continues to decline (Tom O'Regan, personal
communication). Significantly, since the platform’s advertising division is part of its larger
business, the user analytics that function as conduits for ad placement are part and parcel of the
platform administrative data. In other words, platforms have both the capacity and incentive to
directly profit from tinkering with measurement procedures and metric formulas (and shaping user
behavior along the way).
Advertisers and publishers have routinely expressed concerns about their dependence on
these platforms, questioning their returns on investments (Joseph, 2018). Yet at the same time they
realize that these platforms cannot be audited for advertising effectiveness because unlike
television, online media “lacks the universal metrics that allow for valid like-for-like cost
comparisons” (Joseph, 2020). The same opaque circumstances enable aggressive micro-targeting in
economically predatory ads and political ads (which may effectuate disinformation campaigns
such as those by Cambridge Analytica).
Quantitative Social Science and the Datafication of Human Behavior
In the early 2010s, pioneering critiques accentuated academia’s growing attraction to the
“Big Data mentality” cultivated within tech companies and the two sectors’ increasing
interconnections in talents and techniques (boyd and Crawford, 2012; van Dijck, 2014). Building on
these critiques, we argue that these trends are integral to a platform enclosure of knowledge
production about human behavior. Beyond the platforms’ co-optation of academic expertise and
labor, the consequences of this form of enclosure also result from academia’s ready reliance on
platform log data. To better illustrate the stakes, we contrast this platform datafication first with
the norms of quantitative social science, and then with third-party audience measurement.
Though with varying instantiations, positivism generally pushes for research as a
“depersonalizing gaze that separates subject from object” (Comaroff and Comaroff, 1992: 8), which
basically advocates for an “outsider” position relative to the phenomena being studied (i.e., being
“objective”). Ascending after World War II with a zenith in the 1960s, the human sciences
(sociology, political science, economics, psychology, and to a lesser extent anthropology and
history) gravitated towards positivism in a rather coherent manner, largely due to the funding
structures and other institutional politics (Steinmetz, 2005). Most quantitative fields today adhere to
this epistemological orientation. By invoking this background, we are not taking the positivist
epistemology as ideal (or natural). Nor are we limiting ourselves to acknowledging its socially
constructed nature. In fact, as historians of science have richly documented, both the meanings and
the operationalizations of “objectivity” are historically contingent (e.g., Daston and Galison, 2007).
Instead, the discussion that follows rests on the fact that positivist philosophy has
functioned as the prevailing rule of inquiry according to which quantitative social science has been
envisioned, organized, and incrementally developed. As Porter writes, “objectivity, in its various
meanings, is characterized rather by what it omits”—that is, the term is invariably about
renouncing certain solid features of subjectivity (Porter, 1995a: 85). Whether it comes to
measurement or research design, this amounts to a studious insistence on the “ethic of personal
renunciation on the part of those who construct knowledge and make decisions” (ibid). Studying
platform user behavior while maintaining the positivist commitments requires the researcher to
position herself as an outsider to this behavior, and to achieve this entails taking hold not only of
platform log data, but also systematic data about platform architectures and internal
administration. The absence of the latter, usually the case for academics, effectively undercuts
the epistemic integrity of quantitative social science.
At the risk of making a far-fetched analogy, it may be provocative to characterize
academic social sciences’ dependence on platform data as prone to inhabiting the platform’s
“standpoint.” Standpoint thinking is an alternative to positivism but equally committed to
empiricism. It asks one to acknowledge the social location whence one sees and investigates, while
actively accounting for this location (“standpoint”) to produce knowledge (Harding, 1993, 2005).
We may consider platform datafication as following standpoint thinking, wherein the platform
observes and takes notes, and constantly makes sense/use of these records through its
understanding of its own entrenchment and priority (a kind of “platform self-reflexivity”). But the
problem is, when the academic researcher steps in via repurposing platform log data, she risks
adopting the view that emanates from the platform’s standpoint (notably, a view stemming from a
dominant location that exerts power over social lives). Furthermore, while the researcher sees (a
slice of) what the platform sees, she is simultaneously kept from an adequate sense about the
platform’s inclinations and actions, which amounts to a severely constrained “standpoint
episteme.” This analogy hopefully illustrates the deep epistemic chasm between platform
behavioral data and mainstream quantitative social science.
Ignoring this chasm has concrete consequences. Readily applying pattern recognition
techniques to platform trace data, for example, tends to create woefully underrecognized analytical
oversights that effectively veil institutional and infrastructural influences, the platform’s included.
Trace data’s large volumes appear amenable to statistical techniques that enable pattern
recognition, which computational social scientists have recently labelled as “unsupervised
machine learning.” What is often forgotten is that meaningful interpretation of these patterns,
according to the logic of social science, is dependent on the researcher’s prior knowledge about the
structural conditions underlying the generation of these data. This criterion, however, usually
cannot hold for social scientists to whom the ways in which algorithms and other architectural
features govern platform user ecology remain opaque.
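This analytical trap can be made concrete with a stylized simulation (entirely hypothetical: the feed mechanism, probabilities, and sample sizes below are our own assumptions, not data from any real platform). Users’ latent preferences are drawn identically across groups, yet because logged clicks require algorithmic exposure, a simple k-means clustering of the traces recovers the platform’s feed assignment rather than any preference structure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 200, 6

# Hypothetical platform: each user is assigned one of two curated feeds;
# feed A surfaces only items 0-2, feed B surfaces only items 3-5.
feed = rng.integers(0, 2, size=n_users)
exposure = np.zeros((n_users, n_items))
exposure[feed == 0, :3] = 1.0
exposure[feed == 1, 3:] = 1.0

# Latent preferences are drawn identically for everyone (no group taste) ...
preference = rng.uniform(0.6, 0.9, size=(n_users, n_items))
# ... but a click can only be logged where the feed granted exposure.
X = (rng.uniform(size=(n_users, n_items)) < exposure * preference).astype(float)

# Plain 2-means clustering on the click traces, initialized with the two
# most distant rows to avoid a degenerate start.
d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
i, j = np.unravel_index(np.argmax(d), d.shape)
centroids = X[[i, j]]
for _ in range(20):
    labels = np.array([np.argmin(((x - centroids) ** 2).sum(axis=1)) for x in X])
    centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

# The "discovered" clusters reproduce the feed assignment, not preferences.
agreement = max((labels == feed).mean(), (labels != feed).mean())
print(f"cluster/feed agreement: {agreement:.2f}")
```

The recovered clusters align almost perfectly with the platform’s curation decision even though the preference parameters were identical for everyone; an analyst without knowledge of the feed mechanism would be prone to misread this structural artifact as a behavioral pattern.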
Consider research on political polarization for which use of social media data is rampant.
When users’ tweeting patterns, their followers, whom they follow, and the content of their tweets are
analyzed to assess “selective exposure,” a construct rooted in a presumption about individual
agency, the study usually fails to take into account who is exposed to what content in the first
place, since Twitter closely guards the algorithm for its timeline. Similar problems arise in using
Twitter data to study the insurgent public sphere during Occupy Wall Street when due to unknown
algorithmic workings, the very term failed to Trend (see Gillespie, 2016); or using Uber’s rides data
to study commuting patterns when Uber wields its driving force with strategies such as price
surging under the name of (predicted but unverifiable) high demand (see Rosenblat and Stark,
2016); or using YouTube, or more fantastically Netflix data, to discern media preferences, when
these platforms’ entire business rests on nudging sequences of viewing (Seaver, 2019). The ready
application of pattern recognition techniques, therefore, functions to obscure platform power.
(Throughout this piece, we deliberately keep these examples generic enough to invite readers to reflect on the
broader literature in this area, and hence refrain from citing specific studies.)
To elucidate our present, we invoke a historical episode in academic repurposing of
industrial trace data. In the 1960s, when large volumes of data from third party audience
measurement companies became available, along with growing access to computers, scholarly
interests turned towards employing statistical techniques to uncover patterns of audience behavior.
Factor analysis was one major such technique, which simultaneously examines the correlations of
several variables together to identify highly correlated sets of variables (factors). Early research
was quick to interpret these factors as reflecting audiences’ content preferences. However, a
thorough investigation by Andrew Ehrenberg (1968), an academic statistician versed in the
collection and reporting of Nielsen data, demonstrated that these patterns reflected structural
conditions such as program schedules and people’s socially situated availability to watch rather
than their content preferences. This episode shows that contextual knowledge about measurement
and its institutional conditions, as well as other (infra)structural influences was indispensable to
detecting and interpreting viewing patterns, and to eventually theorizing audience behavior.
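A toy re-enactment of Ehrenberg’s diagnosis (with entirely invented programs, schedules, and probabilities) can show how schedule-driven availability, rather than content taste, surfaces as the leading “factor” in viewing data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_viewers = 500

# Six hypothetical programs: indices 0-2 air in the evening, 3-5 in daytime;
# genres are deliberately mixed across dayparts.
daypart = np.array([0, 0, 0, 1, 1, 1])

# Each viewer is mostly available in one daypart (work schedules, etc.).
available_evening = rng.uniform(size=n_viewers) < 0.5
avail = np.where(available_evening[:, None], daypart == 0, daypart == 1)

# Genre tastes are generated independently of daypart ...
taste = rng.uniform(size=(n_viewers, 6))
# ... but viewing requires availability, so the schedule dominates the data.
viewing = avail * (taste > 0.3)

# The leading factor of the correlation matrix splits cleanly by daypart:
# the "factor" is the schedule, not a genre preference.
corr = np.corrcoef(viewing.astype(float), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
loading = eigvecs[:, -1]                  # leading factor loadings
same_sign_as_daypart = (np.sign(loading[:3]) == np.sign(loading[0])).all() \
    and (np.sign(loading[3:]) == -np.sign(loading[0])).all()
print("leading factor separates dayparts:", same_sign_as_daypart)
```

Here the leading eigenvector separates programs by air time even though tastes were sampled independently of daypart, mirroring Ehrenberg’s finding that such factors reflect program schedules and socially situated availability rather than content preferences.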
Both the techniques and hardware of computing have evolved tremendously since the
1960s. Compared to researchers who seized upon factor analysis to crack large behavioral datasets
from third party measurement, today’s computational social scientists know even less about the
methods or contexts underlying the generation of platform log data. No amount of statistical
sophistication or computing power can make up for the lack of this knowledge. Moreover, since
everyone lacks knowledge about what slices, or forms, of human behavior these data represent,
even critiques of the appropriateness of Big Data analysis have remained largely conceptual and speculative.
Viewed in the same light, third party datafication comports with quantitative social science
that asserts an “outside” view to determine regularity in data for predictive generalization. First, it
represents a prime example wherein the drive for quantitative rigor has grown from attempts to
develop a strategy of impersonality to cope with multi-sided pressures (see Porter, 1995a). To
appease contrarian interests on the open advertising market, third party measurement constantly
displays efforts to provide “precise” and “objective” metrics. All subscribers have access to its
methodology documents. A.C. Nielsen, the founder of the Nielsen Company, authored multiple
peer-reviewed papers in top academic journals explaining the methodology of Nielsen ratings
(Nielsen, 1945). Academics working with these data are authoritative enough to regularly testify
as experts in related court hearings. Data providers have developed a tradition of working to
improve data quality (RTI International, 2016; Milavsky, 1992). Changes in methodology are
evaluated and contested by various affected parties, together with independent non-profits and
academics, all detailed in public fora comprising trade press and academic journals (Barnes and
Thomson, 1994; Napoli, 2005; Andrews and Napoli, 2010). For example, advertisers over the
decades have forced Nielsen to move away from paper diaries to meters for measuring television
audiences in local markets.
Second, while platform logs fail the high standards of representativeness set by survey
research, not all log data suffer from this problem. Although not strict probability samples, panels
constructed by third party companies such as Nielsen and comScore can be evaluated for bias in
representation. These companies provide full details of the baseline or establishment surveys they
conduct to enumerate the population before constructing panels of individuals who are put under
continuous measurement. The relative transparency and accountability of methodology enables
academics repurposing such data to evaluate and address likely biases, including measurement and
non-response errors. By contrast, as platforms completely internalize the datafication process, for
academics who obtain their data by either scraping or using APIs, these errors are nearly
impossible to assess. For the same reason, in contrast to its platform counterpart, third party trace
data allow meaningful application of pattern recognition techniques.
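As a minimal sketch of what this transparency affords (all figures are invented for illustration, not taken from any actual Nielsen or comScore documentation), a researcher can compare panel composition against the published baseline survey and re-weight estimates accordingly:

```python
# Population shares by age group from a hypothetical establishment survey
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Composition of a hypothetical panel: younger users over-represented
panel_counts = {"18-34": 450, "35-54": 330, "55+": 220}
n_panel = sum(panel_counts.values())

# Post-stratification weight = population share / panel share
weights = {g: population_share[g] / (panel_counts[g] / n_panel)
           for g in population_share}

# Weighted estimate of, say, daily minutes on a news site per group
minutes = {"18-34": 12.0, "35-54": 18.0, "55+": 25.0}
naive = sum(minutes[g] * panel_counts[g] for g in minutes) / n_panel
weighted = sum(minutes[g] * population_share[g] for g in minutes)
print(f"naive: {naive:.1f} min, weighted: {weighted:.1f} min")
```

Because panel shares and population benchmarks are both published, the bias of the naive estimate (here about 16.8 versus 18.7 weighted minutes, since lighter-viewing younger users are over-sampled) is measurable and correctable; with platform data, the analogous population and selection mechanism remain unknown.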
Repurposing Behavioral Trace Data against Platform Enclosure of Knowledge
Some social scientists are able to collect their own behavioral log data (Scharkow et al.,
2020). Such data collection, which typically involves installing trackers in personal digital devices,
entails huge economic costs, technical expertise, logistical and organizational prowess, as well as
ethical conundrums. As a result, “repurposing” industrial log data became a mainstream practice.
We echo Salganik (2019) in arguing that human behavior can be meaningfully studied with log
data, provided that the researcher develops a proper knowledge of what aspects of behavior these
data represent and what they exclude, and, with this knowledge, refrains from extrapolating from the
data to research questions beyond their bounds. We add to these considerations by proposing
directions of research in light of the epistemological and political concerns arising from this act of
repurposing.
Repurposing Platform Trace Data to Reveal Platform Power
When artists and scholars first began to rely on metrics of social media, Nancy Baym (2013)
called for attention to the potential misalignment between values they hold dear and the “economic
values” intrinsic to social media metrics. Our analysis about institutional conditions for
datafication extends Baym’s early warning. As we have suggested, a platform operates with its
own episteme by maintaining specific procedures of capture and valuation to administer behavior
through design and on-platform incentivization. It is from this “standpoint” that arose Facebook’s
in-house experiments to induce “emotional contagion” (Kramer et al., 2014), and to an extent, user
analytics and manipulation tactics wielded by companies such as Cambridge Analytica. As
manifestations of this episteme, platform data infrastructure constantly extends itself by co-opting
intellectual endeavor and potentially reconfiguring the latter’s epistemological assumptions.
This stealth expansion of platform episteme diverges further and further away from what is
considered the ideal practice of repurposing platform data, where academic social scientists,
journalists, and activists come to advance their own agendas with radically different takes on what
the data mean or what they can do, vis-a-vis corporate data scientists (also see Acker and Donovan,
2019, who use "trading zone" to describe such data practices; Galison, 1997). Escaping the gravity
of platform episteme requires careful planning and real effort.
What might be fruitful ways of repurposing platform behavioral data that resist platform
enclosure? Doing so requires foregrounding the presence of platform architectures in research
design. Platforms are increasingly like highways, power grids, and undersea cables. Society
organizes its activities around these infrastructures but typically ignores their existence (Plantin et
al., 2018). Through obscuring their multifarious influences over human behavior, platforms acquire
both economic gains and sociocultural legitimacy. The term "seamless design" refers precisely to
this tactic of hiding technological mediations (Eslami et al., 2016). In-platform advertising, for
example, depends on visually blending ads with “organic posts.” Against this backdrop, platform
behavioral trace data offer researchers opportunities to push back against the platform's tendency to
conceal itself and to bring its administrative role to the fore. This requires the researcher to laboriously
steer away from the platform standpoint to those from determinate locations (e.g., users, especially
vulnerable social groups).
Examples of such studies include: algorithmic auditing, which reverse engineers platforms to
spot discriminatory and other unsavory practices hidden in platform curation (Sandvig et al., 2014);
critical interrogation of platform-proffered datasets that effectively downplay the influence of
platform products, as in recent disinformation campaigns (Acker and Donovan, 2019); and the
detection of bots as well as human users whose behaviors are systematically coordinated to exploit
platform features in the service of larger political and economic forces, as in the case of state-
sponsored publicity stunts in disguise (King et al., 2017). The focus on platform interference, to be
clear, is meaningful precisely as an intervention in public knowledge that raises awareness of growing
platform power. By this standard, such a focus is perhaps less urgent for
social scientists working with behavioral data from non-profit platforms such as Wikipedia.
Another way to resist platform enclosure of knowledge production is to involve “outside”
data sources. Platform episteme cannot easily assimilate academic queries into how the platform
user ecology relates to broader social structures, offline world happenings, or even other
platforms. Taking platform user activities as what they are and studying their relations with non-
platform factors (e.g., Bail et al., 2019) may “put the platform in perspective.” For instance,
analyzing connections between political candidates’ Facebook fan pages, rolling poll data, and the
various candidates’ background characteristics reveals the circumstances under which social
media campaigns affect electoral momentum (Tang and Lee, 2018). As discussed in the previous
section, a potential pitfall here is to readily establish platform metrics as proxies for “generic”
human behaviors, and in so doing again render the platform a neutral vehicle. For example, can the
Twitterverse serve as a behavioral proxy for online conversations when Twitter gains from
instigating more of them? Likewise, user interactions with the Weibo accounts of government
agencies should not be taken as a measure of government responsiveness.
Repurposing Third Party Trace Data to Study Structures and Collectives
A different case can be made regarding behavioral log data from third party audience
measurement, which has remained marginalized for multiple reasons. Academics from critical
social sciences and humanities have dismissed it due to its corporate origin, a criterion which, as
shown by our analysis, veils the fundamental differences in institutional dynamics that shape
datafication. Such data are also uncommon in current "Big Data" analyses because, among
other things, working with them requires organizational, technical, and historical knowledge
distinct from that involved in gathering data from digital platforms. Just as importantly, as we
will soon discuss, analyzing third party log data entails a level of analysis higher than the
individual, which typically falls outside the comfort zone of mainstream established social
sciences, including sociology, psychology, and communication studies.
Taking a different tack, we argue that data of this nature are particularly attuned to
identifying and investigating latent structural influences on social collectives, which has
extraordinary bearing in the current digital climate. It is worth cautioning upfront, however, that
just as with platform data, academic repurposing of third party data needs to move beyond the logic
and intention with which the data were generated. For example, in fields such as journalism
studies, merely reporting these data to compare the popularity of news outlets remains a
descriptive exercise with little theoretical purchase.
Unlike platform logs, which are restricted to behavior on a specific platform, third party
data, with people (i.e., user panels) at their heart, have the primary strength of being platform neutral.
ComScore, for example, estimates traffic to all sites that meet its minimum sample threshold for
weighting and projecting to universe estimates. Nielsen has always provided estimates for all
measurable TV channels and radio stations. A second strength is that, as trace data with open,
comprehensive information on sampling and measurement, third party audience data can be used
to evaluate the efficacy of traditional methods that rely on a priori prompts and self-reports.
Consider media exposure, a construct common to several social and behavioral sciences. As early
as the 1980s, scholars became aware that getting at usage through surveys could be problematic, yet
surveys remain a ubiquitous method for media exposure. This was only challenged when passively
obtained behavioral data from third party measurement were employed to gauge the error in self-
reported media usage. Political scientist Markus Prior (2009), for instance, analyzed Neilsen ratings
for news programs to estimate the extent by which news viewership obtained through the
Annenberg National Election Survey was inflated. Relatedly, economists and communication
researchers have used such data to establish that internet use measured by surveys overestimated
the extent of political polarization in the US (Gentzkow and Shapiro, 2011; Webster and Ksiazek,
In fact, in the pre-digital era, third party audience metrics had long been used for theory
building in the social sciences. In one of the earliest instances, multidisciplinary teams of
communication, marketing, and statistics scholars repurposed television viewership data
from Nielsen and its British counterpart AGB to theorize audience behavior (Barwise and
Ehrenberg, 1988). Spanning four decades, this research tradition demonstrates that structural
factors, such as patterns of people's availability and the scheduling strategies of networks, explained
audience behavior much better than the content preferences that people volunteered in survey
responses (Webster and Phalen, 1997). Sociologists have also used music sales data from Billboard
and SoundScan to document changing musical tastes among Americans alongside business
strategies of major record labels (Anand and Peterson, 2000). In short, while the survey method
tends to reduce media use to a product of individual needs and preferences alone, third party
log data allow social scientists to model the role of structural influences; this methodological shift
has ushered in a shift in theoretical discussion (Webster, 2014). Such an orientation, we argue, is
much needed today to better comprehend the shaping forces of multifarious architectures of
platform technology and the internet more broadly.
One common critique of using third party measurement for research is about the level of
analysis. Data from comScore and Nielsen are rarely available at the respondent level (Taneja,
2016). Instead, they come as aggregations of traffic to media outlets, which can be segmented
by user demographics and often by attitudes. This aggregate level of analysis is prone to
ecological fallacy, which results from theorizing about (individual) human behavior when the
data are available only at the level of collectives.
Platform logs, by comparison, offer more granular individual-level behavioral
traces. But this apparent strength has a potential downside against the backdrop of platform
enclosures of user behavior and measurement. Platform administration and, by extension, the digital
manipulation techniques of companies like Cambridge Analytica, all revolve around individual-
level targeting. Based on in-house data and, increasingly, external data acquired via unregulated
data transactions, platforms model behavior to generate predictive analytics that help tailor the
singular reality each individual is exposed to, in order to nudge her behavior. In other words,
individual-level user analytics, while conforming to the conventional unit of analysis of various
social sciences, are also potentially susceptible to platform enclosure. Even more STEM-oriented
scholarly communities such as the “ACM User Modeling, Adaptation and Personalization”
conference have recognized the potential pitfall and chose “Responsible Personalization” as their
2020 theme to highlight the ethical concern.
Our unfolding economic, political, and technological contexts require us to explore
alternatives to defaulting to the sovereign individual as the self-evident basic unit of social
existence. Parallel arguments have been made that, for example, champion a reconceptualization
of platforms from private providers of individualized service to public utilities that have societal
implications and thus social responsibilities (Plantin et al., 2018). Scholars have also begun to promote a
shift in focus from the First Amendment's protection of individual speech to the networked
landscape of platform discourses, because the latter exerts more immediate influence on the
functioning of our democratic process (Ananny, 2019). We live in a world where social collectives
bear the consequences of platform power and weather constantly emerging infrastructural
technologies in unequal ways. Meanwhile, due to algorithmic personalization and architectural
modulation, individual experiences in the digital environment have become more fragmented and
fluid than ever. To what extent, then, must theorization hinge on the individual? Raising the level
of analysis to that of collectives and creatively examining trace data through varying aggregates
can be a fruitful direction for social scientific discovery and theory building (for such an example,
see Taneja and Wu, 2014, which uses comScore global web usage data to assess the impact of
state-sponsored access blockage).
Finally, in addition to opening up new realms in empirical and theoretical research, third-
party audience measurement allows better protection of individual privacy. As discussed above, its
datafication amounts to "a form of surveillance known," wherein, with informed consent and
compensation, recruited panelists have full knowledge of how their media behaviors are being
monitored. This sharply contrasts with platforms' often obfuscated surveillance and monetization
practices (also see Lingel, 2019).
Over the last decade, "Big Data" obtained from giant platforms has become fertile ground,
and for specific areas a lifeline, for academic inquiries into human behavior. In response to this
new configuration of scholarly production, concerns ranging from the technical to the ethical have
been raised regarding representativeness, data access, and privacy. On the one hand, some researchers
seek to highlight the strengths of traditional methods (e.g., surveys and experiments) to the fast-growing
computational social science community, where talent from STEM fields has also coalesced. On the
other hand, many scholars strive to envision more accountable access and sharing protocols for
platform data. To complement these current conversations, we pursue a different line of
interrogation, which neither guards academics from corporate platform data, nor distinguishes
“Big Data” from data collected through traditional methods. Instead, we draw a line within the
general category of Big Data, typified by passively captured behavioral logs with industrial origin.
Within this kind of data, there are fundamental differences in measurement regimes that
have been elided in existing discussions. Herein, we argue, lies the key to evaluating afresh
the now extensive efforts to employ platform log data for knowledge production. To illustrate
these consequential features of platform datafication, we introduce third-party log data with
historical, organizational, and technical specifics. Our analysis demonstrates that, arising from
different institutional conditions, regimes of behavioral measurement differ, and they lead to
different forms of knowledge via scholarly practice.
As historians of science and technology have illustrated, the “data explosion” during the
mid-20th century resulted from the expansion of public domains, which ushered in a proliferation
of “constraining” measurements and “impersonal” information (Heyck, 2015; Porter, 1995b). Our
analysis of the institutional conditions for measurement and datafication suggests that the present-
day "data explosion" represented by the prevalence of platform trace data is not an acceleration of
the existing trend but rather a reversion: platform enclosures of sociality, the public sphere, and
the evaluation of public attention, with political, economic, and cultural bearings.
The divergence of platform trace data in measurement regimes is particularly
consequential when viewed from the perspective of quantitative social science, a dominant
enterprise that shapes our understanding of the social world. Constituted in its distinct conditions
of datafication, platform episteme manifests in individual-level predictive analytics, the obscurity
of architectural nudges, as well as the perceptual centrality of platform-based ecology. This
episteme powerfully extends itself via the increasingly encompassing platform data
infrastructures. Against this backdrop, academic social sciences should push back by revisiting
fundamental issues about knowledge production from datafication to analysis and interpretation.
Third party measurement, in this sense, amounts to a viable and potentially countervailing resource
for resisting the enclosure of platform episteme. Ultimately, we hope this essay serves as an initial
call addressing all the stakeholders (journalists, NGOs, and the general public) who are
enamored of the explosion of platform Big Data and drawn to its episteme to represent and
understand our social reality and ourselves.
References

Acker A and Donovan J (2019) Data craft: a theory/methods package for critical internet studies. Information, Communication and Society 22(11): 1590-1609.

Anand N and Peterson RA (2000) When Market Information Constitutes Fields: Sensemaking of Markets in the Commercial Music Industry. Organization Science 11(3): 270-284.

Ananny M (2019) Probably Speech, Maybe Free: Toward a Probabilistic Understanding of Online Expression and Platform Governance. Available at:

Andrejevic M (2013) Infoglut: How Too Much Information Is Changing the Way We Think and Know. New York: Routledge.

Bail CA, Brown TW and Wimmer A (2019) Prestige, Proximity, and Prejudice: How Google Search Terms Diffuse across the World. American Journal of Sociology 124(5): 1496-1548.

Balnaves M, O'Regan T and Goldsmith B (2011) Rating the Audience: The Business of Media. London: A&C Black.

Barwise P and Ehrenberg A (1988) Television and Its Audience. London: SAGE.

Baym NK (2013) Data not seen: The uses and shortcomings of social media metrics. First Monday.

Beville HM (1988) Audience Ratings: Radio, Television, and Cable. London: Psychology Press.

boyd d and Crawford K (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication and Society.

Bruns A (2019) After the "APIcalypse": social media platforms and their fight against critical scholarly research. Information, Communication and Society 22(11): 1544-1566.

Brunton F (2019) Hello from Earth. In . Minneapolis: University of Minnesota Press, pp. 1-49.

Busch L (2016) Looking in the Wrong (La)place? The Promise and Perils of Becoming Big Data. Science, Technology & Human Values 42(4): 657-678.

Comaroff J and Comaroff J (1992) Ethnography and the historical imagination. In Ethnography and the Historical Imagination. Boulder: Westview, pp. 3-48.

Confessore N, Dance GJX, Harris R and Hansen M (2018) The Follower Factory. The New York Times.

Daston LJ and Galison P (2007) Objectivity. Cambridge: Zone Books.

Ehrenberg ASC (1968) The factor analytic search for program types. Journal of Advertising Research 8(1): 55-63.

Eslami M, Karahalios K, Sandvig C and Vaccaro K (2016) First I like it, then I hide it: Folk theories of social feeds. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems.

Espeland WN and Stevens ML (2008) A Sociology of Quantification. European Journal of Sociology 49(3): 401-436.

Furchtgott-Roth H, Hahn RW and Layne-Farrar A (2007) The law and economics of regulating ratings firms. Journal of Competition Law & Economics 3(1): 49-96.

Galison P (1997) Image and Logic: A Material Culture of Microphysics. Chicago: University of Chicago Press.

Gentzkow M and Shapiro JM (2011) Ideological Segregation Online and Offline. The Quarterly Journal of Economics 126(4): 1799-1839.

Gehl RW (2014) Reverse Engineering Social Media. Philadelphia: Temple University Press.

Gerrard Y (2018) Beyond the hashtag: Circumventing content moderation on social media. New Media & Society 20(12): 4492-4511.

Gillespie T (2010) The politics of "platforms." New Media & Society 12(3): 347-364.

Gillespie T (2016) #Trendingistrending: When Algorithms Become Culture. In Seyfert R and Roberge J (eds) Algorithmic Cultures: Essays on Meaning, Performance and New Technologies. New York: Routledge, pp. 52-75.

Grind K, Schechner S, McMillan R and West J (2019) How Google Interferes With Its Search Algorithms and Changes Your Results. WSJ Online.

Grosser B (2014) What do metrics want? How quantification prescribes social interaction on Facebook. Computational Culture.

Harding S (1993) Rethinking Standpoint Epistemology: What Is "Strong Objectivity"? In Alcoff L and Potter E (eds) Feminist Epistemologies. New York: Routledge, pp. 49-82.

Harding S (2005) Negotiating with the positivist legacy: New social justice movements and a standpoint politics of method. In Steinmetz G (ed) The Politics of Method in the Human Sciences: Positivism and Its Epistemological Others. Durham: Duke University Press, pp.

Hargittai E (2020) Potential Biases in Big Data: Omitted Voices on Social Media. Social Science Computer Review 38(1): 10-24.

Heyck H (2015) Age of System: Understanding the Development of Modern Social Science. Baltimore: Johns Hopkins University Press.

Hindman M (2018) The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy. Princeton: Princeton University Press.

Joseph S (2018) 'Organic reach on Facebook is dead': Advertisers expect price hikes after Facebook's feed purge. Available at:

Joseph S (2020) Ebiquity tries to pivot via the acquisition of digital advisory firm Digital Decisions. Available at:

Karpf D (2012) Social science research methods in Internet time. Information, Communication and Society 15(5): 639-661.

Karppi T (2018) Disconnect: Facebook's Affective Bonds. Minneapolis: University of Minnesota Press.

King G (2011) Ensuring the data-rich future of the social sciences. Science 331(6018): 719-721.

King G, Pan J and Roberts ME (2017) How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument. American Political Science Review.

Kramer ADI, Guillory JE and Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America 111(24): 8788-8790.

Lazer D, Pentland A, Adamic L et al. (2009) Computational Social Science. Science 323(5915): 721-723.

Lingel J (2019) Notes from the Web that Was: The Platform Politics of Craigslist. Surveillance & Society 17(1/2): 21-26.

Mayer-Schönberger V and Cukier K (2013) Big Data: A Revolution that Will Transform how We Live, Work, and Think. London: Houghton Mifflin Harcourt.

McGuigan L (2019) Automating the audience commodity: The unacknowledged ancestry of programmatic advertising. New Media & Society. DOI: 10.1177/1461444819846449.

Milavsky JR (1992) How Good is the A.C. Nielsen People-Meter System? A Review of the Report by the Committee on Nationwide Television Audience Measurement. Public Opinion Quarterly 56(1): 102-115.

Napoli PM and Napoli AB (2019) What social media platforms can learn from audience measurement: Lessons in the self-regulation of "black boxes." First Monday.

Nielsen AC (1945) Two Years of Commercial Operation of the Audimeter and the Nielsen Radio Index. Journal of Marketing 9(3): 239-255.

Penner AM and Dodge KA (2019) Using Administrative Data for Social Science and Policy. Russell Sage Foundation Journal of the Social Sciences 5(2): 1-18.

Petre C, Duffy BE and Hund E (2019) "Gaming the System": Platform Paternalism and the Politics of Algorithmic Visibility. Social Media + Society 5(4). DOI: 10.1177/2056305119879995.

Plantin J-C, Lagoze C, Edwards PN and Sandvig C (2018) Infrastructure studies meet platform studies in the age of Google and Facebook. New Media & Society 20(1): 293-310.

Porter TM (1995a) Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton: Princeton University Press.

Porter TM (1995b) Information cultures: A review essay. Accounting, Organizations and Society 20(1): 83-92.

Public Opinion Quarterly (2019) New Data in Social and Behavioral Research. Available at:

Puschmann C (2019) An end to the wild west of social media research: a response to Axel Bruns. Information, Communication and Society 22(11): 1582-1589.

Rosenblat A and Stark L (2016) Algorithmic Labor and Information Asymmetries: A Case Study of Uber's Drivers. International Journal of Communication 10: 3758-3784.

RTI International (2016) CRE Guide for Validating New and Modeled Audience Data v.1.0. Available at:

Salganik M (2019) Bit by Bit: Social Research in the Digital Age. Princeton: Princeton University Press.

Salganik MJ, Dodds PS and Watts DJ (2006) Experimental study of inequality and unpredictability in an artificial cultural market. Science 311(5762): 854-856.

Sandvig C, Hamilton K, Karahalios K and Langbort C (2014) Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and Discrimination: Converting Critical Concerns into Productive Inquiry. Available at:

Scharkow M, Mangold F, Stier S and Breuer J (2020) How social network sites and other online intermediaries increase exposure to news. Proceedings of the National Academy of Sciences of the United States of America 117(6): 2761-2763.

Seaver N (2019) Captivating algorithms: Recommender systems as traps. Journal of Material Culture 24(4): 421-436.

Social Science One (2019) Public statement from the Co-Chairs and European Advisory Committee of Social Science One. Available at:

Srnicek N (2016) Platform Capitalism. Cambridge: Polity.

Steinmetz G (2005) The Politics of Method in the Human Sciences: Positivism and Its Epistemological Others. Durham: Duke University Press.

Taneja H (2013) Audience Measurement and Media Fragmentation: Revisiting the Monopoly Question. Journal of Media Economics 26(4): 203-219.

Taneja H (2016) Using Commercial Audience Measurement Data in Academic Research. Communication Methods and Measures 10(2-3): 176-178.

Taneja H and Wu AX (2014) Does the Great Firewall really isolate the Chinese? Integrating access blockage with cultural factors to explain web user behavior. The Information Society.

Tang G and Lee FLF (2018) Social media campaigns, electoral momentum, and vote shares: evidence from the 2016 Hong Kong Legislative Council election. Asian Journal of Communication 28(6): 579-597.

van Dijck J (2014) Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society 12(2): 197-208.

Webster JG (2014) The Marketplace of Attention: How Audiences Take Shape in a Digital Age. Cambridge: MIT Press.

Webster JG and Ksiazek TB (2012) The Dynamics of Audience Fragmentation: Public Attention in an Age of Digital Media. Journal of Communication 62(1): 39-56.

Webster JG and Phalen P (1997) The Mass Audience: Rediscovering the Dominant Model. Mahwah: Lawrence Erlbaum Associates.

Wu AX and Taneja H (2019) How did the data extraction business model come to dominate? Changes in the web use ecosystem before mobiles surpassed personal computers. The Information Society 35(5): 272-285.

Zhu JJH, Zhou Y, Guan L, Hou L, Shen A and Lu H (2019) Applying user analytics to uses and effects of social media in China. Asian Journal of Communication 29(3): 291-306.

Zuboff S (2019) The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: Profile Books.
... Esto resulta particularmente visible, por ejemplo, frente a propuestas de alfabetización mediática diseñadas y promovidas por empresas de tecnologías digitales cuyos dispositivos y plataformas se incorporan en instituciones educativas, lo mismo de sostenimiento público que privado. Otro ejemplo de esta forma de entender la alfabetización digital se observa en la preeminencia de los algoritmos en la configuración de prácticas sociales a partir de la plataformización y la extracción de datos (Wu y Taneja, 2021), las cuales materializan una dinámica de opresión cercana a la sociedad de programadores visualizada por Flusser. Frente a este fenómeno, una mirada crítica no es sinónimo de rechazo a lo digital, pero sí un llamado a cuestionar las asimetrías, las desigualdades y las lógicas de exclusión sobre las que operan en los medios digitales, en aras de formular una nueva síntesis entre las formas de sociabilidad que hoy están en conflicto. ...
Full-text available
Este artículo presenta reflexiones teóricas, resultados y conclusiones derivados de una investigación en torno a la manera en que se manifiesta la alteridad en la interacción de los adolescentes con imágenes en red, buscando incorporar los hallazgos a una noción enriquecida de alfabetización digital crítica. Se expone el marco teórico desde una perspectiva interdisciplinaria, apoyada en la comunicación, la pedagogía y la filosofía, para definir alfabetización digital crítica, mediación de imágenes en red y alteridad. El trabajo de campo se realizó durante 2020 a través de entrevistas en profundidad enriquecidas con análisis basado en imágenes, con adolescentes de 13 a 17 años, radicados en tres ciudades mexicanas. Entre los hallazgos se identificó el potencial de las imágenes como vía para abordar el problema de la alteridad y el papel que puede tomar la escuela para potenciar el uso de estas imágenes y contribuir a la toma de conciencia sobre la alteridad como responsabilidad hacia el Otro, en el marco de una alfabetización digital crítica.
... Twitter and Facebook began to aggressively remove disinformation content after the US 2016 Elections [19] and Parler's entire service was removed in early 2021 [1], for example. Another possible explanation is to allow the Gettr administrators the opportunity for "provenance laundering" of platform information or curating a favorable image to be presented to the Internet public [42]. This is not an unknown phenomenon for social networks, and mainstream platforms' data is never taken to represent human behavior-as-it-is, because of Facebook of Twitter administratively decide to curate data towards maximization of corporate interests [41]. ...
Full-text available
As yet another alternative social network, Gettr positions itself as the "marketplace of ideas" where users should expect the truth to emerge without any administrative censorship. We looked deep inside the platform by analyzing it's structure, a sample of 6.8 million posts, and the responses from a sample of 124 Gettr users we interviewed to see if this actually is the case. Administratively, Gettr makes a deliberate attempt to stifle any external evaluation of the platform as collecting data is marred with unpredictable and abrupt changes in their API. Content-wise, Gettr notably hosts pro-Trump content mixed with conspiracy theories and attacks on the perceived "left." It's social network structure is asymmetric and centered around prominent right-thought leaders, which is characteristic for all alt-platforms. While right-leaning users joined Gettr as a result of a perceived freedom of speech infringement by the mainstream platforms, left-leaning users followed them in numbers as to "keep up with the misinformation." We contextualize these findings by looking into the Gettr's user interface design to provide a comprehensive insight into the incentive structure for joining and competing for the truth on Gettr.
... These studies have widened more recently to embrace digital platforms of all kinds which the EU's proposed Digital Markets Act and accompanying Digital Services Act seek to bring within its remit (Seering et al. 2019;Thorson et al. 2019;Walker et al. 2019). A growing research stream examines the use of algorithms in public sector services and citizen management, including investigations of smart cities and about "AI/data for humanitarian aid" (Dudhwala & Larsen 2019;Hong et al. 2019;Park & Humphry 2019;Veale & Brass 2019;Young et al. 2019), algorithms for labor management and "the future of work" (Jarrahi & Sutherland 2019;Shafiei Gol et al. 2019;Gal et al. 2020;Newlands 2021), scholarly investigations of the platform economy (or "gig" economy) and its companies, business models, and how they transform markets (Cheng & Foley 2019;Fenwick et al. 2019;Glaser et al. 2019;Leoni & Parker 2019;Wu & Taneja 2020). One conceptual challenge that arises from this rich and diverse profusion of research lies in the wide variety of rubrics whose relation to the concept of algorithmic regulation is uncertain and yet to be interrogated. ...
Full-text available
This paper offers a critical synthesis of the articles in this Special Issue with a view to assessing the concept of “algorithmic regulation” as a mode of social coordination and control articulated by Yeung in 2017. We highlight significant changes in public debate about the role of algorithms in society occurring in the last five years. We also highlight prominent themes that emerge from the contributions, illuminating what is distinctive about the concept of algorithmic regulation, reflecting upon some of its strengths, limitations, and its relationship with the broader research field. In closing, we argue that the core concept is valuable and maturing. It has evolved into an analytical bridge that fosters cross-disciplinary development and analysis in ways that enrich its early “skeletal” form, thereby enabling careful and context-sensitive analysis of algorithmic regulation in concrete settings while facilitating critical reflection concerning the legitimacy of existing and proposed regulatory regimes.
Scholars and observers attribute many democratic benefits to local news media. This paper examines exposure to local and national news media websites, side-by-side in one model, testing their over-time effects on political participation, knowledge, and affective and attitude polarization. We test whether traditionally disengaged or disadvantaged groups (i.e., racial minorities, those with low education levels, the politically disinterested, and those who do not consume national news) may particularly benefit from local news consumption. To this end, we combine three-wave panel surveys (final N = 740) with 9 months' worth of web browsing data submitted by the same participants (36 million visits). We identify exposure to local and national news sites using an extensive list of news domains. The results offer a robust pattern of null findings. Actual online exposure to local news has no over-time effects on the tested outcomes. Also, exposure to local news sites does not exert especially strong effects among the tested sub-groups. We attribute these results to the fact that news visits account for a small fraction of citizens' online activities, less than 2% in our trace data. Our project suggests the need to evaluate the effects of local news consumption on the individual level.
Purpose: The digital environment afforded by social networks has created an opportunity to understand more clearly the impact of social media native advertising on advertising processing outcomes. Thus, the current study integrates native advertising with engagement literature to compare engagement outcomes between feed and banner placements before analyzing engagement outcomes of sponsored social media posts by advertising objective. This work aims to contribute to the advertising effectiveness literature by arguing for the importance of engagement as a measure of effectiveness.
Design/methodology/approach: Facebook advertising data were collected from a convenience sample of 10 Facebook advertisers that accounted for roughly $414,000 in advertising spend. The panel data (also called longitudinal or cross-sectional time-series data) cover 26 months from the 10 advertisers and were used to measure relationships between native advertising exposure and digital consumer engagement with advertising across the advertising objectives of brand awareness, link clicks, conversions, post-engagement and video views.
Findings: Exposure to native advertising was a strong predictor of advertising processing and consumption on the three variables of interest: clicks, comments and shares. Ads reaching consumers while they natively consumed content in their feed resulted in statistically significant improvements in impressions and clicks when compared to banner ads. Exposure to native ads was significantly related to all engagement outcomes of interest, except for advertisers who chose post-engagement as their advertising objective.
Practical implications: The results suggest that advertisers seeking clicks should likely avoid post-engagement objectives. For this group, impressions were not related to link clicks but were related to comments and shares. Native advertising placements in the feed, however, are generally more effective than banner ads on Facebook for advertisers seeking engagement.
Research limitations/implications: This research is one of few studies to use longitudinal advertising data to explore engagement effects, drawing on real-world data collected from a diverse set of Facebook advertisers over a 26-month period. It shows that interactive marketers using a social media feed to reach consumers can expect positive outcomes in advertising consumption, affective and cognitive processing, and advocacy, but those outcomes may vary by advertising objective.
Originality/value: Given the uniqueness of the data set, the findings contribute to the native advertising literature and to the literature on digital consumer engagement with advertising in social media. The study also provides empirical support for the efficacy of native advertising.
The era of behavioural big data has created new avenues for data science research, with many new contributions stemming from academic researchers. Yet data controlled by platforms have become increasingly difficult for academics to access. Platforms now routinely use algorithmic behaviour modification techniques to manipulate users’ behaviour, leaving academic researchers further isolated in conducting important data science and computational social science research. This isolation results from researchers’ lack of access to human behavioural data and, crucially, to both the data on machine behaviour that triggers and learns from the human data and the platform’s behaviour modification mechanisms. Given the impact of behaviour modification on individual and societal well-being, we discuss the consequences for data science knowledge creation, and encourage academic data scientists to take on new roles in producing research to promote (1) platform transparency and (2) informed public debate around the social purpose and function of digital platforms. Behavioural big data and algorithmic behaviour modification technologies controlled by commercial platforms have become difficult for academic researchers to access. Greene et al. describe barriers to academic research on such data and algorithms, and make a case for enhancing platform access and transparency.
The right to be forgotten has been widely discussed from a legal perspective. Courts have analyzed the existence and constitutional compatibility of the right in the national legal order of several jurisdictions around the world. However, even if the right to be forgotten is not a universally recognized right, by understanding how the law approaches tensions that arise between the right to freedom of expression and the rights to seek, impart and receive information, on one hand, and a right to be forgotten, underpinned by the rights to honor, privacy and personal data protection on the other, journalists can extract ethical guidelines that can orient them in the correct use of archival information about individuals to report on current events. This work begins by explaining how legal debates can help inform ethical discussions about journalism. Then, by exploring the legal development and justifications for the right to be forgotten and identifying key elements of this emerging right, we engage in a discussion around the use of archives and memory in journalism and then identify the elements that journalists should consider in relation to the use of archival information in their profession in a way that allows them to fulfill their journalistic duties without ignoring the legal context.
Personal technology use can significantly impact wellness. The transition to widespread remote learning, working, and socializing during the COVID-19 pandemic exacerbated society’s reliance on technology. This article presents a case study of how the authors applied their privacy scholarship to offer a responsive learning experience for students concerning the social implications of the pandemic. The article also explores the authors’ unique approach to digital wellness, which seeks to align wellness goals and habits regarding technology while placing a special emphasis on privacy, particularly information asymmetries, attention engineering, and the hidden harms of invasive data collection.
People’s activities and opinions recorded as digital traces online, especially on social media and other web-based platforms, offer increasingly informative pictures of the public. They promise to allow inferences about populations beyond the users of the platforms on which the traces are recorded, representing real potential for the social sciences and a complement to survey-based research. But the use of digital traces brings its own complexities and new error sources to the research enterprise. Recently, researchers have begun to discuss the errors that can occur when digital traces are used to learn about humans and social phenomena. This article synthesizes this discussion and proposes a systematic way to categorize potential errors, inspired by the Total Survey Error (TSE) framework developed for survey methodology. We introduce a conceptual framework to diagnose, understand, and document errors that may occur in studies based on such digital traces. While there are clear parallels to the well-known error sources in the TSE framework, the new “Total Error Framework for Digital Traces of Human Behavior on Online Platforms” (TED-On) identifies several types of error that are specific to the use of digital traces. By providing a standard vocabulary to describe these errors, the proposed framework is intended to advance communication and research about using digital traces in scientific social research.
Communication and media researchers, as well as scholars from related disciplines, are increasingly asking how social practices are measured, analyzed, and represented through data or digital traces. However, there is little research on the various ways in which communication and media research itself uses such nonscientific usage data. This paper shows that usage data from technology, data, market research, and media companies are in many cases opaque and often insufficiently scrutinized and contextualized by scholars. There is too little critical discussion and analysis as to what different metrics and rankings reveal and what commercial interests and strategies are driving them. The analysis at hand identifies and discusses five problematic ways of dealing with nonscientific usage data: A) lacking contextualization of what usage metrics and rankings indicate, B) lacking contextualization of rankings as strategic business tools, C) unsubstantiated inference from usage data to macro phenomena, D) uncritical adoption of superlatives and generalizations created by vested interests, and E) use of estimated numbers as measured data. The analysis of these problematics sheds a critical light on communication and media research, the partial lack of ensuring scientific quality, and recent developments in science. It also opens up new questions and perspectives with which communication and media research can make an important contribution. A critical look at usage data is essential, especially in societies in which usage figures and rankings are omnipresent and powerful corporations largely control access to data. [for further reading, see also the extended English abstract, pp. 212–221]
Research has prominently assumed that social media and web portals that aggregate news restrict the diversity of content that users are exposed to by tailoring news diets toward the users’ preferences. In our empirical test of this argument, we apply a random-effects within–between model to two large representative datasets of individual web browsing histories. This approach allows us to better encapsulate the effects of social media and other intermediaries on news exposure. We find strong evidence that intermediaries foster more varied online news diets. The results call into question fears about the vanishing potential for incidental news exposure in digital media environments.
As the logic of data-driven metrification reconfigures various realms of social and economic life, cultural workers—from journalists and musicians to photographers and social media content creators—are pursuing online visibility in earnest. Despite workers' patterned deployment of search engine optimization, reciprocal linking, and automated engagement-boosting, tech companies routinely denigrate such practices as gaming the system. This article critically probes discourses and practices of so-called system-gaming by analyzing three key moments when platforms accused cultural producers of algorithmic manipulation. Empirically, we draw upon textual analyses of news articles (n = 105) and user guidelines published by Google, Facebook, and Instagram. Our findings suggest that the line between what platforms deem illegitimate algorithmic manipulation and legitimate strategy is nebulous and largely reflective of their material interests. However, the language used to invoke this distinction is strongly normative, condemning "system gamers" as morally bankrupt, while casting platform companies as neutral actors working to uphold the ideals of authenticity and integrity. We term this dynamic "platform paternalism" and conclude that gaming accusations constitute an important mechanism through which platforms legitimate their power and authority, to the detriment of less well-established cultural producers.
It is widely believed that the spread of data extraction technologies on the Internet has led to the erosion of traditional professional content providers and the transformation of the online media ecosystem. To investigate this shift in media ecology, we conduct relational analyses of actual user behavior, departing from existing research that primarily focuses on business institutions and designs of technology. We assess the prevalence of the data extraction business model by grouping websites along two architectural traits that afford data extraction – user content generation and curation – and analyzing how some website architectures get privileged in the web use ecosystem. Since data extraction is relational, we advocate a network measure to capture shared usage in addition to individual popularity of websites. Our analyses of world’s 850 most popular websites in 2009, 2011, and 2013 reveal that data extraction fostered a two-tier hierarchical web use ecosystem, marked by interdependence between professional content producers and data extractors. Our study thereby shows that the dynamics in play are more complicated than what is captured by explanations centered on either capabilities of platform giants or the decline of traditional journalism and media organizations.
The widespread concerns about the misuses and negative effects of social media platforms have prompted a range of governance responses, including preliminary efforts toward self-regulatory models. Building upon these initiatives, this paper looks to the self-regulation of the audience measurement industry as a possible template for the self-regulation of social media. This article explores the parallels between audience measurement systems and social media platforms; reviews the self-regulatory apparatus in place for the audience measurement industry; and considers the lessons that the self-regulation of audience measurement might offer to the design and implementation of self-regulatory approaches to social media.
With half the world now online, a handful of websites dominate globally. Yet little is known about the homogeneity or geographical distinctness of global web use patterns. Focusing beyond popular sites, we inquired into how and why countries are similar in their web use patterns, developing a framework drawing on the literatures on media globalization, as well as Internet geographies. To compute similarities in web use between countries, we utilized an algorithm that considered both ranking positions and overlap counts on ranked lists of the 100 most popular websites for 174 countries, totaling 6,252 unique websites. Findings from a network analysis and from regressions suggest that countries with similar languages and shared borders, as well as those vastly different in their Internet market sizes, tend to have similar web use patterns. Neither are countries particularly similar to the US in web use nor does the prevalence of English speakers have an influence.
Current models of data access in social media research offer clear benefits, but are also fraught in a number of ways, including by posing risks to user privacy, being constrained in terms of reliability and reproducibility of results, and incentivizing questionable and in some cases unethical research practices. I argue that partnerships between academics and industry represent one potential option for improving this situation. While no panacea, such arrangements may be able to contribute to a more rules-based and less anarchic situation in social media research, placing greater emphasis on preserving user privacy and the reproducibility of results, rather than mainly on compiling large data sets. Due to a number of recent shifts, not just in research, but in the public discourse surrounding social media platforms and user data, we are entering an era of increased institutionalization and standardization in the study of online communication. This new environment appears poised to replace the 'Wild West of social media research' that we have witnessed in the past, in which academics compile huge troves of data with few constraints, not always acting in the public's best interest.
Disinformation campaigns continue to thrive online, despite social media companies' efforts at identifying and culling manipulation on their platforms. Framing these manipulation tactics as 'coordinated inauthentic behavior,' major platforms have banned culprits and deleted the evidence of their actions from social activity streams, making independent assessment and auditing impossible. While researchers, journalists, and civil society groups use multiple methods for discovering and tracking disinformation, platforms began to publish highly curated data archives of disinformation in 2016. When platform companies reframe manipulation campaigns, however, they downplay the importance of their products in spreading disinformation. We propose to treat social media metadata as a boundary object that supports research across platforms and to use metadata as an entry point for investigating manipulation campaigns. We illustrate how platform companies' responses to disinformation campaigns are at odds with the interests of researchers, civil society, policy-makers, and journalists, limiting the capacity to audit the role that platforms play in political discourse. To show how platforms' data archives of 'coordinated inauthentic behavior' prevent researchers from examining the contexts of manipulation, we present two case studies of disinformation campaigns related to the Black Lives Matter Movement. We demonstrate how data craft – the exploitation of metrics, metadata, and recommendation engines – played a prominent role in attracting audiences to these disinformation campaigns. Additionally, we offer some investigative techniques for researchers to employ data craft in their own research on disinformation. We conclude by proposing new avenues for research for the field of Critical Internet Studies.
In the aftermath of the Cambridge Analytica controversy, social media platform providers such as Facebook and Twitter have severely restricted access to platform data via their Application Programming Interfaces (APIs). This has had a particularly critical effect on the ability of social media researchers to investigate phenomena such as abuse, hate speech, trolling, and disinformation campaigns, and to hold the platforms to account for the role that their affordances and policies might play in facilitating such dysfunction. Alternative data access frameworks, such as Facebook’s partnership with the controversial Social Science One initiative, represent an insufficient replacement for fully functional APIs, and the platform providers’ actions in responding to the Cambridge Analytica scandal raise suspicions that they have instrumentalised it to actively frustrate critical, independent, public interest scrutiny by scholars. Building on a critical review of Facebook’s public statements through its own platforms and the mainstream media, and of the scholarly responses these have drawn, this article outlines the societal implications of the ‘APIcalypse’, and reviews potential options for scholars in responding to it.