Arvind Narayanan's research while affiliated with Princeton University and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (67)
We collect and analyze a corpus of more than 300,000 political emails sent during the 2020 US election cycle. These emails were sent by over 3000 political campaigns and organizations including federal and state level candidates as well as Political Action Committees. We find that in this corpus, manipulative tactics—techniques using some level of...
Blockchain analysis is essential for understanding how cryptocurrencies like Bitcoin are used in practice, and address clustering is a cornerstone of blockchain analysis. However, current techniques rely on heuristics that have not been rigorously evaluated or optimized. In this paper, we tackle several challenges of change address identification a...
Online platforms play an increasingly important role in shaping democracy by influencing the distribution of political information to the electorate. In recent years, political campaigns have spent heavily on the platforms' algorithmic tools to target voters with online advertising. While the public interest in understanding how platforms perform t...
Concerns about privacy, bias, and harmful applications have shone a light on the ethics of machine learning datasets, even leading to the retraction of prominent datasets including DukeMTMC, MS-Celeb-1M, TinyImages, and VGGFace2. In response, the machine learning community has called for higher ethical standards, transparency efforts, and technical...
Simulation can enable the study of recommender system (RS) evolution while circumventing many of the issues of empirical longitudinal studies; simulations are comparatively easier to implement, are highly controlled, and pose no ethical risk to human participants. How simulation can best contribute to scientific insight about RS alongside qualitati...
Simulation has emerged as a popular method to study the long-term societal consequences of recommender systems. This approach allows researchers to specify their theoretical model explicitly and observe the evolution of system-level outcomes over time. However, performing simulation-based studies often requires researchers to build their own simula...
Universities have been forced to rely on remote educational technology to facilitate the rapid shift to online learning. In doing so, they acquire new risks of security vulnerabilities and privacy violations. To help universities navigate this landscape, we develop a model that describes the actors, incentives, and risks, informed by surveying 105...
We investigate data exfiltration by third-party scripts directly embedded on web pages. Specifically, we study three attacks: misuse of browsers’ internal login managers, social data exfiltration, and whole-DOM exfiltration. Although the possibility of these attacks was well known, we provide the first empirical evidence based on measurements of 30...
The evolution of tricky user interfaces.
Automated analysis of privacy policies has proved a fruitful research direction, with developments such as automated policy summarization, question answering systems, and compliance detection. So far, prior research has been limited to analysis of privacy policies from a single point in time or from short spans of time, as researchers did not have...
Dark patterns are an abuse of the tremendous power that designers hold in their hands. As public awareness of dark patterns grows, so does the potential fallout. Journalists and academics have been scrutinizing dark patterns, and the backlash from these exposures can destroy brand reputations and bring companies under the lenses of regulators. Desi...
Dark patterns are user interface design choices that benefit an online service by coercing, steering, or deceiving users into making unintended and potentially harmful decisions. We present automated techniques that enable experts to identify dark patterns on a large set of websites. Using these techniques, we study shopping websites, which often u...
The number of Internet-connected TV devices has grown significantly in recent years, especially Over-the-Top ("OTT") streaming devices, such as Roku TV and Amazon Fire TV. OTT devices offer an alternative to multi-channel television subscription services, and are often monetized through behavioral advertising. To shed light on the privacy practices...
The proliferation of smart home Internet of things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home device...
The security of most existing cryptocurrencies is based on a concept called Proof-of-Work, in which users must solve a computationally hard cryptopuzzle to authorize transactions ("one unit of computation, one vote"). This leads to enormous expenditure on hardware and electricity in order to collect the rewards associated with transaction authoriz...
Abstract
Consumer genetics databases hold dense genotypes of millions of people, and the number is growing quickly [1] [2]. In 2018, law enforcement agencies began using such databases to identify anonymous DNA via long-range familial searches. We show that this technique is far more powerful if combined with a genealogical database of the type co...
The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home device...
The security of most existing cryptocurrencies is based on a concept called Proof-of-Work, in which users must solve a computationally hard cryptopuzzle to authorize transactions ("one unit of computation, one vote"). This leads to enormous expenditure on hardware and electricity in order to collect the rewards associated with transaction authoriza...
Users are often ill-equipped to identify online advertisements that masquerade as non-advertising content. Because such hidden advertisements can mislead and harm users, the Federal Trade Commission (FTC) requires all advertising content to be adequately disclosed. In this paper, we examined disclosures within affiliate marketing, an endorsement-ba...
In this paper, we present two web-based attacks against local IoT devices that any malicious web page or third-party script can perform, even when the devices are behind NATs. In our attack scenario, a victim visits the attacker's website, which contains a malicious script that communicates with IoT devices on the local network that have open HTTP...
Monero is a privacy-centric cryptocurrency that allows users to obscure their transactions by including chaff coins, called “mixins,” along with the actual coins they spend. In this paper, we empirically evaluate two weaknesses in Monero’s mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable to “cha...
Blockchain technology is assembled from pieces that have long pedigrees in the academic literature, such as linked timestamping, consensus, and proof of work. In this tutorial, I'll begin by summarizing these components and how they fit together in Bitcoin's blockchain design. Then I'll present abstract models of blockchains; such abstractions help...
While disclosures relating to various forms of Internet advertising are well established and follow specific formats, endorsement marketing disclosures are often open-ended in nature and written by individual publishers. Because such marketing often appears as part of publishers' actual content, ensuring that it is adequately disclosed is critical...
We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties...
If you have read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades' worth of research on digital cash, beginning with David Chaum [10, 12], did not lead to commercial success because it required a centralized, bank-like...
Analysis of blockchain data is useful for both scientific research and commercial applications. We present BlockSci, an open-source software platform for blockchain analysis. BlockSci is versatile in its support for different blockchains and analysis tasks. It incorporates an in-memory, analytical (rather than transactional) database, making it sev...
The growing market for smart home IoT devices promises new conveniences for consumers while presenting new challenges for preserving privacy within the home. Many smart home devices have always-on sensors that capture users' offline activities in their living spaces and transmit information about these activities on the Internet. In this paper, we...
We show how third-party web trackers can deanonymize users of cryptocurrencies. We present two distinct but complementary attacks. On most shopping websites, third party trackers receive information about user purchases for purposes of advertising and analytics. We show that, if the user pays using a cryptocurrency, trackers typically possess enoug...
We’ve seen repeatedly that ideas in the research literature can be gradually forgotten or lie unappreciated, especially if they are ahead of their time, even in popular areas of research. Both practitioners and academics would do well to revisit old ideas to glean insights for present systems. Bitcoin was unusual and successful not because it was o...
In the cryptographic currency Bitcoin, all transactions are recorded in the blockchain - a public, global, and immutable ledger. Because transactions are public, Bitcoin and its users employ obfuscation to maintain a degree of financial privacy. Critically, and in contrast to typical uses of obfuscation, in Bitcoin obfuscation is not aimed against...
We present a systematic study of ad blocking - and the associated "arms race" - as a security problem. We model ad blocking as a state space with four states and six state transitions, which correspond to techniques that can be deployed by either publishers or ad blockers. We argue that this is a complete model of the system. We propose several new...
When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms to target ads to you, tailor your news recommendations, and sometimes vary prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and soph...
Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicate a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-l...
Monero is a privacy-centric cryptocurrency that allows users to obscure their transaction graph by including chaff coins, called "mixins," along with the actual coins they spend. In this report, we empirically evaluate two weaknesses in Monero's mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable...
First, Arvind Narayanan and Andrew Miller, co-authors of the increasingly popular open-access Princeton Bitcoin textbook, provide an overview of ongoing research in cryptocurrencies. Second, Song Han provides an overview of hardware trends related to another long-studied academic problem that has recently seen an explosion in popularity: deep learn...
Bitcoin provides two incentives for miners: block rewards and transaction fees. The former accounts for the vast majority of miner revenues at the beginning of the system, but it is expected to transition to the latter as the block rewards dwindle. There has been an implicit belief that whether miners are paid by block rewards or transaction fees d...
We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between diff...
Machines learn what people know implicitly
AlphaGo has demonstrated that a machine can learn how to do things that people spend many years of concentrated study learning, and it can rapidly learn how to do them better than any human can. Caliskan et al. now show that machines can learn word associations from written texts and that these association...
While threshold signature schemes have been presented before, there has never been an optimal threshold signature algorithm for DSA. The properties of DSA make it quite challenging to build a threshold version. In this paper, we present a threshold DSA scheme that is efficient and optimal. We also present a compelling application to use our scheme:...
Once released to the public, data cannot be taken back. As time passes, data analytic techniques improve and additional datasets become public that can reveal information about the original data. It follows that released data will get increasingly vulnerable to re-identification—unless methods with provable privacy properties are used for the data...
Expert-curated guides to the best of CS research.
The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. Previous work has examined attribution of authors from both source code and compiled binaries, and found that while source code can be attributed with very high accuracy, the attribution of executable bina...
Bitcoin has emerged as the most successful cryptographic currency in history. Within two years of its quiet launch in 2009, Bitcoin grew to comprise billions of dollars of economic value despite only cursory analysis of the system's design. Since then a growing literature has identified hidden-but-important properties of the system, discovered at...
We study the ability of a passive eavesdropper to leverage "third-party" HTTP tracking cookies for mass surveillance. If two web pages embed the same tracker which tags the browser with a unique cookie, then the adversary can link visits to those pages from the same user (i.e., browser instance) even if the user's IP address varies. Further, many p...
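The linking step described in the abstract above can be illustrated with a minimal sketch. It assumes the eavesdropper has already extracted (tracker domain, cookie value, source IP, embedding page) tuples from observed HTTP traffic. The record format, field names, and sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One third-party request seen on the wire (hypothetical record format)."""
    tracker: str    # domain of the embedded third-party tracker
    cookie_id: str  # unique identifier cookie the tracker set in the browser
    client_ip: str  # source IP at the time of the request
    page: str       # first-party page that embedded the tracker


def link_visits(observations):
    """Group page visits by (tracker, cookie_id), ignoring the client IP."""
    clusters = defaultdict(list)
    for obs in observations:
        clusters[(obs.tracker, obs.cookie_id)].append(obs)
    return dict(clusters)


# Illustrative data only: two visits share a cookie but not an IP address.
observations = [
    Observation("tracker.example", "abc123", "198.51.100.7", "news.example/story"),
    Observation("tracker.example", "abc123", "203.0.113.9", "shop.example/cart"),
    Observation("tracker.example", "zzz999", "198.51.100.7", "blog.example/post"),
]

clusters = link_visits(observations)
for (tracker, cookie_id), visits in clusters.items():
    pages = [v.page for v in visits]
    ips = sorted({v.client_ip for v in visits})
    print(f"{tracker} cookie={cookie_id}: pages={pages} ips={ips}")
```

In this toy input, the two visits carrying cookie `abc123` end up in the same cluster even though their source IPs differ, while the third visit, from a previously seen IP but with a different cookie, stays separate.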
We present the first large-scale studies of three advanced web tracking mechanisms - canvas fingerprinting, evercookies and use of "cookie syncing" in conjunction with evercookies. Canvas fingerprinting, a recently developed form of browser fingerprinting, has not previously been reported in the wild; our results show that over 5% of the top 100,00...
We propose Mixcoin, a protocol to facilitate anonymous payments in Bitcoin and similar cryptocurrencies. We build on the emergent phenomenon of currency mixes, adding an accountability mechanism to expose theft. We demonstrate that incentives of mixes and clients can be aligned to ensure that rational mixes will not steal. Our scheme is efficient a...
Embedding coverage of ethics in software engineering courses would help students draw strength and wisdom from dialogue with other future members of their profession. Without a sense of professional ethics, individuals might justify to themselves conduct that would be much more difficult to justify in front of others. Additionally, professional eth...
Despite privacy-preserving cryptography technologies' potential, they've largely failed to find commercial adoption. Reasons include people's unawareness of privacy-preserving cryptography, developers' lack of expertise, the field's complexity, economic constraints, and trust issues. View part 1 of this article (from the March/April 2013 issue) her...
Perceptual, "context-aware" applications that observe their environment and interact with users via cameras and other sensors are becoming ubiquitous on personal computers, mobile phones, gaming platforms, household robots, and augmented-reality devices. This raises new privacy risks. We describe the design and implementation of DARKLY, a practical...
One way to use cryptography for privacy is to tweak various systems to be privacy-preserving. But the more radical cypherpunk movement sought to wield crypto as a weapon of freedom, autonomy, and privacy that would fundamentally and inexorably reshape social, economic, and political power structures. This installment of On the Horizon primarily exa...
Many commercial websites use recommender systems to help customers locate products and content. Modern recommenders are based on collaborative filtering: they use patterns learned from users' behavior to make recommendations, usually in the form of related-items lists. The scale and complexity of these systems, along with the fact that their output...
This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo sharing website, with user identities scrubbed. By de-anonymizing much of the competition...
Developing effective privacy protection technologies is a critical challenge for security and privacy research as the amount and variety of data collected about individuals increase exponentially.
Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in...
We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to th...
Obfuscation, when used as a technical term, refers to hiding information “in plain sight” inside computer code or digital data. The history of obfuscation in modern computing can be traced to two events that took place in 1976. The first was the publication of Diffie and Hellman’s seminal paper on public-key cryptography [DH76]. This paper is famou...
We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the...
We study the problem of circuit obfuscation, i.e., transforming the circuit in a way that hides everything except its input-output behavior. Barak et al. showed that a universal obfuscator that obfuscates every circuit class cannot exist, leaving open the possibility of special-purpose obfuscators. Known positive results for obfuscation are limited...
largest online movie rental service—publicly released a dataset containing movie ratings of 500,000 Netflix subscribers. The dataset is intended to be anonymous, and all personally identifying information has been removed. We demonstrate that an attacker who knows only a little bit about an individual subscriber can easily identify this subscriber’...
We investigate whether it is possible to encrypt a database and then give it away in such a form that users can still access it, but only in a restricted way. In contrast to conventional privacy mechanisms that aim to prevent any access to individual records, we aim to restrict the set of queries that can be feasibly evaluated on the encrypted data...
Human-memorable passwords are a mainstay of computer security. To decrease vulnerability of passwords to brute-force dictionary attacks, many organizations enforce complicated password-creation rules and require that passwords include numerals and special characters. We demonstrate that as long as passwords remain human-memorable, they are vulnerab...
Citations
... Some research questions may be less concerned with overall variation in digital literacy than with individuals with the lowest levels of digital literacy; the scale was developed with these in mind. Examples include the literature on online misinformation consumption and sharing, and related concepts such as susceptibility to online scams (Mathur et al., 2020). ...
... However, public key reuse makes this question relevant even in UTXO-based graphs, as the outputs of transactions can be treated as belonging to the same account when they share a public key. Though the practice of public key reuse is discouraged [8], it often happens in practice due to ease of use [21]. ...
... Finally, we ended up with 64K privacy policy documents. For website privacy policies, we use the Princeton-Leuven Longitudinal Corpus of Privacy Policies (Amos et al., 2021). 2 The Princeton-Leuven Longitudinal Corpus of Privacy Policies contains 130K website privacy policies spanning over two decades. ...
... A relatively simple tracking technique involves capturing mouse clicks on specific areas of the stimulus (e.g., links clicked on a website) or dwell time (e.g., time spent on a website before visiting another one) [20]. More sophisticated tracking techniques may involve recording mouse trajectories, keyboard inputs, or eye movements (e.g., to generate visual heat maps) [1,29]. ...
... Every day, millions of users encounter dark patterns in information systems (IS) (Adams and Sarah 2022). Dark patterns refer to user interface design elements that benefit organizations by deceiving and manipulating users (Brignull 2010;Narayanan et al. 2020). Specifically, dark patterns are designed to infringe on user autonomy by preventing informed choices (Loewenstein et al. 2014;Sunstein 2015). ...
... Behaviour manipulation by platform operators has recently gained much attention in consumer markets and by researchers on "dark patterns" in user interfaces [41]. Conducting this manipulation requires control over information management. ...
... Dark patterns have been defined as carefully crafted user interface designs that can be used to trick users into taking particular actions [40]. These dark patterns can be defined as design choices that force, steer, or deceive users into making decisions that benefit websites, although fully informed users would not have made the same decisions [41]. ...
... Features such as Internet-based media playing and third-party app executing make modern TVs smarter and yet more vulnerable to security attacks and privacy intrusions. A variety of vulnerabilities have been exploited against smart TVs in recent years [1], [2], [3], [4], [5], [6], [7], [8]. In general, security threats against smart TVs can be classified into two categories: threats from Internet, and threats from programs running on smart TV OSes (e.g., Android TV OS [9]). ...
... Dummy messages: A common strategy for mitigating traffic analysis is to introduce dummy messages. Dummy messages are additional, fake messages introduced by the enforcement mechanism [3,18,19,20]. Informally, dummy messages should only serve to hide which traffic is genuine and should not alter the semantics of a system. This does not always hold in practice, e.g., as dummy messages may introduce overhead in latency or execution time that affects time reads. ...
... However, the discussions and suspensions of doubt about the PoS protocol still exist because of security and mechanism reliability issues [68,69]. Considering the attacks (e.g., nothing at stake attack [70][71][72], long-range attacks [73][74][75], etc.), one type of work on this issue is to improve the security of the PoS-based network [57,76], which provides an idea of the solution of applying Trusted Execution Environments (TEEs) to enforce security. Meanwhile, the security and reliability can also be improved by, for example, using a trusted beacon, authorized endorsers, and common prefix [77]; therefore, the protocol can be organized, and the reliability can be enhanced by flexibly applying various supplementary options. ...