Alan Mislove's research while affiliated with Northeastern University and other places

Publications (126)

Preprint
Full-text available
Targeted advertising remains an important part of the free web browsing experience, where advertisers' targeting and personalization algorithms together find the most relevant audience for millions of ads every day. However, given the wide use of advertising, this also enables using ads as a vehicle for problematic content, such as scams or clickba...
Chapter
Machines powered by artificial intelligence increasingly mediate our social, cultural, economic, and political interactions. This chapter frames and surveys the emerging interdisciplinary field of machine behaviour: the scientific study of behaviour exhibited by intelligent machines. It outlines the key research themes, questions, and landmark rese...
Preprint
Full-text available
Most public blockchain protocols, including the popular Bitcoin and Ethereum blockchains, do not formally specify the order in which miners should select transactions from the pool of pending (or uncommitted) transactions for inclusion in the blockchain. Over the years, informal conventions or "norms" for transaction ordering have, however, emerged...
Chapter
Blockchain protocols’ primary security goal is consensus: one version of the global ledger that everyone in the network agrees on. Their proofs of security depend on assumptions on how well their peer-to-peer (P2P) overlay networks operate. Yet, surprisingly, little is understood about what factors influence the P2P network properties. In this work...
Article
Every second, the thoughts and feelings of millions of people across the world are recorded in the form of 140-character tweets using Twitter. However, despite the enormous potential presented by this remarkable data source, we still do not have an understanding of the Twitter population itself: Who are the Twitter users? How representative of the...
Article
User tracking has become ubiquitous practice on the Web, allowing services to recommend behaviorally targeted content to users. In this article, we design Alibi, a system that utilizes such readily available personalized content, generated by recommendation engines in real time, as a means to tame Sybil attacks. In particular, by using ads and othe...
Chapter
The Transport Layer Security (TLS) Public Key Infrastructure (PKI) is essential to the security and privacy of users on the Internet. Despite its importance, prior work from the mid-2010s has shown that mismanagement of the TLS PKI often led to weakened security guarantees, such as compromised certificates going unrevoked and many internet devices...
Preprint
Full-text available
Today, algorithmic models are shaping important decisions in domains such as credit, employment, or criminal justice. At the same time, these algorithms have been shown to have discriminatory effects. Some organizations have tried to mitigate these effects by removing demographic features from an algorithm's inputs. If an algorithm is not provided...
Preprint
Political campaigns are increasingly turning to digital advertising to reach voters. These platforms empower advertisers to target messages to platform users with great precision, including through inferences about those users' political affiliations. However, prior work has shown that platforms' ad delivery algorithms can selectively deliver ads w...
Article
The enormous financial success of online advertising platforms is partially due to the precise targeting features they offer. Although researchers and journalists have found many ways that advertisers can target---or exclude---particular groups of users seeing their ads, comparatively little attention has been paid to the implications of the platfo...
Conference Paper
The public key infrastructure (PKI) provides the fundamental property of authentication: the means by which users can know with whom they are communicating online. The PKI ensures end-to-end authenticity insofar as it verifies a chain of certificates, but the true final step in end-to-end authentication comes when the user verifies that the website...
Conference Paper
Despite its critical role in Internet connectivity, the Border Gateway Protocol (BGP) remains highly vulnerable to attacks such as prefix hijacking, where an Autonomous System (AS) announces routes for IP space it does not control. To address this issue, the Resource Public Key Infrastructure (RPKI) was developed starting in 2008, with deployment b...
Conference Paper
Net neutrality has been the subject of considerable public debate over the past decade. Despite the potential impact on content providers and users, there is currently a lack of tools or data for stakeholders to independently audit the net neutrality policies of network providers. In this work, we address this issue by conducting a one-year study o...
Article
Google's Quick UDP Internet Connections (QUIC) protocol, which implements TCP-like properties at the application layer atop a UDP transport, is now used by the vast majority of Chrome clients accessing Google properties but has no formal state machine specification, limited analysis, and ad-hoc evaluations based on snapshots of the protocol impleme...
Conference Paper
In this work, we introduce a novel metric for auditing group fairness in ranked lists. Our approach offers two benefits compared to the state of the art. First, we offer a blueprint for modeling of user attention. Rather than assuming a logarithmic loss in importance as a function of the rank, we can account for varying user behaviors through param...
Conference Paper
Data brokers such as Acxiom and Experian are in the business of collecting and selling data on people; the data they sell is commonly used to feed marketing as well as political campaigns. Despite the ongoing privacy debate, there is still very limited visibility into data collection by data brokers. Recently, however, online advertising services s...
Article
The Domain Name System (DNS) is the naming system on the Internet. With the DNS Security Extensions (DNSSEC) operators can protect the authenticity of their domain using public key cryptography. DNSSEC, however, can be difficult to configure and maintain: operators need to replace keys to upgrade their algorithm, react to security breaches or follo...
Preprint
The enormous financial success of online advertising platforms is partially due to the precise targeting features they offer. Although researchers and journalists have found many ways that advertisers can target---or exclude---particular groups of users seeing their ads, comparatively little attention has been paid to the implications of the platfo...
Article
Full-text available
Machines powered by artificial intelligence increasingly mediate our social, cultural, economic and political interactions. Understanding the behaviour of artificial intelligence systems is essential to our ability to control their actions, reap their benefits and minimize their harms. Here we argue that this necessitates a broad scientific researc...
Preprint
Full-text available
In this work we introduce a novel metric for verifying group fairness in ranked lists. Our approach relies on measuring the amount of attention given to members of a protected group and comparing it to that group's representation in the investigated population. It offers two major developments compared to the state of the art. First, rather than as...
Conference Paper
Full-text available
The popularity of online advertising---now with aggregate revenues in the hundreds of billions of dollars each year---is strongly driven by targeting, or the ability of an advertising platform to help an advertiser select exactly which users should see their ad. To enable such targeting, advertising platforms routinely collect detailed data on users, a...
Conference Paper
Online advertising platforms such as those of Facebook and Google collect detailed data about users, which they leverage to allow advertisers to target ads to users based on various pieces of user information. While most advertising platforms have transparency mechanisms in place to reveal this collected information to users, these often present an...
Conference Paper
Ethereum is the second most valuable cryptocurrency today, with a current market cap of over $68B. What sets Ethereum apart from other cryptocurrencies is that it uses the blockchain to not only store a record of transactions, but also smart contracts and a history of calls made to those contracts. Thus, Ethereum represents a new form of distribute...
Conference Paper
TLS, the de facto standard protocol for securing communications over the Internet, relies on a hierarchy of certificates that bind names to public keys. Naturally, ensuring that the communicating parties are using only valid certificates is a necessary first step in order to benefit from the security of TLS. To this end, most certificates and clien...
Article
Transportation network companies (TNCs) provide vehicle-for-hire services. They are distinguished from taxis primarily by the presumption that vehicles are privately owned by drivers. Unlike taxis, which must hold one of approximately 1,800 medallions licensed by the San Francisco Municipal Transportation Agency (SFMTA) to operate in San Francisco,...
Conference Paper
In this work, we propose an automated method to find attacks against TCP congestion control implementations that combines the generality of implementation-agnostic fuzzing with the precision of runtime analysis. It uses a model-guided approach to generate abstract attack strategies by leveraging a state machine model of congestion control to find v...
Conference Paper
Web security has been and remains a highly relevant field of security research, which has seen many additional features standardized at IETF over the past years. This talk covers two papers, which in sum provide a comprehensive survey of quantity and quality of adoption of such new security extensions by HTTPS web servers. The protocols covered ar...
Article
Shaken by severe compromises, the Web’s Public Key Infrastructure has seen the addition of several security mechanisms over recent years. One such mechanism is the Certification Authority Authorization (CAA) DNS record, that gives domain name holders control over which Certification Authorities (CAs) may issue certificates for their domain. First d...
Conference Paper
Ridesharing services such as Uber and Lyft have become an important part of the Vehicle For Hire (VFH) market, which used to be dominated by taxis. Unfortunately, ridesharing services are not required to share data like taxi services, which has made it challenging to compare the competitive dynamics of these services, or assess their impact on citi...
Article
A properly managed public key infrastructure (PKI) is critical to ensure secure communication on the Internet. Surprisingly, some of the most important administrative steps---in particular, reissuing new X.509 certificates and revoking old ones---are manual and remained unstudied, largely because it is difficult to measure these manual processes at...
Article
Full-text available
Recently, online targeted advertising platforms like Facebook have been criticized for allowing advertisers to discriminate against users belonging to sensitive groups, i.e., to exclude users belonging to a certain race or gender from receiving their ads. Such criticisms have led, for instance, Facebook to disallow the use of attributes such as eth...
Conference Paper
As blockchain technologies and cryptocurrencies increase in popularity, their decentralization poses unique challenges in network partitions. In traditional distributed systems, network partitions are generally a result of bugs or connectivity failures; the typical goal of the system designer is to automatically recover from such issues as seamless...
Conference Paper
The Domain Name System (DNS) provides a scalable, flexible name resolution service. Unfortunately, its unauthenticated architecture has become the basis for many security attacks. To address this, DNS Security Extensions (DNSSEC) were introduced in 1997. DNSSEC's deployment requires support from the top-level domain (TLD) registries and registrars,...
Conference Paper
Google's QUIC protocol, which implements TCP-like properties at the application layer atop a UDP transport, is now used by the vast majority of Chrome clients accessing Google properties but has no formal state machine specification, limited analysis, and ad-hoc evaluations based on snapshots of the protocol implementation in a small number of envi...
Conference Paper
Middleboxes implement a variety of network management policies (e.g., prioritizing or blocking traffic) in their networks. While such policies can be beneficial (e.g., blocking malware) they also raise issues of network neutrality and freedom of speech when used for application-specific differentiation and censorship. There is a poor understanding...
Conference Paper
The Domain Name System (DNS) is part of the core of the Internet. Over the past decade, much-needed security features were added to this protocol, with the introduction of the DNS Security Extensions. DNSSEC adds authenticity and integrity to the protocol using digital signatures, and turns the DNS into a public key infrastructure (PKI). At the top...
Article
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn ric...
Article
Full-text available
Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing level of personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that...
Conference Paper
Online freelancing marketplaces have grown quickly in recent years. In theory, these sites offer workers the ability to earn money without the obligations and potential social biases associated with traditional employment frameworks. In this paper, we study whether two prominent online freelance marketplaces - TaskRabbit and Fiverr - are impacted b...
Conference Paper
SSL and TLS are used to secure the most commonly used Internet protocols. As a result, the ecosystem of SSL certificates has been thoroughly studied, leading to a broad understanding of the strengths and weaknesses of the certificates accepted by most web browsers. Prior work has naturally focused almost exclusively on "valid" certificates--those t...
Conference Paper
A variety of network management practices, from bandwidth management to zero-rating, use policies that apply selectively to different categories of Internet traffic (e.g., video, P2P, VoIP). These policies are implemented by middleboxes that must, in real time, assign traffic to a category using a classifier. Despite their important implications fo...
Conference Paper
Detecting violations of application-level end-to-end connectivity on the Internet is of significant interest to researchers and end users; recent studies have revealed cases of HTTP ad injection and HTTPS man-in-the-middle attacks. Unfortunately, detecting such end-to-end violations at scale remains difficult, as it generally requires having the co...
Conference Paper
The semantics of online authentication in the web are rather straightforward: if Alice has a certificate binding Bob's name to a public key, and if a remote entity can prove knowledge of Bob's private key, then (barring key compromise) that remote entity must be Bob. However, in reality, many websites---and the majority of the most popular ones---are...
Conference Paper
The popularity of mobile devices for ubiquitous Internet access has led to exploding demand for relatively scarce cellular bandwidth. As a result, cellular operators increasingly turn to creative ways to manage their customers’ demand on capacity, using traffic shaping and transcoding and zero-rating. With zero-rating, Internet Service Providers (I...
Conference Paper
Cloud computing has evolved to meet user demands, from arbitrary VMs offered by IaaS to the narrow application interfaces of PaaS. Unfortunately, there exists an intermediate point that is not well met by today's offerings: users who wish to run arbitrary, already available binaries (as opposed to rewriting their own application for a PaaS) yet exp...
Conference Paper
The rise of e-commerce has unlocked practical applications for algorithmic pricing (also called dynamic pricing algorithms), where sellers set prices using computer algorithms. Travel websites and large, well known e-retailers have already adopted algorithmic pricing strategies, but the tools and techniques are now available to small-scale sellers...
Conference Paper
Popular social and e-commerce sites increasingly rely on crowd computing to rate and rank content, users, products and businesses. Today, attackers who create fake (Sybil) identities can easily tamper with these computations. Existing defenses that largely focus on detecting individual Sybil identities have a fundamental limitation: Adaptive attack...
Conference Paper
Full-text available
Users today access a multitude of online services---among the most popular of which are online social networks (OSNs)---via both web sites and dedicated mobile applications (apps), using a range of devices (traditional PCs, tablets, and smartphones) that are connected via a variety of networks. The resulting infrastructure makes these services conv...
Conference Paper
Full-text available
Traffic differentiation---giving better (or worse) performance to certain classes of Internet traffic---is a well-known but poorly understood traffic management policy. There is active discussion on whether and how ISPs should be allowed to differentiate Internet traffic, but little data about current practices to inform this discussion. Previous w...
Conference Paper
Recently, Uber has emerged as a leader in the "sharing economy". Uber is a "ride sharing" service that matches willing drivers with customers looking for rides. However, unlike other open marketplaces (e.g., AirBnB), Uber is a black-box: they do not provide data about supply or demand, and prices are set dynamically by an opaque "surge pricing" alg...
Conference Paper
Full-text available
To cope with the immense amount of content on the web, search engines often use complex algorithms to personalize search results for individual users. However, personalization of search results has led to worries about the Filter Bubble Effect, where the personalization algorithm decides that some useful information is irrelevant to the user, and t...
Conference Paper
Knowing the physical location of a mobile device is crucial for a number of context-aware applications. This information is usually obtained using the Global Positioning System (GPS), or by calculating the position based on proximity of WiFi access points with known location (where the position of the access points is stored in a database at a cent...
Conference Paper
Full-text available
Critical to the security of any public key infrastructure (PKI) is the ability to revoke previously issued certificates. While the overall SSL ecosystem is well-studied, the frequency with which certificates are revoked and the circumstances under which clients (e.g., browsers) check whether certificates are revoked are still not well-understood. I...
Article
Full-text available
The past two decades have seen an upsurge of interest in the collective behaviors of complex systems composed of many agents entrained to each other and to external events. In this paper, we extend the concept of entrainment to the dynamics of human collective attention. We conducted a detailed investigation of the unfolding of human entrainment—as...
Article
Today, many e-commerce websites personalize their content, including Netflix (movie recommendations), Amazon (product suggestions), and Yelp (business reviews). In many cases, personalization provides advantages for users: for example, when a user searches for an ambiguous query such as "router," Amazon may be able to suggest the woodworking tool...
Conference Paper
Central to the secure operation of a public key infrastructure (PKI) is the ability to revoke certificates. While much of users' security rests on this process taking place quickly, in practice, revocation typically requires a human to decide to reissue a new certificate and revoke the old one. Thus, having a proper understanding of how often syste...
Article
Full-text available
Advertising is ubiquitous on the Web; numerous ad networks serve billions of ads daily via keyword or search term auctions. Recently, online social networks (OSNs) such as Facebook have created site-specific ad services that differ from traditional ad networks by letting advertisers bid on users rather than keywords. With Facebook's annual ad reven...
Article
Full-text available
Not all of the over one billion users of online social networks (OSNs) are equally valuable to the OSNs. The current business model of monetizing advertisements targeted to users does not appear to be based on any visible grouping of the users. The primary metrics remain CPM (cost per mille, i.e., thousand impressions) and CPC (cost per click) of ad...
Conference Paper
Full-text available
The goal of this research is to detect traffic differentiation in cellular data networks. We define service differentiation as any attempt to change the performance of network traffic traversing an ISP's boundaries. ISPs may implement differentiation policies for a number of reasons, including load balancing, bandwidth management, or business reaso...
Article
Full-text available
The microblogging site Twitter is now one of the most popular Web destinations. Due to the relative ease of data access, there has been significant research based on Twitter data, ranging from measuring the spread of ideas through society to predicting the behavior of real-world phenomena such as the stock market. Unfortunately, relatively little w...
Article
Full-text available
Today, it is the norm for online social network (OSN) users to have accounts on multiple services. For example, a recent study showed that 34% of all Twitter users also use Pinterest. This situation leads to interesting questions such as: Are the activities that users perform on each site disjoint? Alternatively, if users perform the same actions on multipl...
Patent
Full-text available
In one embodiment, program code is added to a social network's web pages or site such that the content a first user accesses is locally stored at the first user's system. When another user, who is a friend of the first user, as defined by the social networking site, browses to that same content, the program code fetches it from the first user, inst...
Conference Paper
Full-text available
Online social network (OSN) users upload millions of pieces of content to share with others every day. While a significant portion of this content is benign (and is typically shared with all friends or all OSN users), there are certain pieces of content that are highly privacy sensitive. Sharing such sensitive content raises significant privacy con...
Conference Paper
With the increasing popularity of Web-based services, users today have access to a broad range of free sites, including social networking, microblogging, and content sharing sites. In order to offer a service for free, service providers typically monetize user content, selling results to third parties such as advertisers. As a result, users have li...
Conference Paper
Full-text available
Online content ratings services allow users to find and share content ranging from news articles (Digg) to videos (YouTube) to businesses (Yelp). Generally, these sites allow users to create accounts, declare friendships, upload and rate content, and locate new content by leveraging the aggregated ratings of others. These services are becoming incr...