With the Internet having become an indispensable means of communication in modern society, censorship and surveillance in cyberspace are becoming more prevalent. Malicious actors around the world, ranging from nation states to private organizations, increasingly use technology not only to control the free flow of information, but also to eavesdrop on Internet users' online activities. Internet censorship and online surveillance have led to severe violations of human rights, including freedom of expression, the right to information, and the right to privacy.
In this dissertation, we present two related lines of research that seek to tackle the twin problems of Internet censorship and online surveillance through an empirical lens. We show that empirical network measurement, when conducted at scale and in a longitudinal manner, is an essential approach to gaining insights into (1) censors' blocking behaviors and (2) key characteristics of anti-censorship and privacy-enhancing technologies. These insights can then be used not only to aid in the development of effective censorship circumvention tools, but also to help related stakeholders make informed decisions that maximize the privacy benefits of privacy-enhancing technologies.
With a focus on measuring Internet censorship, we first conduct an empirical study of the I2P anonymity network, shedding light on important properties of the network and its censorship resistance. By measuring the state of I2P censorship around the globe, we then expose numerous censorship regimes (e.g., China, Iran, Oman, Qatar, and Kuwait) where I2P is blocked using various techniques. As a result of this work, I2P has adopted DNS over HTTPS, one of the recently introduced domain name encryption protocols, to prevent passive snooping and make the bootstrapping process more resistant to DNS-based network filtering and surveillance.
Of the censors discovered above, we find that China is the most sophisticated, having developed an advanced network filtering system known as the Great Firewall (GFW). Continuing the same line of work, we have developed GFWatch, a large-scale, longitudinal measurement platform capable of testing hundreds of millions of domains daily, enabling continuous monitoring of the DNS filtering behavior of China's GFW. Data collected by GFWatch not only casts new light on technical observations, but also informs the public in a timely manner about changes in the GFW's blocking policy and assists other detection and circumvention efforts.
We then focus on measuring and improving the privacy benefits provided by domain name encryption technologies, such as DNS over TLS (DoT), DNS over HTTPS (DoH), and Encrypted Client Hello (ECH). Although the security benefits of these technologies are clear, their positive impact on user privacy is weakened by the IP address information that remains exposed. We assess the privacy benefits of these new technologies by considering the relationship between hostnames and their hosting IP addresses.
We show that encryption alone is not enough to protect web users' privacy. In particular, when it comes to preventing nosy network observers from tracking users' browsing activities, the IP addresses of the remote servers being contacted are still visible and can be used to infer the visited websites. Our findings help raise awareness of the remaining effort that must be undertaken by related stakeholders (i.e., website owners and hosting providers) to ensure a meaningful privacy benefit from the universal deployment of domain name encryption technologies.
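To make the inference risk concrete, the following minimal Python sketch inverts a hostname-to-IP mapping and reports which websites a network observer could associate with a visible destination IP address. It is an illustration only, not the measurement method used in this dissertation: the function names, the `RESOLUTIONS` data, the `.example` hostnames, and the documentation-range IP addresses are all hypothetical.

```python
from collections import defaultdict

# Hypothetical resolution data: hostname -> set of IP addresses it resolves to.
RESOLUTIONS = {
    "solo.example": {"192.0.2.10"},
    "blog.example": {"198.51.100.7"},
    "shop.example": {"198.51.100.7"},
    "news.example": {"198.51.100.7", "198.51.100.8"},
}


def build_cohosting_index(resolutions):
    """Invert hostname -> IPs into IP -> set of hostnames served from that IP."""
    ip_to_hosts = defaultdict(set)
    for hostname, ips in resolutions.items():
        for ip in ips:
            ip_to_hosts[ip].add(hostname)
    return dict(ip_to_hosts)


def candidate_sites(ip, index):
    """Return the hostnames an observer could associate with a destination IP."""
    return set(index.get(ip, set()))


INDEX = build_cohosting_index(RESOLUTIONS)

for ip in sorted(INDEX):
    hosts = sorted(candidate_sites(ip, INDEX))
    # A candidate set of size 1 means the IP alone reveals the visited site.
    print(ip, len(hosts), hosts)
```

Under these assumed inputs, an address that serves a single hostname identifies the visited website even when the domain name itself is encrypted, whereas an address shared by several hostnames leaves the observer with a larger set of candidates.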
Nevertheless, the benefits provided by DoT/DoH against threats ``under the recursive resolver'' come at the cost of trusting the DoT/DoH operator with the entire web browsing history of users. As a step towards mitigating the privacy concerns stemming from the exposure of all DNS resolutions of a user (effectively the user's entire domain-level browsing history) to an additional third-party entity, we propose K-resolver, a resolution mechanism in which DNS queries are dispersed across multiple (K) DoH servers, allowing each of them to individually learn only a fraction (1/K) of a user's browsing history. Our experimental results show that our approach incurs negligible overhead while improving user privacy.
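The dispersal idea can be sketched in a few lines of Python. This is a hedged illustration rather than the K-resolver implementation: it assumes that each domain is assigned to a resolver by hashing its name, so that repeated lookups of the same domain always reach the same server. The abstract does not specify the assignment rule, and the resolver URLs and function names below are hypothetical placeholders.

```python
import hashlib
from collections import defaultdict

# Hypothetical DoH endpoints (K = 3); these are placeholders, not real services.
RESOLVERS = [
    "https://doh-a.example/dns-query",
    "https://doh-b.example/dns-query",
    "https://doh-c.example/dns-query",
]


def normalize(domain):
    """Lowercase the name and drop surrounding whitespace and a trailing dot."""
    return domain.strip().rstrip(".").lower()


def pick_resolver(domain, resolvers):
    """Map a domain to one of the K resolvers using a stable hash (assumed rule)."""
    if not resolvers:
        raise ValueError("at least one resolver is required")
    digest = hashlib.sha256(normalize(domain).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(resolvers)
    return resolvers[index]


def split_history(domains, resolvers):
    """Return resolver -> set of domains that resolver would observe."""
    shares = defaultdict(set)
    for domain in domains:
        shares[pick_resolver(domain, resolvers)].add(normalize(domain))
    return dict(shares)


if __name__ == "__main__":
    history = ["example.com", "example.org", "example.net", "Example.com."]
    for resolver, seen in split_history(history, RESOLVERS).items():
        print(resolver, sorted(seen))
```

With a stable mapping of this kind, each resolver observes only the domains assigned to it, which is roughly 1/K of the distinct domains on average. The actual split for a small browsing history can be uneven.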
Last, but not least, given that visibility into plaintext domain information is lost with the introduction of domain name encryption protocols, it is important to investigate whether and how the network traffic of these protocols is interfered with by different Internet filtering systems. We created DNEye, a measurement system built on top of a network of distributed vantage points, which we used to study the accessibility of DoT/DoH and ESNI, and to investigate whether these protocols are tampered with by network providers (e.g., for censorship). We find evidence of blocking efforts against domain name encryption technologies in several countries, including China, Russia, and Saudi Arabia. On the bright side, we discover that domain name encryption can help unblock more than 55% of censored domains in China and more than 95% in other countries where DNS-based filtering is heavily employed.