Conference Paper

Why Is the Internet so Slow?!


Abstract

In principle, a network can transfer data at nearly the speed of light. Today’s Internet, however, is much slower: our measurements show that latencies are typically more than one, and often more than two orders of magnitude larger than the lower bound implied by the speed of light. Closing this gap would not only add value to today’s Internet applications, but might also open the door to exciting new applications. Thus, we propose a grand challenge for the networking research community: building a speed-of-light Internet. To help inform research towards this goal, we investigate, through large-scale measurements, the causes of latency inflation in the Internet across the network stack. Our analysis reveals an under-explored problem: the Internet’s infrastructural inefficiencies. We find that while protocol overheads, which have dominated the community’s attention, are indeed important, reducing latency inflation at the lowest layers will be critical for building a speed-of-light Internet. In fact, eliminating this infrastructural latency inflation, without any other changes in the protocol stack, could speed up small object fetches by more than a factor of three.
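The c-latency bound discussed in the abstract is simply the great-circle (geodesic) distance between two endpoints divided by the speed of light in vacuum, and latency inflation is a measured RTT over twice that bound. A minimal sketch (the coordinates and the 70 ms RTT are illustrative values, not measurements from the paper):

```python
import math

C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond

def c_latency_ms(lat1, lon1, lat2, lon2):
    """One-way speed-of-light lower bound (ms) along the great circle."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    dist_km = 2 * r * math.asin(math.sqrt(a))  # haversine great-circle distance
    return dist_km / C_KM_PER_MS

# Illustrative: New York to London, and the inflation of a typical RTT over it.
one_way = c_latency_ms(40.71, -74.01, 51.51, -0.13)  # roughly 18-19 ms
inflation = 70.0 / (2 * one_way)                     # 70 ms measured RTT (assumed)
```

With these numbers the inflation factor comes out below 2×; the paper's point is that end-to-end fetches, with protocol round trips included, are inflated far more than the raw RTT.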


... While there are numerous measurement studies analyzing various aspects of real-world network performance on the Internet, only a fraction of them focus on network latency. Most measurement studies of Internet latency rely on active measurements, most commonly periodic pings from distributed probes or small file downloads [4,5,7,11,22,23,25,35,41,44]. ...
... In [5], Bozkurt et al. perform active measurements with curl to compare Internet latency with the theoretical minimum latency based on the speed of light, and find that due to inefficiencies in the Internet infrastructure the latency is one to two orders of magnitude higher than the limit imposed by the speed of light. Madanapalli et al. instead ping servers for several popular multiplayer games in [23], finding large differences in latency to the game servers from different ISPs based on their peering. ...
... Latency measurements performed during various forms of speedtests will capture latency under load, but the traffic load is artificial and to specific test servers, so it may not correspond to the latency users will experience for other Internet services. To better capture the latency from actual traffic, some studies have, like us, used or complemented their data with passive latency measurements [1,5,13,17,24,40,44]. ...
Preprint
Full-text available
While Internet Service Providers (ISPs) have traditionally focused on marketing network throughput, it is becoming increasingly recognized that network latency also plays a significant role for the quality of experience. However, many ISPs lack the means to continuously monitor network latency for their service. In this work, we present a method to continuously monitor and aggregate network latency per subnet directly in the Linux kernel by leveraging eBPF. We deploy this solution on a middlebox in an ISP network, and collect an extensive dataset of latency measurements for both the internal and external parts of the network. In our analysis, we find a wide latency tail in the last mile access, which varies over time of day, increasing during the evening. We also find large differences in latency across different external networks, where traffic to Asian and African regions show high and variable RTTs, while the RTTs for the most common autonomous systems have a low and narrow RTT distribution, indicating that the traffic is largely served by CDNs at nearby Internet exchange points.
... The best achievable latency between two points along the surface of the Earth is determined by their geodesic distance divided by the speed of light, c. Latencies over the Internet, however, are usually much larger than this minimal "c-latency": recent measurement work found that fetching even small amounts of data over the Internet typically takes 37× longer than the c-latency, and often, more than 100× longer [12]. This delay comes from the many round-trips between the communicating endpoints, due to inefficiencies in the transport and application layer protocols, and from each round-trip itself taking 3-4× longer than the c-latency [12]. ...
... Latencies over the Internet, however, are usually much larger than this minimal "c-latency": recent measurement work found that fetching even small amounts of data over the Internet typically takes 37× longer than the c-latency, and often, more than 100× longer [12]. This delay comes from the many round-trips between the communicating endpoints, due to inefficiencies in the transport and application layer protocols, and from each round-trip itself taking 3-4× longer than the c-latency [12]. Given the approximately multiplicative role of network round-trip times (RTTs) (when bandwidth is not the main bottleneck), eliminating inflation in Internet RTTs can potentially translate to up to 3-4× speedup, even without any protocol changes. ...
... Upcoming application areas like virtual and augmented reality can only make this case stronger. We expect cISP's most valuable impact to be in breaking new ground on user interactivity over the Internet, as explored in some depth in prior work [12]. ...
Preprint
Full-text available
Low latency is a requirement for a variety of interactive network applications. The Internet, however, is not optimized for latency. We thus explore the design of cost-effective wide-area networks that move data over paths very close to great-circle paths, at speeds very close to the speed of light in vacuum. Our cISP design augments the Internet's fiber with free-space wireless connectivity. cISP addresses the fundamental challenge of simultaneously providing low latency and scalable bandwidth, while accounting for numerous practical factors ranging from transmission tower availability to packet queuing. We show that instantiations of cISP across the contiguous United States and Europe would achieve mean latencies within 5% of that achievable using great-circle paths at the speed of light, over medium and long distances. Further, we estimate that the economic value from such networks would substantially exceed their expense.
... The latency of DNS protocol directly impacts the performance of networking applications [9]. Therefore, many researchers measured the performance consequences of DoH deployment. ...
... There are at least eight DoH client implementations and at least six server implementations known and listed at dnscrypt.info. The support of DoH by open resolvers was studied in 2019 by Deccio and Davis [18]. Their results show that DoH adoption was very poor. ...
Article
Full-text available
The Internet Engineering Task Force adopted the DNS over HTTPS protocol in 2018 to remediate privacy issues regarding the plain text transmission of the DNS protocol. According to our observations and the analysis described in this paper, protecting DNS queries using HTTPS entails security threats. This paper surveys DoH related research works and analyzes malicious and unwanted activities that leverage DNS over HTTPS and can be currently observed in the wild. Additionally, we describe three real-world abuse scenarios observed in the web environment that reveal how service providers intentionally use DNS over HTTPS to violate policies. Last but not least, we identified several research challenges that we consider important for future security research.
... The figure illustrates that multiple DNS queries per page are the norm rather than the exception: about 50% of the sites require at least 20 DNS queries. DNS impacts networked application performance [6] and can reveal information about the destination of a connection [4]. Addressing increasing concerns about security, DNS-over-TLS (DoT) [11] and more recently DNS-over-HTTPS (DoH) [10] have been proposed within the IETF. ...
... It is an active research area, with works aiming at better understanding these redirection strategies [5,8,19]. Other works study DNS resolver behavior in the wild with respect to latency and traffic redirection [1], look at the impact of DNS on overall application delays in the Internet [6,25] or look at DNS infrastructure provisioning at the client side [23]. While all these works also target DNS, they have a stronger focus on the actual applications of DNS than the protocol itself. ...
Conference Paper
Full-text available
DNS is a vital component for almost every networked application. Originally it was designed as an unencrypted protocol, making user security a concern. DNS-over-HTTPS (DoH) is the latest proposal to make name resolution more secure. In this paper we study the current DNS-over-HTTPS ecosystem, especially the cost of the additional security. We start by surveying the current DoH landscape by assessing standard compliance and supported features of public DoH servers. We then compare different transports for secure DNS, to highlight the improvements DoH makes over its predecessor, DNS-over-TLS (DoT). These improvements explain in part the significantly larger take-up of DoH in comparison to DoT. Finally, we quantify the overhead incurred by the additional layers of the DoH transport and their impact on web page load times. We find that these overheads only have limited impact on page load times, suggesting that it is possible to obtain the improved security of DoH with only marginal performance impact.
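For context on what the DoH transport adds: an RFC 8484 GET request carries an ordinary DNS wire-format message, base64url-encoded with padding stripped, in a dns= query parameter. A minimal sketch of that encoding with no network I/O (the packet layout follows RFC 1035; the queried name is illustrative):

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Minimal RFC 1035 query: 12-byte header, QNAME labels, QTYPE, QCLASS=IN."""
    # ID 0 (RFC 8484 recommends it for HTTP cache friendliness), RD flag set, 1 question.
    header = struct.pack(">HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)

def doh_get_param(name: str) -> str:
    """base64url-encode the wire message and strip '=' padding, per RFC 8484."""
    return base64.urlsafe_b64encode(build_dns_query(name)).rstrip(b"=").decode()

# The value that would follow '?dns=' in a DoH GET URL.
param = doh_get_param("example.com")
```

The overheads the paper measures sit on top of this: TCP and TLS handshakes, HTTP framing, and connection reuse behavior, none of which change the DNS payload itself.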
... As network bandwidths have increased, latency has emerged as being the limiting factor for many networked systems, ranging from the extremes of high frequency trading, to the more mundane effects of latency on VoIP, online gaming, and web performance [2]. Fundamentally, once traffic engineering has mitigated congestion [7,9] and buffer bloat has been addressed, for wide-area traffic the remaining problem is that the speed of light in glass simply isn't fast enough. ...
... In this way, each sending groundstation can source-route traffic that will always find links up by the time the packet arrives at the relevant satellite. How, then, does the latency change as the network evolves? Figure 7 shows how the RTT from New York to London evolves over three minutes. Discontinuities are due to route changes within the satellite network, or a change of the satellite over the source or destination city. ...
Conference Paper
SpaceX has filed plans with the US Federal Communications Commission (FCC) to build a constellation of 4,425 low Earth orbit communication satellites. It will use phased array antennas for up- and downlinks and laser communication between satellites to provide global low-latency high-bandwidth coverage. To understand the latency properties of such a network, we built a simulator based on public details from the FCC filings. We evaluate how to use the laser links to provide a network, and look at the problem of routing on this network. We provide a preliminary evaluation of how well such a network can provide low-latency communications, and examine its multipath properties. We conclude that a network built in this manner can provide lower-latency communications than any possible terrestrial optical fiber network for communications over distances greater than about 3,000 km.
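The fiber-versus-satellite crossover described above can be illustrated with a back-of-envelope model: fiber propagation at roughly 2c/3 with some path stretch, versus laser links at c in vacuum plus up/down legs and their own stretch. The stretch factors and altitude below are assumptions chosen for illustration, not the simulator's parameters, so the exact crossover distance they produce differs from the paper's ~3,000 km figure:

```python
C_KM_S = 299_792.458  # speed of light in vacuum, km/s

def fiber_rtt_ms(geodesic_km, stretch=1.5):
    """RTT over fiber: propagation at ~2c/3; `stretch` models circuitous routes (assumed)."""
    return 2 * geodesic_km * stretch / (C_KM_S * 2 / 3) * 1000

def leo_rtt_ms(geodesic_km, altitude_km=550, stretch=1.3):
    """RTT via LEO laser links at c, plus the up and down legs (illustrative model)."""
    path_km = geodesic_km * stretch + 2 * altitude_km
    return 2 * path_km / C_KM_S * 1000

# With these assumed stretch factors the satellite path wins at long range
# (the up/down legs are amortized) and loses at short range.
long_haul = (leo_rtt_ms(5570), fiber_rtt_ms(5570))   # ~New York-London distance
short_haul = (leo_rtt_ms(500), fiber_rtt_ms(500))
```

The qualitative shape is the paper's argument: the fixed cost of reaching orbit is overcome once the in-vacuum speed advantage accumulates over enough distance.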
... Category and country-specific lists are also being used: eight studies use country-specific lists from Alexa, usually choosing only one country; one study selected 138 countries [26]. Category-based lists are rarer still: two studies made use of category subsets [17,71]. ...
... Nine more papers study aspects of privacy & censorship, such as the Tor overlay network [61], or user tracking [35]. Network or application performance is also a popular area: ten papers in our survey focus on this, e.g., HTTP/2 server push [72], mobile web performance [71], and Internet latency [26]. ...
Preprint
Full-text available
A broad range of research areas including Internet measurement, privacy, and network security rely on lists of target domains to be analysed; researchers make use of target lists for reasons of necessity or efficiency. The popular Alexa list of one million domains is a widely used example. Despite their prevalence in research papers, the soundness of top lists has seldom been questioned by the community: little is known about the lists' creation, representativity, potential biases, stability, or overlap between lists. In this study we survey the extent, nature, and evolution of top lists used by research communities. We assess the structure and stability of these lists, and show that rank manipulation is possible for some lists. We also reproduce the results of several scientific studies to assess the impact of using a top list at all, which list specifically, and the date of list creation. We find that (i) top lists generally overestimate results compared to the general population by a significant margin, often even by an order of magnitude, and (ii) some top lists have surprising change characteristics, causing high day-to-day fluctuation and leading to result instability. We conclude our paper with specific recommendations on the use of top lists, and how to interpret results based on top lists with caution.
... The quality of experience (QoE) the end user perceives when connecting to CSPs may be determined by the underlying connectivity agreements between MNOs and CSPs [7,8]. A recent crowd-sourcing measurement campaign [9] suggests that certain mobile domains perform poorly on many MNOs. ...
... CSPs' performance in mobile networks may be affected by multiple factors such as radio link variability, the presence of in-path middleboxes [15], traffic shaping policies [16,17], the behavior of the DNS resolver [18,19], the peering relationships between cloud providers and MNOs [7,8], and inflated network paths [20]. While previous research studies assumed that users are always paired with geographically close content replicas thanks to DNS-based geolocation techniques [21] and IP anycast, [22] showed that this assumption might not always be true due to inaccurate geolocation of mobile users resulting in suboptimal server assignment. ...
Conference Paper
Full-text available
Mobile applications outsource their cloud infrastructure deployment and content delivery to cloud computing services and content delivery networks. Studying how these services, which we collectively denote Cloud Service Providers (CSPs), perform over Mobile Network Operators (MNOs) is crucial to understanding some of the performance limitations of today's mobile apps. To that end, we perform the first empirical study of the complex dynamics between applications, MNOs and CSPs. First, we use real mobile app traffic traces that we gathered through a global crowdsourcing campaign to identify the most prevalent CSPs supporting today's mobile Internet. Then, we investigate how well these services interconnect with major European MNOs at a topological level, and measure their performance over European MNO networks through a month-long measurement campaign on the MONROE mobile broadband testbed. We discover that the top 6 most prevalent CSPs are used by 85% of apps, and observe significant differences in their performance across different MNOs due to the nature of their services, peering relationships with MNOs, and deployment strategies. We also find that CSP performance in MNOs is affected by inflated path length, roaming, and presence of middleboxes, but not influenced by the choice of DNS resolver.
... Not all available relays are useful for a certain pair of endpoints. Some of them, even if used under ideal conditions within a "speed-of-light" Internet [14], still yield larger latency than the observed direct path. Thus, to exclude such relays, we follow a simple approach based on the geolocation information of the involved nodes. ...
... COR and RAR_other yield improvements >100ms (which are critical for e.g., application service providers [37]) in 6% of the improved cases (5% of total). These gains stem solely from the discovery of fast TIV-enabled paths, and do not consider other sources of latency that cut through the network stack [14]. Note that RAR_eye and PLR have very similar (low) performance, while RAR_eye and RAR_other differ significantly; the latter supports our intuition of differentiating between the two RAR types. ...
Conference Paper
Full-text available
Network overlays, running on top of the existing Internet substrate, are of perennial value to Internet end-users in the context of, e.g., real-time applications. Such overlays can employ traffic relays to yield path latencies lower than the direct paths, a phenomenon known as Triangle Inequality Violation (TIV). Past studies identify the opportunities of reducing latency using TIVs. However, they do not investigate the gains of strategically selecting relays in Colocation Facilities (Colos). In this work, we answer the following questions: (i) how Colo-hosted relays compare with other relays as well as with the direct Internet, in terms of latency (RTT) reductions; (ii) what are the best locations for placing the relays to yield these reductions. To this end, we conduct a large-scale one-month measurement of inter-domain paths between RIPE Atlas (RA) nodes as endpoints, located at eyeball networks. We employ as relays Planetlab nodes, other RA nodes, and machines in Colos. We examine the RTTs of the overlay paths obtained via the selected relays, as well as the direct paths. We find that Colo-based relays perform the best and can achieve latency reductions against direct paths, ranging from a few to 100s of milliseconds, in 76% of the total cases; 75% (58% of total cases) of these reductions require only 10 relays in 6 large Colos.
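The geolocation-based relay pruning mentioned in the snippet above can be sketched as: discard a relay if even speed-of-light propagation along the detour through it could not beat the observed direct RTT. The cities, coordinates, and example RTTs below are illustrative:

```python
import math

C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond

def geodesic_km(a, b):
    """Great-circle distance between two (lat, lon) points via the haversine formula."""
    (la1, lo1), (la2, lo2) = a, b
    p1, p2 = math.radians(la1), math.radians(la2)
    dp, dl = math.radians(la2 - la1), math.radians(lo2 - lo1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def relay_can_help(src, dst, relay, direct_rtt_ms):
    """Keep a relay only if the detour's c-latency RTT could beat the direct RTT."""
    detour_km = geodesic_km(src, relay) + geodesic_km(relay, dst)
    best_possible_rtt_ms = 2 * detour_km / C_KM_PER_MS
    return best_possible_rtt_ms < direct_rtt_ms

# Illustrative: Frankfurt as a candidate relay between Madrid and Warsaw.
madrid, warsaw, frankfurt = (40.42, -3.70), (52.23, 21.01), (50.11, 8.68)
useful = relay_can_help(madrid, warsaw, frankfurt, direct_rtt_ms=60.0)
```

This is a necessary condition only: a relay that survives the filter still has to be measured, since real paths through it will be far slower than the c-latency bound.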
... If so, the abandoned overdue responses will degrade the user experience and waste the computing and communication resources. According to the statistics in [3], a 100 ms latency increase generates a 1% income loss at Amazon, while a 400 ms delay increase in search responses can reduce search volume by 0.7% for Google. ...
Article
Full-text available
Many emerging Internet of Things (IoT) applications deployed on cloud platforms have strict latency requirements or deadline constraints, and thus meeting the deadlines is crucial to ensure the quality of service for users and the revenue for service providers in these delay-stringent IoT applications. Efficient flow scheduling in data center networks (DCNs) plays a major role in reducing the execution time of jobs and has garnered significant attention in recent years. However, only few studies have attempted to combine job-level flow scheduling and routing to guarantee meeting the deadlines of multi-stage jobs. In this paper, an efficient heuristic joint flow scheduling and routing (JFSR) scheme is proposed. First, targeting maximizing the number of jobs for which the deadlines have been met, we formulate the joint flow scheduling and routing optimization problem for multiple multi-stage jobs. Second, due to its mathematical intractability, this problem is decomposed into two sub-problems: inter-coflow scheduling and intra-coflow scheduling. In the first sub-problem, coflows from different jobs are scheduled according to their relative remaining times; in the second sub-problem, an iterative coflow scheduling and routing (ICSR) algorithm is designed to alternately optimize the routing path and bandwidth allocation for each scheduled coflow. Finally, simulation results demonstrate that the proposed JFSR scheme can significantly increase the number of jobs for which the deadlines have been met in DCNs.
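The inter-coflow step described above orders coflows from different jobs by their relative remaining times. A hedged sketch, modeling that priority as deadline slack (time to deadline minus estimated remaining transfer time); this is an assumed formulation for illustration, not necessarily the paper's exact metric, and it ignores routing and bandwidth sharing:

```python
from dataclasses import dataclass

@dataclass
class Coflow:
    name: str
    remaining_bytes: float  # data still to be transferred
    bandwidth: float        # attainable rate, bytes/s (assumed fixed)
    deadline: float         # absolute deadline, seconds

def schedule_order(coflows, now=0.0):
    """Order coflows by slack, tightest first.

    slack = (deadline - now) - remaining_bytes / bandwidth; a coflow with small
    slack has little room left before missing its deadline and is served first.
    """
    return sorted(coflows, key=lambda c: (c.deadline - now) - c.remaining_bytes / c.bandwidth)

jobs = [
    Coflow("job-a", remaining_bytes=8e9, bandwidth=1e9, deadline=10.0),  # slack 2 s
    Coflow("job-b", remaining_bytes=1e9, bandwidth=1e9, deadline=2.0),   # slack 1 s
    Coflow("job-c", remaining_bytes=2e9, bandwidth=1e9, deadline=9.0),   # slack 7 s
]
order = [c.name for c in schedule_order(jobs)]
```

The paper's second sub-problem (intra-coflow routing and bandwidth allocation) then decides how each coflow in this order actually uses the network.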
... Nearly all Internet communication starts with a DNS lookup, and complex websites which require content from multiple third parties might perform hundreds of DNS requests before loading a single page [Butkiewicz et al. 2011]. Thus, DNS performance is of concern as it directly impacts performance in most Internet-based communications [Bozkurt et al. 2017]. ...
Conference Paper
Full-text available
The performance of Domain Name System (DNS) resolvers is crucial, as the majority of communication on the Internet starts with a DNS lookup to resolve a domain to an IP address and reach the desired content. Accordingly, academia has devoted considerable effort to measuring and analyzing the performance of DNS resolvers using different tools. However, such tools might produce different results due to their implementation and thus affect the measurements. Hence, this paper provides an analysis and comparison of three different DNS lookup tools employed in the literature and discusses the impact of tool selection.
... [2] discusses DNS request resolution and how it relates to optimal response times for web browsing. Subsequently, [3] conducted an investigation to determine the causes of Internet latency, with DNS resolution often being a factor. Of the 1.9 million connections analyzed, DNS resolution and the TLS handshake were among the most relevant factors, averaging 6.3 times and 10.2 times the optical transport latency, respectively. ...
Preprint
Full-text available
Mobile Internet is an integral part of daily life and of global development. The latency of the mobile Internet directly affects the income of enterprises that provide Internet services, as well as our living standards. According to research, DNS is one of the two most important factors affecting Internet latency. DNS relies on extensive caching for good performance, and each DNS zone provides caching to improve response speed, but most of this caching is recursive. On the authoritative side, apart from the root cache, the caches of other TLDs have received less attention. This paper analyzes the impact of domestically deployed .com and .net caches on resolution time, aiming to determine the effect of TLD caches on DNS resolution speed. To this end, we performed an online deployment at China Mobile Communications Group Henan Co., Ltd. Extensive experimental results show that TLD caches can greatly improve DNS resolution speed and reduce Internet delay.
... The onboard core networks will benefit from the low-latency and wide-coverage links of LEO satellites. Terrestrial fiber paths are generally circuitous, and light in fiber travels at roughly 2c/3 [22], while most LEO satellites orbit at 500 km to 1,000 km above the Earth's surface. ...
Preprint
Recent developments in the aerospace industry have led to a dramatic reduction in the manufacturing and launch costs of low Earth orbit satellites. The new trend enables the paradigm shift of satellite-terrestrial integrated networks with global coverage. In particular, the integration of 5G communication systems and satellites has the potential to restructure next-generation mobile networks. By leveraging the network function virtualization and network slicing, the orbital 5G core networks will facilitate the coordination and management of network functions in satellite-terrestrial integrated networks. We are the first to deploy a lightweight 5G core network on a real-world satellite to investigate its feasibility. We conducted experiments to validate the onboard 5G core network functions. The validated procedures include registration and session setup procedures. The results show that the 5G core network can function normally and generate correct signaling.
... At present, the intermediate data's transmission time accounts for 33-50% of the time required to complete cloud computing tasks [3]. As the time delay is a critical performance indicator for cloud computing services, the blocking of DCNs will directly impact server processing efficiency and, consequently, operators' profits [4]. ...
Article
Full-text available
With the explosive growth of cloud computing applications, east-west traffic among servers has come to occupy the dominant proportion of traffic in data center networks (DCNs). Cloud computing tasks need to be executed in a distributed manner on multiple servers, which exchange large amounts of intermediate data between the adjacent stages of each multi-stage task. Therefore, congestion in DCNs can reduce the processing performance of multi-stage tasks. To address this, the relationship between blocking performance and traffic load can be adopted as a theoretical basis for network planning and traffic engineering. In this paper, the traffic load correlation between edge links and aggregation links is considered, and an iterative blocking performance analysis method is proposed for two-layer tree-type DCNs. The simulation results show the good accuracy of the proposed method with respect to the theoretical results, especially in the blocking-rate range below 4% and with an over-subscription ratio of 1.5.
... For example, 40% of YouTube viewers will quit watching a video if there is rebuffering caused by delay [29]. Google's search volume will decrease by 0.74% if the response time increases by 400 ms [30]. Network delay is closely related to the locations of VNFs and the path scheduling schemes required by the service chain. ...
Article
Full-text available
With the rapid development of network function virtualization, delay-sensitive applications including auto-driving, online gaming, and multimedia conferencing can be served by virtual network function (VNF) chains with low operation expense/capital expense and high flexibility. However, as the service requests are highly dynamic and different services require distinct bandwidth occupation amount and time, how to schedule the paths of flows and place VNFs efficiently to guarantee the performances of network applications and maximize the utilization of the underlying network is a challenging problem. In this paper, we present a joint optimization approach of flow path scheduling and VNF placement, named JOSP, which explores the best utilization of bandwidth from two different aspects to reduce the network delay. We first present a delay scaling strategy that adds the penalty to the link bandwidth occupation that may cause congestion in accordance with the network placement locations. Then we consider the bandwidth occupation time and present a long-short flow differentiating strategy for the data flows with different duration. Furthermore, we present a reinforcement learning framework and use both the flow path delay and the network function-related delay to calculate the reward of placing VNFs adaptively. Performance evaluation results show that the JOSP can reduce the network delay by 40% on average compared with the existing methods.
... inflation in min. ping in [9,11]). We assume that any city pair is connected by an optical fiber over the shortest distance between the pair, i.e., the geodesic [50]. ...
Preprint
We study efficiency in a proof-of-work blockchain with non-zero latencies, focusing in particular on the (inequality in) individual miners' efficiencies. Prior work attributed differences in miners' efficiencies mostly to attacks, but we pursue a different question: Can inequality in miners' efficiencies be explained by delays, even when all miners are honest? Traditionally, such efficiency-related questions were tackled only at the level of the overall system, and in a peer-to-peer (P2P) setting where miners directly connect to one another. Despite it being common today for miners to pool compute capacities in a mining pool managed by a centralized coordinator, efficiency in such a coordinated setting has barely been studied. In this paper, we propose a simple model of a proof-of-work blockchain with latencies for both the P2P and the coordinated settings. We derive a closed-form expression for the efficiency in the coordinated setting with an arbitrary number of miners and arbitrary latencies, both for the overall system and for each individual miner. We leverage this result to show that inequalities arise from variability in the delays, but that if all miners are equidistant from the coordinator, they have equal efficiency irrespective of their compute capacities. We then prove that, under a natural consistency condition, the overall system efficiency in the P2P setting is higher than that in the coordinated setting. Finally, we perform a simulation-based study to demonstrate that even in the P2P setting delays between miners introduce inequalities, and that there is a more complex interplay between delays and compute capacities.
... This approach is not novel; it has been followed in other studies to characterize latency variations on a large scale. In [40], for each host pair the difference between the maximum and minimum RTT observed in a time bin was calculated. The evolution of the obtained difference values was then used to investigate transient congestion. ...
Article
Full-text available
The COVID-19 pandemic dramatically changed the way of living of billions of people in a very short time frame. In this paper, we evaluate the impact on the Internet latency caused by the increased amount of human activities that are carried out on-line. The study focuses on Italy, which experienced significant restrictions imposed by local authorities, but results about Spain, France, Germany, Sweden, and the whole of Europe are also included. The analysis of a large set of measurements shows that the impact on the network can be significant, especially in terms of increased variability of latency. In Italy we observed that the standard deviation of the average additional delay – the additional time with respect to the minimum delay of the paths in the region – during lockdown is ∼3−4 times as much as the value before the pandemic. Similarly, in Italy, packet loss is ∼2−3 times as much as before the pandemic. The impact is not negligible also for the other countries and for the whole of Europe, but with different levels and distinct patterns.
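The max-minus-min-per-bin technique referenced in the surrounding snippets can be sketched as: bucket the RTT samples of one host pair into time bins and report the spread within each bin, so that a spike in the spread signals transient queueing rather than a baseline path change. The bin size and trace values are illustrative:

```python
from collections import defaultdict

def rtt_variability(samples, bin_seconds=300):
    """samples: iterable of (timestamp_s, rtt_ms) for a single host pair.

    Returns {bin_start: max - min RTT within that bin}, the per-bin spread
    used as a proxy for transient congestion.
    """
    bins = defaultdict(list)
    for ts, rtt in samples:
        bins[int(ts // bin_seconds) * bin_seconds].append(rtt)
    return {b: max(rtts) - min(rtts) for b, rtts in sorted(bins.items())}

# Illustrative trace: a quiet 5-minute bin followed by one with a queueing spike.
trace = [(10, 20.0), (100, 21.0), (310, 20.5), (400, 55.0)]
spread = rtt_variability(trace)
```

Using the spread rather than the raw RTT makes the metric insensitive to the path's fixed propagation delay, which is what lets it isolate congestion-induced variability.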
... The fact that latency impacts performance is well known for wide area networks (WAN) [29,43,41,32,11], for example as it is implicitly a part of rate computations for TCP, but also for host applications. The latency studied in these works was in the order of milliseconds to hundreds of milliseconds. ...
Preprint
Full-text available
Businesses and individuals run increasing numbers of applications in the cloud. The performance of an application running in the cloud depends on the data center conditions and upon the resources committed to an application. Small network delays may lead to a significant performance degradation, which affects both the user's cost and the service provider's resource usage, power consumption and data center efficiency. In this work, we quantify the effect of network latency on several typical cloud workloads, varying in complexity and use cases. Our results show that different applications are affected by network latency to differing amounts. These insights into the effect of network latency on different applications have ramifications for workload placement and physical host sharing when trying to reach performance targets.
... This is very inconvenient and causes great losses in revenue. For example, a 100 ms increase in network delay drops Amazon's sales revenue by 1%, and a 500 ms increase decreases Microsoft's Bing search revenue by 1.2% [30]. On the other hand, when long and short flows are transmitted on the same path, the long flows delay the completion time of the short flows, because the huge volume of long-flow traffic preempts the resources of the short flows. ...
Article
Full-text available
Multipath TCP (MPTCP) benefits signal transmission during array signal processing, since it can provide higher aggregated bandwidth and reliable link connections through backup paths. However, small files (short flows) perform poorly with MPTCP, especially when they compete with larger files (long flows). To alleviate this issue, this paper proposes MPTCP-TOASF, a MPTCP Transmission Optimization Algorithm for Short Flows. MPTCP-TOASF designs an optimal path group to transmit short flows based on round-trip time (RTT) and switches to traditional MPTCP for the transmission of long flows. Moreover, to address the issue that long flows occupy most of the buffer space, MPTCP-TOASF applies the delay-sensitive Veno congestion control over the paths in the optimal path group, so as to avoid resource preemption by long flows and reduce the transport delay of short flows. Finally, we conduct extensive experiments based on NS-3 to validate the performance of the proposed algorithm in various scenarios, such as only short flows transmitting, long and short flows coexisting, and background traffic existing. Moreover, we measure the completion time of short flows and the goodput of long flows with different numbers of subflows, different numbers of concurrent short flows, different threshold values, etc. The experimental results show that MPTCP-TOASF can effectively reduce short-flow completion time while maintaining long-flow throughput when long and short flows occur concurrently, thereby improving the mean goodput of the network and achieving a great performance enhancement.
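The path-group idea above (transmit short flows only over the lowest-RTT subset of subflows, while long flows use all paths as in plain MPTCP) might be sketched roughly as follows; the selection rule, the `slack` threshold, and the data shapes are our assumptions, not the paper's exact algorithm:

```python
def select_path_group(path_rtts_ms, slack=1.5):
    """Pick the subset of paths whose RTT is within `slack` times the best
    RTT; short flows are restricted to this low-latency group."""
    best = min(path_rtts_ms.values())
    return {p for p, rtt in path_rtts_ms.items() if rtt <= slack * best}

def paths_for_flow(path_rtts_ms, is_short_flow):
    """Short flows use the optimal path group; long flows use every path."""
    return select_path_group(path_rtts_ms) if is_short_flow else set(path_rtts_ms)
```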
... Apart from the distance, another crucial factor for the network performance is traffic congestion. Due to the traffic congestion, the network forwarding devices such as switches and routers cannot cope with heavy traffic via the Internet, increasing the overall network delay [ 40 ]. Additionally, link failures and re-transmission process also result in higher network delay communication. ...
Article
Full-text available
Networked Music Performance (NMP) systems involve musicians located in different places who perform music while staying synchronized via the Internet. The maximum end‐to‐end delay in NMP applications is called Ensemble Performance Threshold (EPT) and should be less than 25 milliseconds. Due to this constraint, NMPs require ultra–low‐delay solutions for audio coding, network transmission, relaying, and decoding, each one a challenging task on its own. There are two directions for study in the related work referring to the NMP systems. From the audio perspective, researchers experiment on low‐delay encoders and transmission patterns, aiming to reduce the processing delay of the audio transmission, but they ignore the network performance. On the other hand, network‐oriented researchers try to reduce the network delay, which contributes to reduced end‐to‐end delay. In our proposed approach, we introduce an integration of dynamic audio and network configuration to satisfy the EPT constraint. The basic idea is that, the major components participating in an NMP system, the application and the network interact during the live music performance. As the network delay increases, the network tries to equalize it by modifying the routing behavior using Software‐Defined Networking principles. If the network delay exceeds a maximum affordable threshold, the network reacts by informing the application to change the audio processing pattern to overcome the delay increase, resulting in below EPT end‐to‐end delay. A full prototype of the proposed system was implemented and extensively evaluated in an emulated environment.
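The control loop described above (react when the end-to-end delay exceeds the 25 ms Ensemble Performance Threshold, first by rerouting via SDN and then by changing the audio processing pattern) can be sketched as follows; the component breakdown and the reroute-then-reconfigure policy are our simplification of the paper's design:

```python
EPT_MS = 25  # Ensemble Performance Threshold for networked music performance

def nmp_action(encode_ms, network_ms, decode_ms, reroute_available=True):
    """Decide how to react to the current end-to-end delay budget:
    first try rerouting (SDN), then fall back to a lighter audio config."""
    total = encode_ms + network_ms + decode_ms
    if total <= EPT_MS:
        return "ok"
    if reroute_available:
        return "reroute"          # let the network pick a faster path
    return "reconfigure-audio"    # tell the application to cut coding delay
```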
... Interactive applications, from gaming to voice and videoconferencing, offer the best quality of experience when latency is low. Providers need means to deliver not only capacity to end users, but also low latency [6,28], and have an economic incentive to do so [23]. One seemingly promising strategy for cutting latency is to build more mesh-like backbones: to introduce links that carry demand along a more direct geographic path, shortcutting a more circuitous one. ...
Conference Paper
Full-text available
Early in the Internet's history, routing within a single provider's WAN centered on placing traffic on the shortest path. More recent traffic engineering efforts aim to reduce congestion and/or increase utilization within the status quo of greedy shortest-path first routing on a sparse topology. In this paper, we argue that this status quo of routing and topology is fundamentally at odds with placing traffic so as to minimize latency for users while avoiding congestion. We advocate instead provider backbone topologies that are more mesh-like, and hence better at providing multiple low-latency paths, and a routing system that directly considers latency minimization and congestion avoidance while dynamically placing traffic on multiple unequal-cost paths. We offer a research agenda for achieving this new low-latency approach to WAN topology design and routing.
Article
Large-scale video conferencing services incur significant network cost while serving surging global demands. Our work systematically explores the opportunity to offload a fraction of this traffic to the Internet, a cheaper routing option offered already by cloud providers, from WAN without drop in application performance. First, with a large-scale latency measurement study with 3.5 million data points per day spanning 241K source cities and 21 data centers across the globe, we demonstrate that Internet paths perform comparable to or better than the private WAN for parts of the world (e.g., Europe and North America). Next, we present Titan, a live (12+ months) production system that carefully moves a fraction of the conferencing traffic to the Internet using the above observation. Finally, we propose Titan-Next - a research prototype that jointly assigns the conferencing server and routing option (Internet or WAN) for individual calls. With 5 weeks of production data, we show Titan-Next reduces the sum of peak bandwidth on WAN links that defines the operational network cost by up to 61% compared to state-of-the-art baselines.
Article
Full-text available
The Domain Name System (DNS) handles the conversion of a server's IP address into a domain name, which allows the end user to access a resource without memorizing its IP address. This protocol is the foundation of the modern Internet; however, all messag…
Article
Internet content providers (ICPs) typically exploit content distribution networks (CDNs) to provide wide-area data access with high availability and low latency. However, our analysis of a large-scale trace collected from seven major CDN operators has revealed that, from a global perspective, a large portion of users still suffer from high user-perceived latency due to the insufficient deployment of terrestrial cloud infrastructures, especially in remote or rural areas where even the closest available cache server is too far away. This paper presents a cost-effective content distribution framework to optimize global CDNs and enable low content access latency anywhere. The framework collaboratively builds CDNs upon emerging low earth orbit (LEO) constellations and existing cloud platforms to satisfy low latency requirements while minimizing operational cost. Specifically, it exploits a key insight: emerging mega-constellations will consist of thousands of LEO satellites which can be equipped with high-speed data links and storage, and thus can potentially work as "cache in space" to enable pervasive, low-latency data access. The framework judiciously places replicas on either LEO satellite caches or terrestrial cloud caches, and dynamically assigns user requests to proper cache servers based on constellation parameters, cloud/user distributions, and pricing policies. We have implemented a prototype in our testbed, and extensive trace-driven evaluations covering multiple geo-distributed vantage points have demonstrated that the framework can effectively reduce global content access latency with acceptable operational cost under representative CDN traffic.
Article
Validating the network paths taken by packets is critical in constructing a secure Internet architecture. Any feasible solution must both enforce packet forwarding along end-host specified paths and verify whether packets have taken those paths. However, the current Internet supports neither enforcement nor verification. Likely due to the radical changes to the Internet architecture and a long-standing confusion between routing and forwarding, only limited solutions for path validation exist in the literature. This survey article aims to reinvigorate research on the essential topic of path validation by crystallizing not only how path validation works but also where seemingly qualified solutions fall short. The analyses explore future research directions in path validation aimed at improving security, privacy, and efficiency.
Chapter
Header bidding (HB) is a relatively new online advertising technology that allows a content publisher to conduct a client-side (i.e., from within the end-user's browser), real-time auction for selling ad slots on a web page. We developed a new browser extension for Chrome and Firefox to observe this in-browser auction process from the user's perspective. We use real end-user measurements from 393,400 HB auctions to (a) quantify the ad revenue from HB auctions, (b) estimate latency overheads when integrating with ad exchanges and discuss their implications for ad revenue, and (c) break down the time spent in soliciting bids from ad exchanges into various factors and highlight areas for improvement. For the users in our study, we find that HB increases ad revenue for web sites by 28% compared to that in real-time bidding as reported in a prior work. We also find that the latency overheads in HB can be easily reduced or eliminated and outline a few solutions, and pitch the HB platform as an opportunity for privacy-preserving advertising.
Conference Paper
Physical infrastructures that facilitate, e.g., the delivery of power, water, and communication are of intrinsic importance in our daily lives. Accurate maps of physical infrastructures are important for permitting, maintenance, repair, and growth, but can be considered a commercial and/or security risk. In this paper, we describe a method for obfuscating physical infrastructure maps that removes sensitive details while preserving key features that are important in commercial and research applications. We employ a three-tiered approach: tier 1 does simple location fuzzing, tier 2 maintains connectivity details but randomizes node/link locations, while at tier 3 only distributional properties of a network are preserved. We implement our tiered approach in a tool called Bokeh which operates on GIS shapefiles that include detailed location information of infrastructure and produces obfuscated maps. We describe a case study that applies Bokeh to a number of Internet Service Provider maps. The case study highlights how each tier removes increasing amounts of detail from maps. We discuss how Bokeh can be generally applied to other physical infrastructures or in local services that are increasingly used for e-marketing.
Article
TCP congestion control is a vital component for the latency of Web services. In practice, a single congestion control mechanism is often used to handle all TCP connections on a Web server, e.g., Cubic for Linux by default. Considering the complex and ever-changing networking environment, the default congestion control may not always be the most suitable one. Adjusting congestion control to meet different networking scenarios usually requires modification of TCP stacks on a server. This is difficult, if not impossible, due to the variety of operating system and application configurations on production servers. In this paper, we propose Mystique, a lightweight, flexible, and dynamic congestion control switching scheme that allows network or server administrators to deploy any congestion control scheme transparently without modifying existing TCP stacks on servers. We have implemented Mystique in Open vSwitch (OVS) and conducted extensive test-bed experiments in both public and private cloud environments. Experiment results have demonstrated that Mystique is able to effectively adapt to varying network conditions, and can always employ the most suitable congestion control for each TCP connection. More specifically, Mystique can significantly reduce latency by 18.13% on average when compared with individual congestion controls.
Conference Paper
Full-text available
For many Internet services, reducing latency improves the user experience and increases revenue for the service provider. While in principle latencies could nearly match the speed of light, we find that infrastructural inefficiencies and protocol overheads cause today's Internet to be much slower than this bound: typically by more than one, and often, by more than two orders of magnitude. Bridging this large gap would not only add value to today's Internet applications, but could also open the door to exciting new applications. Thus, we propose a grand challenge for the networking research community: a speed-of-light Internet. To inform this research agenda, we investigate the causes of latency inflation in the Internet across the network stack. We also discuss a few broad avenues for latency improvement.
Conference Paper
Full-text available
A route in the Internet may take a longer AS (autonomous systems) path than the shortest AS path due to routing policies. We systematically analyze AS paths and quantify the extent to which routing policies inflate AS paths. The results show that AS path inflation in the Internet is more prevalent than expected. We first present the extent of AS path inflation observed from the RouteViews routing tables. From an ISP's viewpoint, at least 55% of AS paths are inflated by at least one AS hop, and AS paths can be inflated by as many as 6 AS hops. We then employ two typical routing policies to show the extent of AS path inflation for all AS pairs; we find that at least 45% of AS paths are inflated by at least one AS hop, and AS paths can be inflated by as many as 9 AS hops. Quantifying AS path inflation in the Internet has important implications for the extent of routing policies and traffic engineering performed on the Internet, and for BGP (border gateway protocol) convergence speed.
Article
Full-text available
The Border Gateway Protocol (BGP) plays a crucial role in the delivery of traffic in the Internet. Fluctuations in BGP routes cause degradation in user performance, increased processing load on routers, and changes in the distribution of traffic load over the network. Although earlier studies have raised concern that BGP routes change quite often, previous work has not considered whether these routing fluctuations affect a significant portion of the traffic. This paper shows that the small number of popular destinations responsible for the bulk of Internet traffic have remarkably stable BGP routes. The vast majority of BGP instability stems from a small number of unpopular destinations. We draw these conclusions from a joint analysis of BGP update messages and flow-level traffic measurements from AT&T's IP backbone. In addition, we analyze the routing stability of destination prefixes corresponding to the NetRating's list of popular Web sites using the update messages collected by the RouteViews and RIPE-NCC servers. Our results suggest that operators can engineer their networks under the assumption that the BGP advertisements associated with most of the traffic are reasonably stable.
Conference Paper
For interactive networked applications like web browsing, every round-trip time (RTT) matters. We introduce ASAP, a new naming and transport protocol that reduces latency by shortcutting DNS requests and eliminating TCP's three-way handshake, while ensuring the key security property of verifiable provenance of client requests. ASAP eliminates between one and two RTTs, cutting the delay of small requests by up to two-thirds.
Conference Paper
Public measurement platforms composed of low-end hardware devices such as RIPE Atlas have gained significant traction in the research community. Such platforms are indeed particularly interesting as they provide Internet-wide measurement capabilities together with an ever-growing set of measurement tools. To be scalable though, they allow for concurrent measurements between users. This paper answers a fundamental question for any platform user: Do measurements launched by others impact my results? If so, what can I do about it? We measured the impact of multiple users running experiments in parallel on the RIPE Atlas platform. We found that overlapping measurements do interfere with each other in at least two ways. First, we show that measurements performed from and towards the platform can significantly increase the timings reported by the probe. We found that probes with faster CPUs greatly helped limit interference on the measured timings. Second, we show that measurement campaigns can end up completely out of sync (by up to one hour) due to concurrent loads. In contrast to precision, we found that better hardware does not help here.
Article
Meter by meter, a slim vein of fiber-optic cable will soon start snaking its way across the bottom of three oceans and bring the world a few milliseconds closer together. The line will start near Tokyo and cut diagonally across the Pacific, hugging the northern shore of North America and slicing down across the Atlantic to stop just shy of London. Once the cable is live, light will transmit data from one end to the other in just 154 milliseconds, 24 ms less than today's speediest digital connection between Japan and the United Kingdom.
Article
We measure Web performance bottlenecks in home broadband access networks and evaluate ways to mitigate these bottlenecks with caching within home networks. We first measure Web performance bottlenecks to nine popular Web sites from more than 5,000 broadband access networks and demonstrate that when the downstream throughput of the access link exceeds about 16 Mbits/s, latency is the main bottleneck for Web page load time. Next, we use a router-based Web measurement tool, Mirage, to deconstruct Web page load time into its constituent components (DNS lookup, TCP connection setup, object download) and show that simple latency optimizations can yield significant improvements in overall page load times. We then present a case for placing a cache in the home network and deploy three common optimizations: DNS caching, TCP connection caching, and content caching. We show that caching only DNS and TCP connections yields significant improvements in page load time, even when the user's browser is already performing similar independent optimizations. Finally, we use traces from real homes to demonstrate how prefetching DNS and TCP connections for popular sites in a home-router cache can achieve faster page load times.
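The decomposition above (page load time into DNS lookup, TCP connection setup, and object download) suggests a simple model of which caching optimizations remove which latency terms; the additive accounting and the cache flags are our simplifying assumptions:

```python
def small_object_fetch_ms(rtt_ms, download_ms,
                          dns_cached=False, conn_cached=False):
    """Additive model of a small-object fetch: one RTT for the DNS lookup,
    one for the TCP handshake, plus the transfer itself. Caching DNS or
    reusing a warm connection each removes one RTT."""
    dns = 0 if dns_cached else rtt_ms
    handshake = 0 if conn_cached else rtt_ms
    return dns + handshake + download_ms
```

In this model, a home-router cache that prefetches DNS and TCP connections for popular sites removes two RTTs from every cold fetch, which is why latency, not throughput, dominates once access links exceed roughly 16 Mbits/s.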
Conference Paper
Web page load time is a key performance metric that many techniques aim to reduce. Unfortunately, the complexity of modern Web pages makes it difficult to identify performance bottlenecks. We present WProf, a lightweight in-browser profiler that produces a detailed dependency graph of the activities that make up a page load. WProf is based on a model we developed to capture the constraints between network load, page parsing, JavaScript/CSS evaluation, and rendering activity in popular browsers. We combine WProf reports with critical path analysis to study the page load time of 350 Web pages under a variety of settings including the use of end-host caching, SPDY instead of HTTP, and the mod pagespeed server extension. We find that computation is a significant factor that makes up as much as 35% of the critical path, and that synchronous JavaScript plays a significant role in page load time by blocking HTML parsing. Caching reduces page load time, but the reduction is not proportional to the number of cached objects, because most object loads are not on the critical path. SPDY reduces page load time only for networks with high RTTs and mod pagespeed helps little on an average page.
Conference Paper
We present the first large-scale analysis of Web performance bottlenecks as measured from broadband access networks, using data collected from extensive home router deployments. We analyze the limits of throughput on improving Web performance and identify the contribution of critical factors such as DNS lookups and TCP connection establishment to Web page load times. We find that, as broadband speeds continue to increase, other factors such as TCP connection setup time, server response time, and network latency are often dominant performance bottlenecks. Thus, realizing a "faster Web" requires not only higher download throughput, but also optimizations to reduce both client and server-side latency.
Article
Today's web services are dominated by TCP flows so short that they terminate a few round trips after handshaking; this handshake is a significant source of latency for such flows. In this paper we describe the design, implementation, and deployment of the TCP Fast Open protocol, a new mechanism that enables data exchange during TCP's initial handshake. In doing so, TCP Fast Open decreases application network latency by one full round-trip time, decreasing the delay experienced by such short TCP transfers. We address the security issues inherent in allowing data exchange during the three-way handshake, which we mitigate using a security token that verifies IP address ownership. We detail other fall-back defense mechanisms and address issues we faced with middleboxes, backwards compatibility for existing network stacks, and incremental deployment. Based on traffic analysis and network emulation, we show that TCP Fast Open would decrease HTTP transaction network latency by 15% and whole-page load time over 10% on average, and in some cases up to 40%.
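The one-RTT saving from TCP Fast Open can be illustrated with a small latency model for a short HTTP transaction; the additive accounting below is our assumption, not the paper's measurement methodology:

```python
def http_transaction_ms(rtt_ms, server_ms, fast_open=False):
    """Without TFO: one RTT for the three-way handshake, then one RTT for
    request/response, plus server time. With TFO the request rides in the
    SYN, so the handshake RTT disappears for short transfers."""
    handshake = 0 if fast_open else rtt_ms
    return handshake + rtt_ms + server_ms
```

For a flow short enough to finish a few round trips after handshaking, removing one full RTT in this way is a large relative saving, consistent with the reported 15% average reduction in HTTP transaction latency.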
Article
Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency --- especially the tail of the latency distribution --- can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
Article
Years after the initial development of the current routing protocols, we still lack an understanding of the impact of various parameters on the routes chosen in today's Internet. Network operators are struggling to optimize their routing, but the effectiveness of those efforts is limited. In this article, we study the sensitivity of routing stretch and diversity metrics to factors such as policies, topology, IGP weights, etc. using statistical techniques. We confirm previous findings that routing policies and AS size (in number of routers) are the dominating factors. Surprisingly, we find that intra-domain factors only have marginal impact on global path properties. Moreover, we study path inflation by comparing against the paths that are shortest in terms of AS-level/router-level hops or geographic distances. Overall, the majority of routes incur reasonable stretch. From the experience with our Internet-scale simulations, we find it hard to globally optimize path selection with respect to the geographic length of the routes, as long as inter-domain routing protocols do not include an explicit notion of geographic distance in the routing information.
Article
TCP flows start with an initial congestion window of at most four segments or approximately 4KB of data. Because most Web transactions are short-lived, the initial congestion window is a critical TCP parameter in determining how quickly flows can finish. While the global network access speeds increased dramatically on average in the past decade, the standard value of TCP's initial congestion window has remained unchanged. In this paper, we propose to increase TCP's initial congestion window to at least ten segments (about 15KB). Through large-scale Internet experiments, we quantify the latency benefits and costs of using a larger window, as functions of network bandwidth, round-trip time (RTT), bandwidth-delay product (BDP), and nature of applications. We show that the average latency of HTTP responses improved by approximately 10% with the largest benefits being demonstrated in high RTT and BDP networks. The latency of low bandwidth networks also improved by a significant amount in our experiments. The average retransmission rate increased by a modest 0.5%, with most of the increase coming from applications that effectively circumvent TCP's slow start algorithm by using multiple concurrent connections. Based on the results from our experiments, we believe the initial congestion window should be at least ten segments and the same be investigated for standardization by the IETF.
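The latency effect of a larger initial congestion window can be illustrated by counting the slow-start rounds needed to deliver a response; the idealized model below (no losses, window doubling each RTT, a fixed 1460-byte MSS) is our assumption:

```python
def slow_start_rounds(response_bytes, init_window_segments, mss=1460):
    """Number of round trips needed to send `response_bytes` when the
    congestion window starts at `init_window_segments` segments and
    doubles each RTT (idealized slow start, no loss)."""
    cwnd = init_window_segments
    sent, rounds = 0, 0
    while sent < response_bytes:
        rounds += 1
        sent += cwnd * mss
        cwnd *= 2
    return rounds
```

Under this model, a 30 KB response needs three round trips with an initial window of four segments but only two with ten, which is the mechanism behind the reported ~10% average improvement in HTTP response latency.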
Amazon Found Every 100ms of Latency Cost Them 1% in Sales
  • J Liddle
Fixing the Internet for real time applications: Part II
  • P Maynard-Koran
On reducing latencies below the perceptible
  • D Täht
Latency: the new web performance bottleneck
  • I Grigorik
Speeding up mobile browsers without infrastructure support. Master’s thesis
  • Z Wang
Speed matters for Google Web search
  • J Brutlag
Intertubes: a study of the US long-haul fiber-optic infrastructure
  • R Durairajan
  • P Barford
  • J Sommers
  • W Willinger
Performance related changes and their user impact (Bing, Google)
  • E Schurman
  • J Brutlag