-
ABSTRACT: Today's data centers face extreme challenges in providing low latency.
However, fair sharing, a principle commonly adopted in current congestion
control protocols, is far from optimal for satisfying latency requirements.
We propose Preemptive Distributed Quick (PDQ) flow scheduling, a protocol
designed to complete flows quickly and meet flow deadlines. PDQ enables flow
preemption to approximate a range of scheduling disciplines. For example, PDQ
can emulate a shortest job first algorithm to give priority to the short flows
by pausing the contending flows. PDQ borrows ideas from centralized scheduling
disciplines and implements them in a fully distributed manner, making it
scalable to today's data centers. Further, we develop a multipath version of
PDQ to exploit path diversity.
Through extensive packet-level and flow-level simulation, we demonstrate that
PDQ significantly outperforms TCP, RCP and D3 in data center environments. We
further show that PDQ is stable, resilient to packet loss, and preserves nearly
all its performance gains even given inaccurate flow information.
06/2012;
-
ABSTRACT: We consider the problem of answering point-to-point shortest path queries on
massive social networks. The goal is to answer queries within tens of
milliseconds while minimizing the memory requirements. We present a technique
that achieves this goal for an extremely large fraction of path queries by
exploiting the structure of the social networks.
Using evaluations on real-world datasets, we argue that our technique offers
a unique trade-off between latency, memory and accuracy. For instance, for the
LiveJournal social network (roughly 5 million nodes and 69 million edges), our
technique can answer 99.9% of the queries in less than a millisecond. In
comparison to storing all-pairs shortest paths, our technique requires at least
550x less memory; the average query time is roughly 365 microseconds --- 430x
faster than the state-of-the-art shortest path algorithm. Furthermore, the
relative performance of our technique improves with the size (and density) of
the network. For the Orkut social network (3 million nodes and 220 million
edges), for instance, our technique is roughly 2588x faster than the
state-of-the-art algorithm for computing shortest paths.
06/2012;
-
ABSTRACT: Source-controlled routing has been proposed as a way to improve the
flexibility of future network architectures and to simplify the data plane.
However, if a packet specifies its path, this precludes fast local re-routing
within the network. We propose SlickPackets, a novel solution that allows
packets to slip around failures by specifying alternate paths in their headers,
in the form of compactly-encoded directed acyclic graphs. We show that this can
be accomplished with reasonably small packet headers for real network
topologies, and results in responsiveness to failures that is competitive with
past approaches that require much more state within the network. Our approach
thus enables fast failure response while preserving the benefits of
source-controlled routing.
01/2012;
-
Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Toronto, ON, Canada, August 15-19, 2011; 01/2011
-
2010 International Conference on Distributed Computing Systems, ICDCS 2010, Genova, Italy, June 21-25, 2010; 01/2010
-
Computer Communication Review. 01/2010; 40:66-71.
-
Proceedings of the ACM SIGCOMM 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Kyoto, Japan, August 27-31, 2007; 01/2007
-
Proceedings of the ACM SIGCOMM 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Pisa, Italy, September 11-15, 2006; 01/2006
-
Proceedings of the ACM SIGCOMM 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Philadelphia, Pennsylvania, USA, August 22-26, 2005; 01/2005
-
Bhaskaran Raman,
Sharad Agarwal,
Yan Chen,
Matthew Caesar,
Weidong Cui,
Kevin Lai,
Tal Lavian,
Sridhar Machiraju,
Z. Morley Mao,
George Porter,
Timothy Roscoe,
Mukund Seshadri,
Jimmy Shih,
Keith Sklower,
Lakshminarayanan Subramanian,
Takashi Suzuki,
Shelley Zhuang,
Anthony D. Joseph,
Randy H. Katz,
Ion Stoica
ABSTRACT: Services are capabilities that enable applications and are of crucial importance to pervasive computing in next-generation networks. Service Composition is the construction of complex services from primitive ones, thus enabling rapid and flexible creation of new services. The presence of multiple independent service providers poses new and significant challenges. Managing trust across providers and verifying the performance of the components in composition become essential issues. Adapting the composed service to network and user dynamics by choosing service providers and instances is yet another challenge. In SAHARA, we are developing a comprehensive architecture for the creation, placement, and management of services for composition across independent providers. In this paper, we present a layered reference model for composition based on a classification of different kinds of composition. We then discuss the different overarching mechanisms necessary for the successful deployment of such an architecture through a variety of case studies involving composition.
06/2002;
-
Bhaskaran Raman,
Sharad Agarwal,
Yan Chen,
Matthew Caesar,
Weidong Cui,
Per Johansson,
Kevin Lai,
Tal Lavian,
Sridhar Machiraju,
Zhuoqing Morley Mao, [......],
Timothy Roscoe,
Mukund Seshadri,
Jimmy S. Shih,
Keith Sklower,
Lakshminarayanan Subramanian,
Takashi Suzuki,
Shelley Zhuang,
Anthony D. Joseph,
Randy H. Katz,
Ion Stoica
Pervasive Computing, First International Conference, Pervasive 2002, Zürich, Switzerland, August 26-28, 2002, Proceedings; 01/2002
-
ABSTRACT: Route instability is an important contributor to data plane unreliability on the Internet, and also incurs load on the control plane of routers. In this paper, we study how route selection schemes can avoid these changes in routes. Specifically, we characterize the tradeoffs between interruption rate, our measure of stability; availability of routes; and deviation from the network operator's preferred routes. We develop algorithms to lower bound the feasible points in the tradeoff spaces between these three cost metrics. We also propose a new approach, Stable Route Selection (SRS), which uses flexibility in route selection to improve stability without sacrificing availability, and with a controlled amount of deviation. Our large-scale simulation results show that SRS can significantly improve stability while deviating only a small amount from preferred routes. We implement our protocol in a software router, Quagga, and confirm in cluster deployment that SRS's gains in route stability translate to improved reliability in the data plane. Finally, we evaluate SRS under direct feeds of route update traffic from Internet routers. In this case, we observe less improvement, but SRS can still improve stability when multiple disjoint paths are available.
published or submitted for publication
-
ABSTRACT: Route instability incurs significant load on core routers and is widely recognized as a major contributor to data plane unreliability on the Internet. Route flap damping provides some protection against instability, but introduces pathologies and reduces availability. Concerns about the scalability of the routing system and the increasing prevalence of real-time applications have prompted a renewed interest in stability. We believe it is time for a principled approach to stabilizing Internet routing. This paper takes a step towards that goal by characterizing the tradeoff between stability and availability, and between stability and deviation from preferred routes. We propose a new approach, Stable Route Selection (SRS), which uses flexibility in route selection to improve stability without sacrificing availability. Our extensive simulation and experimental results show that SRS improves stability and data plane reliability while deviating only a small amount from preferred routes.
-
ABSTRACT: Internet routing is plagued with several problems today, including chronic instabilities, convergence problems, and misconfiguration of routers. We believe that a first step towards making the Internet robust to these problems is to develop a systematic methodology for analyzing routing changes and inferring why they happen and where they originate. In this paper, we motivate the need for, and describe the design of, an Internet health monitoring system that identifies the source of routing instabilities purely by passively observing routing updates from different vantage points. We believe such a system could be used to continuously infer the state of the network. Such inferences may then be used offline for network performance monitoring and troubleshooting, or online to improve path selection and damping of instability.
-
ABSTRACT: The lack of a good understanding of the dynamics of interdomain routing has made efforts to address BGP's shortcomings a black art. To gain more insight into these dynamics, we need to answer two questions: What is the cause of a routing change? Where does a routing change originate? This paper proposes the design of a BGP health inferencing system that answers these questions by observing routing updates from multiple vantage points and inferring the type and location of an event that triggers a routing change. To build such a system, we solve two basic problems: (a) classify route updates into groups of correlated routing changes where all route updates in a group are triggered by the same event, (b) given the set of routing changes for an event, determine the location and the cause of the event. By analyzing route updates from Routeviews and RIPE for over ¢ ¤ £ months, we found that our approach can pinpoint the location where an update is triggered to a single inter-AS link in over 70% of observed updates. We found that the majority of updates are caused by a relatively small number of unstable links. In addition, 25% of prefixes are persistently unstable, causing 20% of all updates observed. Routes through the Internet core usually reconverge quickly after events, while an event taking place at the network edge is 9 times more likely to cause a long-term route change. We validated our approach by showing it can detect a variety of well-known events, namely: (a) session resets recorded in the NANOG mailing list; (b) routing problems within ISPs; (c) location of BGP Beacons. In addition, our inference methodology is able to detect several routing problems not publicly known. In summary, we believe that our health inference system is a first step towards forming a better understanding of inter-domain routing dynamics.
-
ABSTRACT: The Border Gateway Protocol (BGP) is the inter-domain routing protocol in the Internet that allows each autonomous system (AS) to select routes to the destinations based on locally determined policies. It has been shown that the policy autonomy exercised by ASes may result in persistent oscillations in BGP. Current solutions either rely on globally consistent policy assignments (which are hard to achieve in a distributed fashion), or require significant deviations from locally assigned policies (which reduce flexibility). In this paper, we take a different approach. Namely, we propose multipath routing to resolve the conflict between policy autonomy and system stability. We design an algorithm, STABLE PATH(S) ASSIGNMENT (SPA), that provably detects persistent oscillations and eliminates these oscillations by assigning multiple paths to some ASes in the network. We design a distributed protocol for SPA and present tight bounds on the number of paths assigned by the algorithm to the ASes. Using simulations on the AS graph, we show that SPA assigns at most two paths to any AS in the network (in 99.9% of the instances), while assigning a single path in the absence of persistent oscillations. Our evaluation results suggest that SPA can effectively detect networks that have a stable state but can potentially face persistent oscillations, and assigns a single path to the ASes in such networks.
-
ABSTRACT: Today, we lack a clear understanding of the dynamics of the Border Gateway Protocol (BGP) and this has largely restricted our ability to address BGP's shortcomings. To gain more insight into BGP's dynamics, this paper proposes the design of a BGP health inferencing system that localizes the root causes of routing changes. Specifically, the inference system addresses two questions: What is the cause of a routing change? Where does a routing change originate? The inference system correlates routing updates across multiple vantage points to narrow down the suspect set of AS's that might have triggered routing changes. Our methodology is primarily targeted towards analyzing events affecting relatively stable prefixes (composing roughly 80% of the routing table), which are known to be the most popular destinations of Internet traffic. For 70% of observed updates to these prefixes, our approach can pinpoint the location of origin to a single inter-AS link. We analytically and empirically argue correctness of several key steps of our methodology and additionally show that our technique can correctly pinpoint the source of several well-known/documented routing events.
-
ABSTRACT: In this paper we present a framework for resource allocation based on auctions. We leverage (i) application awareness to achieve the performance metrics desired by the application, (ii) prediction to preallocate resources based on expected demand, and (iii) a control channel to exchange information between the client and the allocator to improve resource utilization. We design a scheme that is (i) general: can be used in many application contexts, (ii) flexible: can be used under a variety of workloads, (iii) efficient: provides high resource utilization with low overhead, (iv) responsive: adapts quickly to client demand, and (v) fair: provides proportional fairness to client requests. Simulation results show that the simple auction-based allocation techniques we propose can be used to achieve these goals with excellent performance. High resource utilization, low delay, and proportional fairness in terms of delay and resource allocation can be observed even under bursty application workloads. Furthermore, the simple prediction techniques we propose can greatly improve responsiveness. However, we note that misprediction of future load can cause a significant decrease in system utilization. We compare our allocation techniques with lottery scheduling and note that our auction-based resource allocation scheme performs better under most application request workloads.
-
ABSTRACT: This paper advocates a different approach to reduce routing convergence—side-stepping the problem by avoiding it in the first place! Rather than recomputing paths after temporary topology changes, we argue for a separation of timescale between offline computation of multiple diverse paths and online spreading of load over these paths. We believe decoupling failure recovery from path computation leads to networks that are inherently more efficient, more scalable, and easier to manage.