# ACM SIGMETRICS Performance Evaluation Review

Published by Association for Computing Machinery

Online ISSN: 0163-5999

Publications

Conference Paper

Congestion control in TCP/AQM networks is expected to perform well for a wide range of conditions, but recent advances in modeling and analysis indicate that present AQM (active queue management) schemes need an extra dose of adaptability to cope. The paper answers the call and proposes a self-tuning structure wherein AQM parameters are automatically tuned in response to on-line estimation of link capacity and traffic load. This approach is applicable to any AQM scheme that is parameterizable in terms of link capacity and TCP load. We describe this self-tuning structure, illustrate its application to PI (proportional-integral) and RED (random early detection) AQMs, provide stability analysis, and conduct ns simulations to compare with both fixed AQM schemes and the recently proposed adaptive RED.
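As a rough illustration of the controller family described above, the sketch below re-derives PI gains from hypothetical on-line estimates of capacity, load, and RTT; the gain formulas, constants, and one-line queue model are simplified placeholders, not the paper's actual design.

```python
# Sketch of a self-tuning PI AQM loop: gains are re-derived from on-line
# estimates of capacity, load, and RTT. The gain formulas, constants,
# and the crude queue model below are illustrative placeholders.

def pi_gains(capacity_pkts, n_flows, rtt):
    """Re-tune PI gains when the capacity/load estimates change; the
    scaling (gains grow with load, shrink with capacity and RTT)
    mirrors PI AQM designs only qualitatively."""
    k = (2.0 * n_flows) / (capacity_pkts * rtt ** 2)
    return 0.005 * k, 0.05 * k          # (k_i, k_p), placeholder constants

def pi_step(p, q, q_prev, q_ref, k_i, k_p):
    """One PI update of the drop probability toward the queue reference."""
    p += k_i * (q - q_ref) + k_p * (q - q_prev)
    return min(max(p, 0.0), 1.0)

# Toy run: the controller drives a crude queue model toward q_ref = 50.
k_i, k_p = pi_gains(capacity_pkts=1250.0, n_flows=60, rtt=0.1)
p, q, q_prev = 0.0, 200.0, 200.0
for _ in range(100):
    p = pi_step(p, q, q_prev, q_ref=50.0, k_i=k_i, k_p=k_p)
    q_prev, q = q, max(0.0, q - 400.0 * p + 2.0)
```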

…

Conference Paper

Fine-grained network measurement requires routers and switches to update large arrays of counters at very high link speed (e.g. 40 Gbps). A naive algorithm needs an infeasible amount of SRAM to store both the counters and a flow-to-counter association rule, so that arriving packets can update corresponding counters at link speed. This has made accurate per-flow measurement complex and expensive, and motivated approximate methods that detect and measure only the large flows.
This paper revisits the problem of accurate per-flow measurement. We present a counter architecture, called Counter Braids, inspired by sparse random graph codes. In a nutshell, Counter Braids "compresses while counting". It solves the central problems (counter space and flow-to-counter association) of per-flow measurement by "braiding" a hierarchy of counters with random graphs. Braiding results in drastic space reduction by sharing counters among flows; and using random graphs generated on-the-fly with hash functions avoids the storage of flow-to-counter association.
The Counter Braids architecture is optimal (albeit with a complex decoder) as it achieves the maximum compression rate asymptotically. For implementation, we present a low-complexity message passing decoding algorithm, which can recover flow sizes with essentially zero error. Evaluation on Internet traces demonstrates that almost all flow sizes are recovered exactly with only a few bits of counter space per flow.
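To illustrate the counter-sharing idea in isolation, the sketch below uses one layer of hashed, shared counters with min-decoding (a count-min-style simplification; the actual architecture braids multiple counter layers and recovers exact flow sizes with message passing decoding).

```python
import hashlib

# One-layer sketch of the counter-sharing idea: D hash functions map
# each flow to D counters shared with other flows, and min-decoding
# gives a conservative size estimate. This is a count-min-style
# simplification, not the braided multi-layer architecture.

D, M = 3, 64          # hash functions, shared counters

def slots(flow):
    """On-the-fly random mapping: no flow-to-counter table is stored."""
    return [int(hashlib.sha256(f"{i}:{flow}".encode()).hexdigest(), 16) % M
            for i in range(D)]

counters = [0] * M

def count_packet(flow):
    for s in slots(flow):
        counters[s] += 1  # counters are shared among colliding flows

def estimate(flow):
    return min(counters[s] for s in slots(flow))  # never underestimates

for flow, size in [("a", 5), ("b", 3), ("c", 1)]:
    for _ in range(size):
        count_packet(flow)
```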

…

Conference Paper

Storage device performance prediction is a key element of self-managed storage systems. The paper explores the application of a machine learning tool, CART (classification and regression trees) models, to storage device modeling. Our approach predicts a device's performance as a function of input workloads, requiring no knowledge of the device internals. We propose two uses of CART models: one that predicts per-request response times (and then derives aggregate values); one that predicts aggregate values directly from workload characteristics. After being trained on the device in question, both provide accurate black-box models across a range of test traces from real environments. Experiments show that these models predict the average and 90th percentile response time with a relative error as low as 19%, when the training workloads are similar to the testing workloads, and interpolate well across different workloads.
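As a toy illustration of the regression-tree idea behind such black-box models, the sketch below finds a single CART-style split (a "stump") on one workload feature; real models grow full trees over many features, and the numbers here are made up.

```python
# Toy illustration of the regression-tree idea: a single CART-style
# split (a "stump") predicting response time from one workload feature.
# Real device models grow full trees over many features; the numbers
# below are made up.

def best_split(xs, ys):
    """Threshold on xs minimizing the summed squared error of the two
    leaf means (the CART splitting criterion for regression)."""
    def sse(v):
        if not v:
            return 0.0
        m = sum(v) / len(v)
        return sum((y - m) ** 2 for y in v)
    best_cost, best_t = float("inf"), None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if right and sse(left) + sse(right) < best_cost:
            best_cost, best_t = sse(left) + sse(right), t
    return best_t

# Response time jumps once the request size crosses 64 KB.
sizes = [4, 8, 16, 32, 64, 128, 256, 512]
times = [1.0, 1.1, 0.9, 1.2, 1.0, 5.1, 4.9, 5.2]
t = best_split(sizes, times)   # recovers the 64 KB breakpoint
```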

…

Article

Scientific codes are usually parallelized by partitioning a grid among processors. To achieve top performance it is necessary to partition the grid so as to balance workload and minimize communication/synchronization costs. This problem is particularly acute when the grid is irregular, changes over the course of the computation, and is not known until load time. Critical mapping and remapping decisions rest on the ability to accurately predict performance, given a description of a grid and its partition. This paper discusses one approach to this problem, and illustrates its use on a one-dimensional fluids code. The models constructed are shown to be accurate, and are used to find optimal remapping schedules.

…

Article

One of the key issues in providing end-to-end quality-of-service (QoS) guarantees in packet networks is how to determine a feasible path that satisfies a number of QoS constraints. For two or more additive constraints, the problem of finding a feasible path is NP-complete and cannot be solved exactly in polynomial time. Accordingly, several heuristics and approximation algorithms have been proposed for this problem. Many of these algorithms suffer from either excessive computational cost or low performance. In this paper, we provide an efficient approximation algorithm for finding a path subject to two additive constraints. The worst-case computational complexity of this algorithm is within a logarithmic number of calls to Dijkstra's shortest path algorithm. Its average complexity is much lower than that, as demonstrated by simulation experiments. The performance of the proposed algorithm is justified via theoretical bounds that are provided for the optimal version of the path selection problem. To achieve further performance improvement, several extensions to the basic algorithm are also provided at very low computational cost. Extensive simulations are used to demonstrate the high performance of the proposed algorithm and to contrast it with other path selection algorithms.
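One standard way to realize "a logarithmic number of calls to Dijkstra's algorithm" is Lagrangian relaxation: fold the two additive metrics into one weight and binary-search the multiplier. The sketch below, on a hypothetical toy graph, illustrates that pattern rather than the paper's exact algorithm.

```python
import heapq

# Sketch of the Lagrangian-relaxation pattern: fold the two additive
# metrics (cost, delay) into the single weight cost + lam * delay and
# binary-search lam, each probe being one Dijkstra run.

def dijkstra(graph, src, dst, lam):
    """Best path under the combined weight; returns its (cost, delay)."""
    heap = [(0.0, 0.0, 0.0, src)]
    seen = set()
    while heap:
        w, c, d, u = heapq.heappop(heap)
        if u == dst:
            return c, d
        if u in seen:
            continue
        seen.add(u)
        for v, ec, ed in graph.get(u, []):
            heapq.heappush(heap, (w + ec + lam * ed, c + ec, d + ed, v))
    return None

def feasible_path(graph, src, dst, delay_bound, iters=30):
    """Binary search on lam: a logarithmic number of Dijkstra calls."""
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        c, d = dijkstra(graph, src, dst, lam)
        if d <= delay_bound:
            hi = lam        # feasible: try a cheaper (smaller-lam) path
        else:
            lo = lam        # infeasible: penalize delay more
    return dijkstra(graph, src, dst, hi)

graph = {  # edges: (next hop, cost, delay); hypothetical toy topology
    "s": [("a", 1, 10), ("b", 5, 1)],
    "a": [("t", 1, 10)],
    "b": [("t", 5, 1)],
}
best = feasible_path(graph, "s", "t", delay_bound=5.0)
```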

…

Article

Most reliability analysis techniques and tools assume that a system is used for a mission consisting of a single phase. However, multiple phases are natural in many missions. The failure rates of components, system configuration, and success criteria may vary from phase to phase. In addition, the duration of a phase may be deterministic or random. Recently, several researchers have addressed the problem of reliability analysis of such systems using a variety of methods. We describe a new technique for phased-mission system reliability analysis based on Boolean algebraic methods. Our technique is computationally efficient and applicable to a large class of systems for which the failure criterion in each phase can be expressed as a fault tree (or an equivalent representation). It avoids the state space explosion that commonly plagues Markov chain-based analysis. We develop a phase algebra to account for the effects of variable configurations and success criteria from phase to phase. Our technique yields exact (as opposed to approximate) results. We demonstrate the use of our technique by means of an example and present numerical results to show the effects of mission phases on the system reliability.

…

Article

There is growing interest in the discovery of Internet topology at the interface level. A new generation of highly distributed measurement systems is currently being deployed. Unfortunately, the research community has not examined the problem of how to perform such measurements efficiently and in a network-friendly manner. In this paper we make two contributions toward that end. First, we show that standard topology discovery methods (e.g., skitter) are quite inefficient, repeatedly probing the same interfaces. This is a concern, because when scaled up, such methods will generate so much traffic that they will begin to resemble DDoS attacks. We measure two kinds of redundancy in probing (intra- and inter-monitor) and show that both kinds are important. We show that straightforward approaches to addressing these two kinds of redundancy must take opposite tacks, and are thus fundamentally in conflict. Our second contribution is to propose and evaluate Doubletree, an algorithm that reduces both types of redundancy simultaneously on routers and end systems. The key ideas are to exploit the tree-like structure of routes to and from a single point in order to guide when to stop probing, and to probe each path by starting near its midpoint. Our results show that Doubletree can reduce both types of measurement load on the network dramatically, while permitting discovery of nearly the same set of nodes and links. We then show how to enable efficient communication between monitors through the use of Bloom filters.

…

Article

Full-duplex communication has the potential to substantially increase the
throughput in wireless networks. However, the benefits of full-duplex are still
not well understood. In this paper, we characterize the full-duplex rate gains
in both single-channel and multi-channel use cases. For the single-channel
case, we quantify the rate gain as a function of the remaining
self-interference and SNR values. We also provide a sufficient condition under
which the sum of uplink and downlink rates on a full-duplex channel is concave
in the transmission power levels. Building on these results, we consider the
multi-channel case. For that case, we introduce a new realistic model of a
small form-factor (e.g., smartphone) full-duplex receiver and demonstrate its
accuracy via measurements. We study the problem of jointly allocating power
levels to different channels and selecting the frequency of maximum
self-interference suppression, where the objective is maximizing the sum of the
rates over uplink and downlink OFDM channels. We develop a polynomial time
algorithm which is nearly optimal under very mild restrictions. To reduce the
running time, we develop an efficient nearly-optimal algorithm under the high
SINR approximation. Finally, we demonstrate via numerical evaluations the
capacity gains in the different use cases and obtain insights into the impact
of the remaining self-interference and wireless channel states on the
performance.
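A back-of-the-envelope Shannon-rate calculation (an illustrative model, not the paper's formulation) shows how residual self-interference erodes the nominal 2x full-duplex gain:

```python
import math

# Back-of-the-envelope sketch (illustrative model, not the paper's
# formulation): residual self-interference enters each direction's SINR
# and erodes the nominal 2x full-duplex gain over half-duplex.

def fd_sum_rate(snr, residual_si):
    """Uplink + downlink active simultaneously on the whole channel."""
    sinr = snr / (1.0 + residual_si)
    return 2.0 * math.log2(1.0 + sinr)

def hd_sum_rate(snr):
    """Half-duplex baseline: the two directions time-share the channel."""
    return math.log2(1.0 + snr)

snr = 100.0                                             # 20 dB
gain_clean = fd_sum_rate(snr, 0.0) / hd_sum_rate(snr)   # exactly 2x
gain_noisy = fd_sum_rate(snr, 50.0) / hd_sum_rate(snr)  # gain erodes
```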

…

Article

Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.

…

Article

Twitter is one of the largest social networks using exclusively directed
links among accounts. This makes the Twitter social graph much closer to the
social graph supporting real life communications than, for instance, Facebook.
Therefore, understanding the structure of the Twitter social graph is
interesting not only for computer scientists, but also for researchers in other
fields, such as sociology. However, little is known about how information
propagation in Twitter is constrained by its inner structure. In
this paper, we present an in-depth study of the macroscopic structure of the
Twitter social graph unveiling the highways on which tweets propagate, the
specific user activity associated with each component of this macroscopic
structure, and the evolution of this macroscopic structure with time for the
past 6 years. For this study, we crawled Twitter to retrieve all accounts and
all social relationships (follow links) among accounts; the crawl completed in
July 2012 with 505 million accounts interconnected by 23 billion links. Then,
we present a methodology to unveil the macroscopic structure of the Twitter
social graph. This macroscopic structure consists of 8 components defined by
their connectivity characteristics. Each component groups users with a specific
usage of Twitter. For instance, we identified components gathering together
spammers or celebrities. Finally, we present a method to approximate the
macroscopic structure of the Twitter social graph in the past, validate this
method using old datasets, and discuss the evolution of the macroscopic
structure of the Twitter social graph during the past 6 years.

…

Article

We consider a system of parallel queues where tasks are assigned (dispatched)
to one of the available servers upon arrival. The dispatching decision is based
on the full state information, i.e., on the sizes of the new and existing jobs.
We are interested in minimizing the so-called mean slowdown criterion
corresponding to the mean of the sojourn time divided by the processing time.
Assuming no new jobs arrive, the shortest-processing-time-product (SPTP)
schedule is known to minimize the slowdown of the existing jobs. The main
contribution of this paper is three-fold: 1) To show the optimality of SPTP
with respect to slowdown in a single server queue under Poisson arrivals; 2) to
derive the so-called size-aware value functions for
M/G/1-FIFO/LIFO/SPTP/SPT/SRPT with general holding costs of which the slowdown
criterion is a special case; and 3) to utilize the value functions to derive
efficient dispatching policies so as to minimize the mean slowdown in a
heterogeneous server system. The derived policies offer significantly better
performance than, e.g., the size-aware-task-assignment with equal load (SITA-E)
and least-work-left (LWL) policies.
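For a fixed batch with no arrivals, the SPTP product (remaining size times original size) reduces to the square of the size, so SPTP orders unserved jobs exactly like SPT; a toy computation of the mean slowdown criterion with made-up job sizes:

```python
# Toy computation of the mean slowdown criterion (made-up job sizes):
# for a fixed batch with no arrivals, the SPTP product (remaining size
# times original size) equals size^2 for unserved jobs, so SPTP orders
# the batch exactly like SPT.

def mean_slowdown(sizes_in_order):
    """Run jobs back-to-back; slowdown = completion time / size."""
    t, total = 0.0, 0.0
    for s in sizes_in_order:
        t += s
        total += t / s
    return total / len(sizes_in_order)

jobs = [8.0, 1.0, 4.0, 2.0]
sptp = sorted(jobs, key=lambda s: s * s)   # == SPT order for fresh jobs
worst = sorted(jobs, reverse=True)         # longest first, for contrast
```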

…

Article

Recent studies show that a large fraction of Internet traffic is originated by Content Providers (CPs) such as content distribution networks and hyper-giants. To cope with the increasing demand for content, CPs deploy massively distributed server infrastructures. Thus, content is available in many network locations and can be downloaded by traversing different paths in a network. Despite the prominent server location and path diversity, the decisions on how to map users to servers by CPs and how to perform traffic engineering by ISPs are made independently. This leads to a lose-lose situation, as CPs are aware of neither the network bottlenecks nor the locations of end-users, and ISPs struggle to cope with rapid traffic shifts caused by the dynamic CP server selection process.
In this paper we propose and evaluate Content-aware Traffic Engineering (CaTE), which dynamically adapts the traffic demand for content hosted on CPs by utilizing ISP network information and end-user location during the server selection process. This leads to a win-win situation because CPs are able to enhance their end-user to server mapping and ISPs gain the ability to partially influence the traffic demands in their networks. Indeed, our results using traces from a Tier-1 ISP show that a number of network metrics can be improved when utilizing CaTE.

…

Article

A well-known approach to intradomain traffic engineering consists in finding the set of link weights that minimizes a network-wide objective function for a given intradomain traffic matrix. This approach is inadequate because it ignores a potential impact on interdomain routing due to hot-potato routing policies. This may result in changes in the intradomain traffic matrix that have not been anticipated by the link weights optimizer, possibly leading to degraded network performance.
We propose a BGP-aware link weights optimization method that takes these hot-potato effects into account. This method uses the interdomain traffic matrix and other available BGP data, to extend the intradomain topology with external virtual nodes and links, on which all the well-tuned heuristics of a classical link weights optimizer can be applied. Our method can also optimize the traffic on the interdomain peering links.

…

Conference Paper

We describe a new, non-FCFS policy to schedule parallel jobs on systems that may be part of a computational grid. Our algorithm
continuously monitors the system (i.e., the intensity of incoming jobs and variability of their resource demands), and adapts
its scheduling parameters according to workload fluctuations. The proposed policy is based on backfilling, which reduces resource
fragmentation by executing jobs in an order different from their arrival without delaying certain previously submitted jobs.
We maintain multiple job queues that effectively separate jobs according to their projected execution time. Our policy supports
different job priorities and job reservations, making it appropriate for scheduling jobs on parallel systems that are part
of a computational grid. Detailed performance comparisons via simulation using traces from the Parallel Workload Archive indicate
that the proposed policy consistently outperforms traditional backfilling.
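A minimal sketch of the underlying backfilling rule (heavily simplified relative to the multi-queue policy described above): a later, shorter job may start ahead of the queue head as long as it finishes before the head job's reservation.

```python
# Minimal backfilling sketch (heavily simplified): a later, shorter job
# may start ahead of the queue head as long as it finishes before the
# head job's reservation. Jobs are (name, procs, runtime) tuples.

def backfill_now(queue, free_procs, running, now=0.0):
    """Jobs started at `now`: the head if it fits, else any later jobs
    that fit the leftover processors and end before the (crudely
    estimated) time the head could start."""
    if not queue:
        return []
    head = queue[0]
    if head[1] <= free_procs:
        return [head]
    shadow = min(end for end, procs in running)  # head's reservation time
    started, free = [], free_procs
    for name, procs, runtime in queue[1:]:
        if procs <= free and now + runtime <= shadow:
            started.append((name, procs, runtime))
            free -= procs
    return started

running = [(10.0, 4)]                 # 4 busy procs free again at t = 10
queue = [("big", 8, 5.0), ("s1", 2, 4.0), ("s2", 2, 20.0)]
started = backfill_now(queue, free_procs=4, running=running)
```

Here "s1" backfills (it ends at t = 4, before "big" could start at t = 10), while "s2" would delay the head and must wait.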

…

Article

Graph sampling via crawling has been actively considered as a generic and
important tool for collecting uniform node samples so as to consistently
estimate and uncover various characteristics of complex networks. The so-called
simple random walk with re-weighting (SRW-rw) and Metropolis-Hastings (MH)
algorithm have been popular in the literature for such unbiased graph sampling.
However, an unavoidable downside of their core random walks -- slow diffusion
over the space -- can cause poor estimation accuracy. In this paper, we propose
non-backtracking random walk with re-weighting (NBRW-rw) and MH algorithm with
delayed acceptance (MHDA) which are theoretically guaranteed to achieve, at
almost no additional cost, not only unbiased graph sampling but also higher
efficiency (smaller asymptotic variance of the resulting unbiased estimators)
than the SRW-rw and the MH algorithm, respectively. In particular, a remarkable
feature of the MHDA is its applicability for any non-uniform node sampling like
the MH algorithm, but ensuring better sampling efficiency than the MH
algorithm. We also provide simulation results to confirm our theoretical
findings.
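A minimal sketch of the NBRW-rw idea on a toy graph: avoid stepping straight back to the previous node, and reweight samples by 1/degree so the degree-biased walk still yields an unbiased uniform-node average.

```python
import random

# Sketch of the NBRW-rw idea on a toy undirected graph: never step
# straight back to the previous node (when avoidable), and reweight
# each sample by 1/degree so the degree-biased walk estimates a
# uniform-node average without bias.

def nbrw_estimate(adj, f, steps, seed=0):
    rng = random.Random(seed)
    prev, cur = None, next(iter(adj))
    num = den = 0.0
    for _ in range(steps):
        w = 1.0 / len(adj[cur])               # importance weight 1/deg
        num += w * f(cur)
        den += w
        choices = [v for v in adj[cur] if v != prev] or adj[cur]
        prev, cur = cur, rng.choice(choices)  # non-backtracking step
    return num / den

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
est = nbrw_estimate(adj, lambda v: v, steps=50000)  # uniform mean is 1.5
```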

…

Article

A substantial amount of work has recently gone into localizing BitTorrent
traffic within an ISP in order to avoid excessive and oftentimes unnecessary
transit costs. Several architectures and systems have been proposed and the
initial results from specific ISPs and a few torrents have been encouraging. In
this work we attempt to deepen and scale our understanding of locality and its
potential. First, looking at specific ISPs, we consider tens of thousands of
concurrent torrents, and thus capture ISP-wide implications that cannot be
appreciated by looking at only a handful of torrents. Secondly, we go beyond
individual case studies and present results for the top 100 ISPs in terms of
number of users represented in our dataset of up to 40K torrents involving more
than 3.9M concurrent peers and more than 20M over the course of a day, spread
across 11K ASes. We develop scalable methodologies that permit us to process this huge
dataset and answer questions such as: *what is the minimum and the
maximum transit traffic reduction across hundreds of ISPs?*, *what are
the win-win boundaries for ISPs and their users?*, *what is the maximum
amount of transit traffic that can be localized without requiring fine-grained
control of inter-AS overlay connections?*, and *what is the impact on
transit traffic from upgrades of residential broadband speeds?*

…

Article

Peer-to-peer protocols play an increasingly instrumental role in Internet content distribution. It is therefore important to gain a complete understanding of how these protocols behave in practice and how their operating parameters affect overall system performance. This paper presents the first detailed experimental investigation of the peer selection strategy in the popular BitTorrent protocol. By observing more than 40 nodes in instrumented private torrents, we validate three protocol properties that, though believed to hold, have not been previously demonstrated experimentally: the clustering of similar-bandwidth peers, the effectiveness of BitTorrent's sharing incentives, and the peers' high uplink utilization. In addition, we observe that BitTorrent's modified choking algorithm in seed state provides uniform service to all peers, and that an underprovisioned initial seed leads to absence of peer clustering and less effective sharing incentives. Based on our results, we provide guidelines for seed provisioning by content providers, and discuss a tracker protocol extension that addresses an identified limitation of the protocol.

…

Article

The practicality of the stochastic network calculus (SNC) is often questioned
on grounds of potential looseness of its performance bounds. In this paper it
is uncovered that for bursty arrival processes (specifically Markov-Modulated
On-Off (MMOO)), whose amenability to *per-flow* analysis is typically
proclaimed as a highlight of SNC, the bounds can unfortunately indeed be very
loose (e.g., off by several orders of magnitude). In response to this uncovered
weakness of SNC, the (Standard) per-flow bounds are herein improved by deriving
a general sample-path bound, using martingale based techniques, which
accommodates FIFO, SP, EDF, and GPS scheduling. The obtained (Martingale)
bounds gain an exponential decay factor of ${\mathcal{O}}(e^{-\alpha n})$ in
the number of flows $n$. Moreover, numerical comparisons against simulations
show that the Martingale bounds are remarkably accurate for FIFO, SP, and EDF
scheduling; for GPS scheduling, although the Martingale bounds substantially
improve the Standard bounds, they remain numerically loose, calling for
improvements in the core SNC analysis of GPS.

…

Article

The non-preemptive priority queueing with a finite buffer is considered. We introduce a randomized push-out buffer management mechanism which allows very efficient control of the loss probability of priority packets. The packet loss probabilities for priority and non-priority traffic are calculated using the generating function approach. In the particular case of the standard non-randomized push-out scheme we obtain explicit analytic expressions. The theoretical results are illustrated by numerical examples. The randomized push-out scheme is compared with the threshold-based push-out scheme. It turns out that the former is much easier to tune than the latter. The proposed scheme can be applied to the Differentiated Services of the Internet.
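A toy simulation of the mechanism as described (assumed mechanics and made-up parameters): when the buffer is full, an arriving priority packet pushes out a queued non-priority packet with probability alpha, trading non-priority loss for priority loss.

```python
import random

# Toy simulation of a randomized push-out buffer (assumed mechanics,
# made-up parameters): when the finite buffer is full, an arriving
# priority packet pushes out a queued non-priority packet with
# probability alpha.

def simulate(alpha, buf=10, n=100000, p_hi=0.3, seed=1):
    rng = random.Random(seed)
    queue, lost_hi, lost_lo, arr_hi, arr_lo = [], 0, 0, 0, 0
    for _ in range(n):
        hi = rng.random() < p_hi
        arr_hi += hi
        arr_lo += not hi
        if len(queue) >= buf:
            if hi and "lo" in queue and rng.random() < alpha:
                queue.remove("lo")      # push out one non-priority packet
                lost_lo += 1            # ...which counts as a lo loss
                queue.append("hi")
            else:
                lost_hi += hi
                lost_lo += not hi
        else:
            queue.append("hi" if hi else "lo")
        if queue and rng.random() < 0.5:    # one departure per slot, prob 1/2
            queue.pop(0)
    return lost_hi / arr_hi, lost_lo / arr_lo

hi0, lo0 = simulate(alpha=0.0)   # no push-out: both classes lose alike
hi1, lo1 = simulate(alpha=1.0)   # full push-out: priority protected
```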

…

Article

TTL caching models have recently regained significant research interest,
largely due to their ability to fit popular caching policies such as LRU. This
paper advances the state-of-the-art analysis of TTL-based cache networks by
developing two exact methods with orthogonal generality and computational
complexity. The first method generalizes existing results for line networks
under renewal requests to the broad class of caching policies whereby evictions
are driven by stopping times. The obtained results are further generalized,
using the second method, to feedforward networks with Markov arrival processes
(MAP) requests. MAPs are particularly suitable for non-line networks because
they are closed not only under superposition and splitting, as known, but also
under input-output caching operations as proven herein for phase-type TTL
distributions. The crucial benefit of the two closure properties is that they
jointly enable the first exact analysis of feedforward networks of TTL caches
in great generality.
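A sanity-check sketch of the simplest instance of such models: with Poisson requests to one object and a TTL that is reset on every hit, a request hits iff the gap since the previous request is below the TTL.

```python
import math, random

# Sanity-check sketch of the simplest TTL-cache relation: with
# Poisson(lam) requests to one object and a TTL T reset on every hit,
# a request hits iff the gap since the previous request is below T,
# so P(hit) = 1 - exp(-lam * T).

def simulate_hits(lam, T, n=100000, seed=2):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        gap = rng.expovariate(lam)    # i.i.d. exponential inter-requests
        hits += gap < T
    return hits / n

lam, T = 2.0, 0.5
empirical = simulate_hits(lam, T)
exact = 1.0 - math.exp(-lam * T)      # = 1 - 1/e, about 0.632
```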

…

Article

In this note, we present preliminary results on the use of "network calculus"
for parallel processing systems, specifically MapReduce. We also numerically
evaluate the "generalized" (strong) stochastic burstiness bound based on
publicly posted data describing an actual MapReduce workload of a Facebook
datacenter.

…

Article

Recent studies on AS-level Internet connectivity have attracted considerable attention. These studies have exclusively relied on BGP data from the Oregon route-views [University of Oregon Route Views Project, http://www.routeviews.org] to derive some unexpected and intriguing results. The Oregon route-views data sets reflect AS peering relationships, as reported by BGP, seen from a handful of vantage points in the global Internet. The possibility that these data sets may provide only a very sketchy picture of the complete inter-AS connectivity of the Internet has received little scrutiny. By augmenting the Oregon route-views data with BGP summary information from a large number of Internet Looking Glass sites and with routing policy information from Internet Routing Registry (IRR) databases, we find that (1) a significant number of existing AS peering relationships remain hidden from most BGP routing tables, (2) the AS peering relationships with tier-1 ASs are in general more easily observed than those with non-tier-1 ASs, and (3) there are at least about 40% more AS peering relationships in the Internet than commonly-used BGP-derived AS maps reveal (but only about 4% more ASs). These findings point out the need for continuously questioning the applicability and completeness of data sets at hand when establishing the generality of any particular Internet-specific observation and for assessing its (in)sensitivity to deficiencies in the measurements.

…

Article

Data centers have emerged as promising resources for demand response,
particularly for emergency demand response (EDR), which saves the power grid
from incurring blackouts during emergency situations. However, currently, data
centers typically participate in EDR by turning on backup (diesel) generators,
which is both expensive and environmentally unfriendly. In this paper, we focus
on "greening" demand response in multi-tenant data centers, i.e., colocation
data centers, by designing a pricing mechanism through which the data center
operator can efficiently extract load reductions from tenants during emergency
periods to fulfill the energy reduction requirement for EDR. In particular, we
propose a pricing mechanism for both mandatory and voluntary EDR programs,
ColoEDR, that is based on parameterized supply function bidding and provides
provably near-optimal efficiency guarantees, both when tenants are price-taking
and when they are price-anticipating. In addition to analytic results, we
extend the literature on supply function mechanism design, and evaluate ColoEDR
using trace-based simulation studies. These validate the efficiency analysis
and conclude that the pricing mechanism is both beneficial to the environment
and to the data center operator (by decreasing the need for backup diesel
generation), while also aiding tenants (by providing payments for load
reductions).

…

Article

Modern distributed storage systems offer large capacity to satisfy the
exponentially increasing need of storage space. They often use erasure codes to
protect against disk and node failures to increase reliability, while trying to
meet the latency requirements of the applications and clients. This paper
provides an insightful upper bound on the average service delay of such
erasure-coded storage with arbitrary service time distribution and consisting
of multiple files. Not only does the result supersede known delay bounds that
only work for a single file, it also enables a novel problem of joint latency
and storage cost minimization over three dimensions: selecting the erasure
code, placement of encoded chunks, and optimizing scheduling policy. The
problem is efficiently solved via the computation of a sequence of convex
approximations with provable convergence. We further prototype our solution in
an open-source, cloud storage deployment over three geographically distributed
data centers. Experimental results validate our theoretical delay analysis and
show significant latency reduction, providing valuable insights into the
proposed latency-cost tradeoff in erasure-coded storage.

…

Article

Since the electricity bill of a data center constitutes a significant portion
of its overall operational costs, reducing this bill has become important. We
investigate cost reduction opportunities that arise by the use of uninterrupted
power supply (UPS) units as energy storage devices. This represents a deviation
from the usual use of these devices as mere transitional fail-over mechanisms
between utility and captive sources such as diesel generators. We consider the
problem of opportunistically using these devices to reduce the time average
electric utility bill in a data center. Using the technique of Lyapunov
optimization, we develop an online control algorithm that can optimally exploit
these devices to minimize the time average cost. This algorithm operates
without any knowledge of the statistics of the workload or electricity cost
processes, making it attractive in the presence of workload and pricing
uncertainties. An interesting feature of our algorithm is that its deviation
from optimality reduces as the storage capacity is increased. Our work opens up
a new area in data center power management.

…

Article

Recently several CSMA algorithms based on the Glauber dynamics model have
been proposed for multihop wireless scheduling, as viable solutions to achieve
the throughput optimality, yet are simple to implement. However, their delay
performances still remain unsatisfactory, mainly due to the nature of the
underlying Markov chains that imposes a fundamental constraint on how the link
state can evolve over time. In this paper, we propose a new approach toward
better queueing and delay performance, based on our observation that the
algorithm need not be Markovian, as long as it can be implemented in a
distributed manner and achieves the same throughput optimality, while offering
far better delay performance for general network topologies. Our approach hinges
upon utilizing past state information observed by each local link and then
constructing a high-order Markov chain for the evolution of the feasible link
schedules. We show in theory and simulation that our proposed algorithm, named
delayed CSMA, adds virtually no additional overhead onto the existing
CSMA-based algorithms, achieves the throughput optimality under the usual
choice of link weight as a function of local queue length, and also provides
much better delay performance by effectively `de-correlating' the link state
process (thus removing link starvation) under any arbitrary network topology.
From our extensive simulations we observe that the delay under our algorithm
can be often reduced by a factor of 20 over a wide range of scenarios, compared
to the standard Glauber-dynamics-based CSMA algorithm.

…

Article

In this paper we study the behavior of a continuous time random walk (CTRW)
on a stationary and ergodic time varying dynamic graph. We establish conditions
under which the CTRW is a stationary and ergodic process. In general, the
stationary distribution of the walker depends on the walker rate and is
difficult to characterize. However, we characterize the stationary distribution
in the following cases: i) the walker rate is significantly larger or smaller
than the rate in which the graph changes (time-scale separation), ii) the
walker rate is proportional to the degree of the node that it resides on
(coupled dynamics), and iii) the degrees of nodes belonging to the same
connected component are identical (structural constraints). We provide examples
that illustrate our theoretical findings.

…

Article

Motivated by emerging big streaming data processing paradigms (e.g., Twitter
Storm, Streaming MapReduce), we investigate the problem of scheduling graphs
over a large cluster of servers. Each graph is a job, where nodes represent
compute tasks and edges indicate data-flows between these compute tasks. Jobs
(graphs) arrive randomly over time, and upon completion, leave the system. When
a job arrives, the scheduler needs to partition the graph and distribute it
over the servers to satisfy load balancing and cost considerations.
Specifically, neighboring compute tasks in the graph that are mapped to
different servers incur load on the network; thus a mapping of the jobs among
the servers incurs a cost that is proportional to the number of "broken edges".
We propose a low complexity randomized scheduling algorithm that, without
service preemptions, stabilizes the system with graph arrivals/departures; more
importantly, it allows a smooth trade-off between minimizing average
partitioning cost and average queue lengths. Interestingly, to avoid service
preemptions, our approach does not rely on a Gibbs sampler; instead, we show
that the corresponding limiting invariant measure has an interpretation
stemming from a loss system.
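The "broken edges" partitioning cost has a direct sketch; the job graph and server names below are illustrative assumptions:

```python
def partition_cost(edges, assignment):
    """Number of data-flow edges whose endpoint tasks are mapped to
    different servers ("broken edges")."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# A 4-task job whose data-flows form the chain 0 -> 1 -> 2 -> 3.
edges = [(0, 1), (1, 2), (2, 3)]
# Mapping tasks {0, 1} to server "A" and {2, 3} to server "B" breaks one edge.
assignment = {0: "A", 1: "A", 2: "B", 3: "B"}
print(partition_cost(edges, assignment))  # -> 1
```

The scheduler's trade-off is then between keeping this cost low and balancing the per-server load.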

…

Article

When a company migrates to cloud storage, the way back is neither fast nor cheap. The company is then locked into the storage contract and exposed to upward market prices, which reduce the company’s profit and may even drive it below zero. We propose a means of protection based on an insurance contract, by which the cloud purchaser is indemnified when the current storage price exceeds a pre-defined threshold. By applying financial options theory, we provide a formula for the insurance price (the premium). Using historical data on market prices for disks, we apply the formula in realistic scenarios. We show that the premium grows nearly quadratically with the duration of the coverage period as long as this is below one year, but grows more slowly, though faster than linearly, over longer coverage periods.
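The indemnity paid when the price S_T exceeds the threshold K, max(S_T - K, 0), is exactly a call-option payoff, so a generic Monte Carlo pricer illustrates how such a premium is computed; the lognormal price model and every parameter below are assumptions for illustration, not the paper's calibrated disk-price model:

```python
import math
import random

def premium(s0, strike, sigma, horizon, rate=0.0, n=200_000, seed=1):
    """Monte Carlo premium for an indemnity paying max(S_T - K, 0) when the
    storage price S_T at the end of the coverage period exceeds the
    threshold K. The lognormal price model is an illustrative assumption."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    total = 0.0
    for _ in range(n):
        s_T = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_T - strike, 0.0)
    return math.exp(-rate * horizon) * total / n

p = premium(100.0, 100.0, sigma=0.2, horizon=1.0)
print(round(p, 1))  # close to the corresponding Black-Scholes value (~8.0)
```

Increasing `horizon` in this sketch shows the qualitative effect the abstract quantifies: the premium grows super-linearly with the length of the coverage period.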

…

Article

Given a set of pairwise comparisons, the classical ranking problem computes a
single ranking that best represents the preferences of all users. In this
paper, we study the problem of inferring individual preferences, arising in the
context of making personalized recommendations. In particular, we assume that
there are $n$ users of $r$ types; users of the same type provide similar
pairwise comparisons for $m$ items according to the Bradley-Terry model. We
propose an efficient algorithm that accurately estimates the individual
preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$
pairwise comparisons per type, which is near optimal in sample complexity when
$r$ only grows logarithmically with $m$ or $n$. Our algorithm has three steps:
first, for each user, compute the \emph{net-win} vector, which is a projection
of its $\binom{m}{2}$-dimensional vector of pairwise comparisons onto an
$m$-dimensional linear subspace; second, cluster the users based on the net-win
vectors; third, estimate a single preference for each cluster separately. The
net-win vectors are much less noisy than the high dimensional vectors of
pairwise comparisons and clustering is more accurate after the projection as
confirmed by numerical experiments. Moreover, we show that, when a cluster is
only approximately correct, the maximum likelihood estimation for the
Bradley-Terry model is still close to the true preference.
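Step one of the algorithm can be sketched directly; the comparison encoding below is an illustrative assumption:

```python
import numpy as np

def net_win(comparisons, m):
    """Project a user's pairwise comparisons onto an m-dimensional vector:
    for each item, its number of wins minus its number of losses.
    Each comparison (i, j) means item i beat item j."""
    v = np.zeros(m)
    for i, j in comparisons:
        v[i] += 1.0
        v[j] -= 1.0
    return v

# A user who prefers 0 > 1 > 2 in every comparison:
v = net_win([(0, 1), (0, 2), (1, 2)], 3)
print(v.tolist())  # -> [2.0, 0.0, -2.0]
```

Step two can then cluster users on these m-dimensional vectors (e.g., with k-means), and step three fits a Bradley-Terry model within each cluster.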

…

Article

Microgrids represent an emerging paradigm of future electric power systems
that can utilize both distributed and centralized generations. Two recent
trends in microgrids are the integration of local renewable energy sources
(such as wind farms) and the use of co-generation (i.e., to supply both
electricity and heat). However, these trends also bring unprecedented
challenges to the design of intelligent control strategies for microgrids.
Traditional generation scheduling paradigms rely on perfect prediction of
future electricity supply and demand. They are no longer applicable to
microgrids with unpredictable renewable energy supply and with co-generation
(that needs to consider both electricity and heat demand). In this paper, we
study online algorithms for the microgrid generation scheduling problem with
intermittent renewable energy sources and co-generation, with the goal of
maximizing the cost-savings with local generation. Based on the insights from
the structure of the offline optimal solution, we propose a class of
competitive online algorithms, called CHASE (Competitive Heuristic Algorithm
for Scheduling Energy-generation), that track the offline optimal in an online
fashion. Under typical settings, we show that CHASE achieves the best
competitive ratio among all deterministic online algorithms, and this ratio is
at most the small constant 3.

…

Article

There has appeared in the literature a great number of metrics that attempt to measure the effort or complexity in developing and understanding software(1). There have also been several attempts to independently validate these measures on data from different organizations gathered by different people(2). These metrics have many purposes. They can be used to evaluate the software development process or the software product. They can be used to estimate the cost and quality of the product. They can also be used during development and evolution of the software to monitor the stability and quality of the product.
Among the most popular metrics have been the software science metrics of Halstead, and the cyclomatic complexity metric of McCabe. One question is whether these metrics actually measure such things as effort and complexity. One measure of effort may be the time required to produce a product. One measure of complexity might be the number of errors made during the development of a product. A second question is how these metrics compare with standard size measures, such as the number of source lines or the number of executable statements, i.e., do they do a better job of predicting the effort or the number of errors? Lastly, how do these metrics relate to each other?
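For concreteness, McCabe's cyclomatic complexity mentioned above is V(G) = E - N + 2P for a control-flow graph with E edges, N nodes and P connected components; a minimal sketch:

```python
def cyclomatic_complexity(n_edges, n_nodes, n_components=1):
    """McCabe's cyclomatic complexity V(G) = E - N + 2P for a control-flow
    graph with E edges, N nodes and P connected components."""
    return n_edges - n_nodes + 2 * n_components

# A function containing a single if/else: its control-flow graph has
# 4 nodes (entry, then-branch, else-branch, exit) and 4 edges.
print(cyclomatic_complexity(4, 4))  # -> 2
```

Each added decision point adds one edge more than it adds nodes, raising V(G) by one, which is why the metric is read as a count of linearly independent paths.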

…

Article

Since Tassiulas and Ephremides proposed the throughput-optimal maximum weight
scheduling algorithm for constrained queueing networks in 1992, extensive
research efforts have been made to resolve its high-complexity issue in
various directions. In this paper, we resolve this issue by
developing a generic framework for designing throughput-optimal and
low-complexity scheduling algorithms. Under the framework, an algorithm updates
current schedules via an interaction with a given oracle system that can
generate a solution of a certain discrete optimization problem in a finite
number of interactive queries. Therefore, one can design a variety of
scheduling algorithms under this framework by choosing different oracles, e.g.,
the exhaustive search (ES), the Markov chain Monte Carlo (MCMC), the belief
propagation (BP) and the cutting-plane (CP) algorithms. The complexity of the
resulting algorithm is decided by the number of operations required for an
oracle processing a single query, which is typically very small. Somewhat
surprisingly, we prove that an algorithm using any such oracle is
throughput-optimal for general constrained queueing network models that arise
in the context of emerging large-scale communication networks. In particular,
the `pick-and-compare' algorithms developed by Tassiulas in 1998 and recently
developed queue-based CSMA algorithms can be also understood as special cases
of such algorithms using ES and MCMC oracles, respectively. To the best of our
knowledge, our result is the first that establishes a rigorous connection
between iterative optimization methods and low-complexity scheduling
algorithms, which we believe provides various future directions and new
insights in both areas.

…

Article

One typical use case of large-scale distributed computing in data centers is
to decompose a computation job into many independent tasks and run them in
parallel on different machines, sometimes known as the "embarrassingly
parallel" computation. For this type of computation, one challenge is that the
time to execute a task for each machine is inherently variable, and the overall
response time is constrained by the execution time of the slowest machine. To
address this issue, system designers introduce task replication, which sends
the same task to multiple machines, and obtains the result from the machine that
finishes first. While task replication reduces response time, it usually
increases resource usage. In this work, we propose a theoretical framework to
analyze the trade-off between response time and resource usage. We show that,
while in general, there is a tension between response time and resource usage,
there exist scenarios where replicating tasks judiciously reduces completion
time and resource usage simultaneously. Given the execution time distribution
for machines, we investigate the conditions for a scheduling policy to achieve
optimal performance trade-off, and propose efficient algorithms to search for
optimal or near-optimal scheduling policies. Our analysis gives insights on
when and why replication helps, which can be used to guide scheduler design in
large-scale distributed computing systems.
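One scenario of the kind described above, where judicious replication reduces completion time and resource usage simultaneously, can be reproduced with a small simulation; the Pareto execution-time model, its parameters, and the cancel-on-first-finish policy are illustrative assumptions:

```python
import random

def simulate(r, alpha, n_jobs=50_000, seed=0):
    """Each task runs on r machines with i.i.d. Pareto(alpha) execution
    times; the first finisher wins and the other replicas are cancelled,
    so the machine-time used is r times the minimum."""
    rng = random.Random(seed)
    total_time = total_usage = 0.0
    for _ in range(n_jobs):
        t = min(rng.random() ** (-1.0 / alpha) for _ in range(r))
        total_time += t
        total_usage += r * t
    return total_time / n_jobs, total_usage / n_jobs

t1, u1 = simulate(1, alpha=1.1)   # no replication
t2, u2 = simulate(2, alpha=1.1)   # one extra replica per task
# With a heavy tail, replication reduces completion time *and* total usage.
print(t2 < t1 and u2 < u1)  # -> True
```

The reason is that the minimum of two Pareto(α) variables is Pareto(2α), whose mean is far smaller when α is close to 1; with light-tailed execution times the same experiment shows the usual tension instead.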

…

Article

Our model is a constrained homogeneous random walk in a nonnegative orthant Z_+^d. The convergence to stationarity for such a random walk can often be checked by constructing a Lyapunov function. The same Lyapunov function can also be used for computing approximately the stationary distribution of this random walk, using methods developed by Meyn and Tweedie. In this paper we show that, for random walks of this type, computing the stationary probability exactly is an undecidable problem: no algorithm can exist to achieve this task. We then prove that computing large deviation rates for this model is also an undecidable problem. We extend these results to a certain type of queueing system. The implication of these results is that no useful formulas for computing stationary probabilities and large deviations rates can exist in these systems.

…

Article

Users expect fast and fluid response from today's cloud infrastructure.
Large-scale computing frameworks such as MapReduce divide jobs into many
parallel tasks and execute them on different machines to enable faster
processing. But the tasks on the slowest machines (straggling tasks) become the
bottleneck in the completion of the job. One way to combat the variability in
machine response time is to add replicas of straggling tasks and wait for one
copy to finish.
In this paper we analyze how task replication strategies can be used to
reduce latency, and their impact on the cost of computing resources. We use
extreme value theory to show that the tail of the execution time distribution
is the key factor in characterizing the trade-off between latency and computing
cost. From this trade-off we can determine which task replication strategies
reduce latency without a large increase in computing cost. We also propose a
heuristic algorithm to search for the best replication strategies when the
empirical behavior of task execution time is not well modeled by a simple
distribution to which the proposed analysis applies. Evaluation of the
heuristic policies on Google Trace data shows a significant latency reduction
compared to the replication strategy used in MapReduce.

…

Article

Switched queueing networks model wireless networks, input-queued switches and numerous other networked communications systems. For single-hop networks, we consider an (α,g)-switch policy, which combines the MaxWeight policies with bandwidth-sharing networks -- a further well-studied model of Internet congestion. We prove the maximum stability property for this class of randomized policies. Thus these policies have the same first-order behavior as the MaxWeight policies. However, for multihop networks, some of these generalized policies address a number of critical weaknesses of the MaxWeight/BackPressure policies.
For multihop networks with fixed routing, we consider the Proportional Scheduler (or (1,log)-policy). In this setting, the BackPressure policy is maximum stable, but must maintain a queue for every route-destination, which typically grows rapidly with a network's size. However, this proportionally fair policy only needs to maintain a queue for each outgoing link, which is typically bounded in number. As is common with Internet routing, by maintaining per-link queueing each node only needs to know the next hop for each packet and not its entire route. Further, in contrast to BackPressure, the Proportional Scheduler does not compare downstream queue lengths to determine weights, only local link information is required. This leads to greater potential for decomposed implementations of the policy. Through a reduction argument and an entropy argument, we demonstrate that, whilst maintaining substantially less queueing overhead, the Proportional Scheduler achieves maximum throughput stability.
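The bookkeeping gap between the two policies can be made concrete with a toy count on a line network; the sizes are illustrative:

```python
# Queue bookkeeping on a line network of n nodes (sizes illustrative).
n = 100
links = n - 1                          # outgoing links in the line
destinations = n                       # possible packet destinations
backpressure = links * destinations    # BackPressure: per-link, per-destination
proportional = links                   # Proportional Scheduler: per link only
print(backpressure, proportional)      # -> 9900 99
```

The per-destination factor is what makes BackPressure's state grow rapidly with network size, while the per-link count stays bounded by each node's degree.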

…

Article

We consider a large-scale service system model motivated by the problem of
efficient placement of virtual machines to physical host machines in a network
cloud, so that the total number of occupied hosts is minimized. Customers of
different types arrive to a system with an infinite number of servers. A server
packing configuration is the vector $k = (k_i)$, where $k_i$ is the number of
type-$i$ customers that the server "contains". Packing constraints are
described by a fixed finite set of allowed configurations. Upon arrival, each
customer is placed into a server immediately, subject to the packing
constraints; the server can be idle or already serving other customers. After
service completion, each customer leaves its server and the system.
It was shown recently that a simple real-time algorithm, called Greedy, is
asymptotically optimal in the sense of minimizing $\sum_k X_k^{1+\alpha}$ in
the stationary regime, as the customer arrival rates grow to infinity. (Here
$\alpha > 0$, and $X_k$ denotes the number of servers with configuration $k$.)
In particular, when the parameter $\alpha$ is small, Greedy approximately solves the
problem of minimizing $\sum_k X_k$, the number of occupied hosts. In this paper
we introduce the algorithm called Greedy with sublinear Safety Stocks (GSS),
and show that it asymptotically solves the exact problem of minimizing $\sum_k
X_k$. An important feature of the algorithm is that sublinear safety stocks of
$X_k$ are created automatically - when and where necessary - without having to
determine a priori where they are required. Moreover, we also provide a tight
characterization of the rate of convergence to optimality under GSS. The GSS
algorithm is as simple as Greedy, and uses no more system state information
than Greedy does.
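A minimal sketch of the baseline Greedy placement logic follows; the allowed-configuration set and the arrival sequence are illustrative, and the sublinear safety stocks that distinguish GSS are omitted:

```python
def greedy_place(arrivals, allowed, n_types):
    """Baseline Greedy: place each arriving customer (given by its type)
    into the first existing server whose configuration can absorb it while
    remaining in the allowed set; otherwise open a new server."""
    servers = []
    for t in arrivals:
        for idx, cfg in enumerate(servers):
            cand = list(cfg)
            cand[t] += 1
            if tuple(cand) in allowed:
                servers[idx] = tuple(cand)
                break
        else:
            cand = [0] * n_types
            cand[t] = 1
            servers.append(tuple(cand))
    return servers

# Two customer types; a server may hold two type-0 customers or one of each.
allowed = {(1, 0), (2, 0), (0, 1), (1, 1)}
print(greedy_place([0, 1, 0, 0], allowed, n_types=2))  # -> [(1, 1), (2, 0)]
```

GSS additionally keeps a small (sublinear) stock of partially filled servers so that arrivals rarely force a genuinely new server to open.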

…

Article

Social utility maximization refers to the process of allocating resources in
such a way that the sum of agents' utilities is maximized under the system
constraints. Such allocation arises in several problems in the general area of
communications, including unicast (and multicast multi-rate) service on the
Internet, as well as in applications with (local) public goods, such as power
allocation in wireless networks, spectrum allocation, etc. Mechanisms that
implement such allocations in Nash equilibrium have also been studied but
either they do not possess full implementation property, or are given in a
case-by-case fashion, thus obscuring fundamental understanding of these
problems.
In this paper we propose a unified methodology for creating mechanisms that
fully implement, in Nash equilibria, social utility maximizing functions
arising in various contexts where the constraints are convex. The construction
of the mechanism is done in a systematic way by considering the dual
optimization problem. In addition to the required properties of efficiency and
individual rationality that such mechanisms ought to satisfy, three additional
design goals are the focus of this paper: a) the size of the message space
scaling linearly with the number of agents (even if agents' types are entire
valuation functions), b) allocation being feasible on and off equilibrium, and
c) strong budget balance at equilibrium and also off equilibrium whenever
demand is feasible.

…

Article

Making use of predictions is a crucial, but under-explored, area of online
algorithms. This paper studies a class of online optimization problems where we
have external noisy predictions available. We propose a stochastic prediction
error model that generalizes prior models in the learning and stochastic
control communities, incorporates correlation among prediction errors, and
captures the fact that predictions improve as time passes. We prove that
achieving sublinear regret and constant competitive ratio for online algorithms
requires the use of an unbounded prediction window in adversarial settings, but
that under more realistic stochastic prediction error models it is possible to
use Averaging Fixed Horizon Control (AFHC) to simultaneously achieve sublinear
regret and constant competitive ratio in expectation using only a
constant-sized prediction window. Furthermore, we show that the performance of
AFHC is tightly concentrated around its mean.

…

Article

Previous scalable protocols for downloading large, popular files from a single server include batching and cyclic multicast. With batching, clients wait to begin receiving a requested file until the beginning of its next multicast transmission, which collectively serves all of the waiting clients that have accumulated up to that point. With cyclic multicast, the file data is cyclically transmitted on a multicast channel. Clients can begin listening to the channel at an arbitrary point in time, and continue listening until all of the file data has been received.This paper first develops lower bounds on the average and maximum client delay for completely downloading a file, as functions of the average server bandwidth used to serve requests for that file, for systems with homogeneous clients. The results show that neither cyclic multicast nor batching consistently yields performance close to optimal. New hybrid download protocols are proposed that achieve within 15% of the optimal maximum delay and 20% of the optimal average delay in homogeneous systems.For heterogeneous systems in which clients have widely varying achievable reception rates, an additional design question concerns the use of high rate transmissions, which can decrease delay for clients that can receive at such rates, in addition to low rate transmissions that can be received by all clients. A new scalable download protocol for such systems is proposed, and its performance is compared to that of alternative protocols as well as to new lower bounds on maximum client delay. The new protocol achieves within 25% of the optimal maximum client delay in all scenarios considered.

…

Article

With a vast number of items, web-pages, and news to choose from, online
services and the customers both benefit tremendously from personalized
recommender systems. Such systems however provide great opportunities for
targeted advertisements, by displaying ads alongside genuine recommendations.
We consider a biased recommendation system where such ads are displayed without
any tags (disguised as genuine recommendations), rendering them
indistinguishable to a single user. We ask whether it is possible for a small
subset of collaborating users to detect such a bias. We propose an algorithm
that can detect such a bias through statistical analysis on the collaborating
users' feedback. The algorithm requires only binary information indicating
whether a user was satisfied with each recommended item. This makes the
algorithm widely applicable to real-world issues such as the identification of
search engine bias and pharmaceutical lobbying. We prove that
the proposed algorithm detects the bias with high probability for a broad class
of recommendation systems when a sufficient number of users provide feedback
on a sufficient number of recommendations. We provide extensive simulations with
real data sets and practical recommender systems, which confirm the trade-offs
in the theoretical guarantees.

…

Article

We consider streaming over a peer-to-peer network with homogeneous nodes in
which a single source broadcasts a data stream to all the users in the system.
Peers are allowed to enter or leave the system (adversarially) arbitrarily.
Previous approaches for streaming in this setting have either used randomized
distribution graphs or structured trees with randomized maintenance algorithms.
Randomized graphs handle peer churn well but have poor connectivity guarantees,
while structured trees have good connectivity but have proven hard to maintain
under peer churn. We improve upon both approaches by presenting a novel
distribution structure with a deterministic and distributed algorithm for
maintenance under peer churn; our result is inspired by a recent work proposing
deterministic algorithms for rumor spreading in graphs. A key innovation in our
approach is in having redundant links in the distribution structure. While this
leads to a reduction in the maximum streaming rate possible, we show that for
the amount of redundancy used, the delay guarantee of the proposed algorithm is
near optimal. We introduce a tolerance parameter that captures the worst-case
transient streaming rate received by the peers during churn events and
characterize the fundamental tradeoff between rate, delay and tolerance. A
natural generalization of the deterministic algorithm achieves this tradeoff
near optimally. Finally, the proposed deterministic algorithm is robust enough
to handle various generalizations: it can deal with heterogeneous node
capacities and with more complicated streaming patterns in which multiple
source transmissions are present.

…

Article

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

…

Article

As the use of wireless sensor networks increases, the need for
(energy-)efficient and reliable broadcasting algorithms grows. Ideally, a
broadcasting algorithm should have the ability to quickly disseminate data,
while keeping the number of transmissions low. In this paper we develop a model
describing the message count in large-scale wireless sensor networks. We focus
our attention on the popular Trickle algorithm, which has been proposed as a
suitable communication protocol for code maintenance and propagation in
wireless sensor networks. Besides providing a mathematical analysis of the
algorithm, we propose a generalized version of Trickle, with an additional
parameter defining the length of a listen-only period. This generalization
proves to be useful for optimizing the design and usage of the algorithm. For
single-cell networks we show how the message count increases with the size of
the network and how this depends on the Trickle parameters. Furthermore, we
derive distributions of inter-broadcasting times and investigate their
asymptotic behavior. Our results prove conjectures made in the literature
concerning the effect of a listen-only period. Additionally, we develop an
approximation for the expected number of transmissions in multi-cell networks.
All results are validated by simulations.

…

Article

Recent advances have resulted in queue-based algorithms for medium access control which operate in a distributed fashion, and yet achieve the optimal throughput performance of centralized scheduling algorithms. However, fundamental performance bounds reveal that the "cautious" activation rules involved in establishing throughput optimality tend to produce extremely large delays, typically growing exponentially in 1/(1-r), with r the load of the system, in contrast to the usual linear growth. Motivated by that issue, we explore to what extent more "aggressive" schemes can improve the delay performance. Our main finding is that aggressive activation rules induce a lingering effect, where individual nodes retain possession of a shared resource for excessive lengths of time even while a majority of other nodes idle. Using central limit theorem type arguments, we prove that the idleness induced by the lingering effect may cause the delays to grow with 1/(1-r) at a quadratic rate. To the best of our knowledge, these are the first mathematical results illuminating the lingering effect and quantifying the performance impact.
In addition, extensive simulation experiments are conducted to illustrate and validate the various analytical results.

…

Article

Among the many techniques in computer graphics, ray tracing is prized because it can render realistic images, albeit at great computational expense. In this note, the performance of several approaches to ray tracing on a distributed memory parallel system is evaluated. A set of performance instrumentation tools and their associated visualization software are used to identify the underlying causes of performance differences.

…

Article

Wireless network topologies change over time, and maintaining routes requires frequent updates. Updates are costly in terms of consuming throughput available for data transmission, which is precious in wireless networks. In this paper, we ask whether there exist low-overhead schemes that produce low-stretch routes. We study this question using the underlying geometric properties of the connectivity graph in wireless networks. Comment: 29 pages, 19 figures; a shorter version was published in the proceedings of the 2008 ACM Sigmetrics conference

…

Article

The maximum independent set (MIS) problem is a well-studied combinatorial optimization problem that naturally arises in many applications, such as wireless communication, information theory and statistical mechanics.
The MIS problem is NP-hard, so many results in the literature focus on the fast generation of maximal independent sets of high cardinality. One possibility is to combine Gibbs sampling with coupling-from-the-past arguments to detect convergence to the stationary regime. This results in a sampling procedure with time complexity that depends on the mixing time of the Glauber-dynamics Markov chain.
We propose an adaptive method for random event generation in the Glauber dynamics that considers only the events that are effective in the coupling from the past scheme, accelerating the convergence time of the Gibbs sampling algorithm.
The full paper is available on arXiv.
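For reference, a single step of the Glauber dynamics underlying such a sampler can be sketched as follows; the graph, fugacity and step count are illustrative, and the coupling-from-the-past machinery is omitted:

```python
import random

def glauber_step(adj, in_set, lam, rng):
    """One Glauber-dynamics update for the hard-core (independent set)
    model with fugacity lam: pick a node uniformly at random; with
    probability lam/(1+lam) try to insert it (legal only if none of its
    neighbors is in the set), otherwise remove it."""
    v = rng.randrange(len(adj))
    if rng.random() < lam / (1.0 + lam):
        if not any(in_set[u] for u in adj[v]):
            in_set[v] = True
    else:
        in_set[v] = False

# A 4-cycle: 0 - 1 - 2 - 3 - 0.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
state = [False] * 4
rng = random.Random(0)
for _ in range(1000):
    glauber_step(adj, state, lam=2.0, rng=rng)

# The chain never leaves the set of independent sets:
ok = all(not (state[u] and state[v]) for u in range(4) for v in adj[u])
print(ok)  # -> True
```

Coupling from the past runs coupled copies of exactly this update from progressively earlier start times until all copies coalesce, at which point the output is an exact stationary sample.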

…

Article

Network service providers and customers are often concerned with aggregate
performance measures that span multiple network paths. Unfortunately, forming
such network-wide measures can be difficult, due to the issues of scale
involved. In particular, the number of paths grows too rapidly with the number
of endpoints to make exhaustive measurement practical. As a result, there is
interest in the feasibility of methods that dramatically reduce the number of
paths measured in such situations while maintaining acceptable accuracy.
In previous work we proposed a statistical framework to efficiently address
this problem, in the context of additive metrics such as delay and loss rate,
for which the per-path metric is a sum of (possibly transformed) per-link
measures. The key to our method lies in the observation and exploitation of
significant redundancy in network paths (sharing of common links).
In this paper we make three contributions: (1) we generalize the framework to
make it more immediately applicable to network measurements encountered in
practice; (2) we demonstrate that the observed path redundancy upon which our
method is based is robust to variation in key network conditions and
characteristics, including link failures; and (3) we show how the framework may
be applied to address three practical problems of interest to network providers
and customers, using data from an operating network. In particular, we show how
appropriate selection of small sets of path measurements can be used to
accurately estimate network-wide averages of path delays, to reliably detect
network anomalies, and to effectively choose between alternative
sub-networks, such as a customer choosing between two providers or two ingress
points into a provider network.
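The additive-metric idea can be illustrated directly: measured path delays determine per-link delays through the shared links, which then predict unmeasured paths. The topology and numbers below are illustrative:

```python
import numpy as np

# Each path's delay is the sum of its links' delays (additive metric).
R_meas = np.array([[1, 1, 0],   # measured path A traverses links 1, 2
                   [0, 1, 1],   # measured path B traverses links 2, 3
                   [1, 0, 1]],  # measured path C traverses links 1, 3
                  dtype=float)
y = np.array([7.0, 6.0, 3.0])            # measured path delays
link = np.linalg.solve(R_meas, y)        # inferred per-link delays: [2, 5, 1]
pred = np.array([1.0, 1.0, 1.0]) @ link  # predict an unmeasured path (all links)
print(round(float(pred), 6))             # -> 8.0
```

In a real network the routing matrix is much larger but highly redundant, which is why a small, well-chosen set of measured paths suffices.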

…

Top-cited authors