
Designing ISP-friendly Peer-to-Peer Networks

Using Game-based Control

Vinith Reddy∗, Younghoon Kim†, Srinivas Shakkottai∗ and A. L. Narasimha Reddy∗

∗Dept. of ECE, Texas A&M University

Email: {vinith reddy, sshakkot, reddy}@tamu.edu

†Dept. of CS, Korea Advanced Institute of Science and Technology

Email: kyhoon@gmail.com

Abstract—The rapid growth of peer-to-peer (P2P) networks in

the past few years has brought with it increases in transit cost

to Internet Service Providers (ISPs), as peers exchange large

amounts of traffic across ISP boundaries. This ISP-oblivious

behavior has resulted in misalignment of incentives between P2P

networks—that seek to maximize user quality—and ISPs—that

would seek to minimize costs. Can we design a P2P overlay that

accounts for both ISP costs and quality of service, and

attains a desired tradeoff between the two? We design a system,

which we call MultiTrack, that consists of an overlay of multiple

mTrackers whose purpose is to align these goals. mTrackers split

demand from users among different ISP domains while trying to

minimize their individual costs (delay plus transit cost) in their

ISP domain. We design the signals in this overlay of mTrackers

in such a way that potentially competitive individual optimization

goals are aligned across the mTrackers. The mTrackers are also

capable of doing admission control in order to ensure that users

who are from different ISP domains have a fair chance of being

admitted into the system, while keeping costs in check. We prove

analytically that our system is stable and achieves maximum

utility with minimum cost. Our design decisions and control

algorithms are validated by Matlab and ns-2 simulations.

I. INTRODUCTION

The past few years have seen the rapid growth of con-

tent distribution over the Internet, particularly using peer-to-

peer (P2P) networks. Recent studies estimate that 35-90% of

bandwidth is consumed by P2P file-sharing applications, both

at the edges and even within the core [1], [2]. The use of

P2P networks for media delivery is expected to grow still

further, with the proliferation of legal applications (e.g. Pando

Networks [3]) that use P2P as a core technology.

While most P2P systems today possess some form of

network resource-awareness, and attempt to optimally utilize

the system resources, they are largely agnostic to Internet

Service Providers’ (ISP) concerns such as traffic management

and costs. This ISP-oblivious nature of P2P networks has

hampered the ability of system participants to correctly align

incentives. Indeed, the recent conflicts between ISPs and

content providers, as well as efforts by some ISPs such as

Comcast to limit P2P traffic on their networks [4], speak in part

to an inability to align interests correctly. Such conflicts are

particularly critical as P2P becomes an increasingly prevalent

form of content distribution [5].

Fig. 1. The MultiTrack architecture. Multiple trackers, each following individual optimizations, achieve an optimal delay-cost tradeoff. (Figure not reproduced; its labels show Requests arriving at mTracker 1, mTracker 2 and mTracker 3, one per ISP domain ISP 1, ISP 2 and ISP 3, each with a P2P peer cloud.)

A traditional BitTorrent system [6] has elements called Trackers whose main purpose is to enable peers to find each other. The BitTorrent Tracker randomly assigns a new

(entering) user a set of peers that are already in the system to

communicate with. This system has the disadvantage that if

peers who are assigned to help each other are in the domains of

different ISPs, they would cause significant transit costs to the

ISPs due to the inter-ISP traffic that they generate. However, if

costs are reduced by forcing traffic to be local, then the delay

performance of the system could suffer. Recent work such as

[7]–[9] has focused on cost in terms of load balancing and

localizing traffic, and developed heuristics to attain a certain

quality of service (QoS). For example, P4P [8] develops a

framework to achieve minimum cost (optimal load balancing)

among ISP links, but its BitTorrent implementation utilizes the

heuristic that 30% of peers declared to each requesting user

should be drawn from “far away ISPs” in order to attain a

good QoS.

This leads us to the fundamental question that we attempt

to answer in this paper: Can we develop a distributed delay

and cost optimal P2P architecture? We focus on developing a

provably optimal price-assisted architecture called MultiTrack,

that would be aware of the interaction between delay and cost.

The idea is to understand that while the resources available

with peers in different ISP domains should certainly be used,

such usage comes at a price. The system must be able to

determine the marginal gain in performance for a marginal

increase in cost. It would then be able to locate the optimal

point at which to operate.

arXiv:0912.3856v1 [cs.NI] 19 Dec 2009


The conceptual system¹ is illustrated in Figure 1. The

system is managed by a set of mTrackers. Each mTracker is

associated with a particular ISP domain. The mTrackers are

similar to the Trackers in BitTorrent [6], in that their main

purpose is to enable peers to find each other. However, unlike

BitTorrent, the mTrackers in MultiTrack form an overlay net-

work among themselves. The purpose of the overlay network

is to provide multi-dimensional actions to the mTrackers. In

Figure 1, mTracker 1 is in steady state (wherein the demand

on the mTracker is less than the available capacity [11]),

which implies that it has spare capacity to serve requests from

other mTrackers. Consider mTracker 2 which is in transient

state (wherein the demand on the mTracker is more than

the available capacity [11]). When a request arrives, it can

either assign the requester to its own domain at essentially

zero cost, or can forward the user to mTracker 1 and incur a

cost for doing so. However, the delay incurred by forwarded

users would be less as mTracker 1 has higher capacity. Thus,

mTracker 2 can trade-off cost versus delay performance by

forwarding some part of its demand.

Each mTracker uses price assisted decision making by

utilizing dynamics that consider the marginal payoff of for-

warding traffic to that of retaining traffic in the same domain

as the mTracker. Several such rational dynamics have been de-

veloped in the field of game theory that studies the behavior of

selfish users. We present our system model with its simplifying

assumptions in Section III. We then design a system in which

the actions of these mTrackers, each seeking to maximize

their own payoffs, actually result in ensuring lowest cost of

the system as a whole. The scheme, presented in Section IV,

involves implicit learning of capacities through probing and

backoff through a rational control scheme known as replicator

dynamics [12], [13]. We present a game theoretic framework

for our system in Section IV-A and show using Lyapunov

techniques that the vector of split probabilities converges to

a provably optimal state wherein the total cost in terms of

delay and traffic-exchange is minimized. Further, this state is

a Wardrop equilibrium [14].

We then consider a subsidiary problem of achieving fair

division of resources among different mTrackers through ad-

mission control in Section V. The objective here is to ensure

that some level of fairness is maintained among the users in

different mTracker domains, while at the same time ensuring

that the costs in the system are not too high. Admission

control implies that not all users in all domains would be

allowed to enter the system, but it should be implemented in

a manner that is fair to users in different mTracker domains.

The mTracker takes admission control decisions based on

the marginal disutility caused by users to the system. Users

interested in the file would approach the mTracker that would

decide whether or not to admit the user into the system. We

show that our mTracker admission control optimally achieves

fairness amongst users, while maintaining low system cost.

Note that switching off admission control would still imply that the total system cost is minimized by mTrackers, but this could be high if the offered load were high.

¹ We presented some basic ideas on the system as a poster [10].

We simulate our system using both Matlab simulations in Section VI, to validate our analysis, and ns-2 simulations in Section VII, to show a plausible implementation of the system as a whole. The simulations strongly support our architectural decisions. We conclude with ideas on future work in Section VIII.

II. RELATED WORK

There has been much recent work on P2P systems and traffic

management, and we provide a discussion of work that is

closely related to our problem. Fluid models of P2P systems and their multi-phase (transient/steady state) behavior have been developed in [11], [15]. The results show how the supply of a file correlates with its demand, and that it is essentially the transient delays that dominate.

Traffic management and load balancing have become im-

portant as P2P networks grow in size. There has been work

on traffic management for streaming traffic [16]–[18]. In par-

ticular, [16] focuses on server-assisted streaming, while [17],

[18] aim at fair resource allocation to peers using optimization-

decomposition.

Closest to our setting is work such as [7]–[9], which studies the

need to localize traffic within ISP domains. In [7], the focus

is on allowing only local communications and optimizing the

performance by careful peer selection, while [8] develops an

optimization framework to balance load across ISPs using cost

information. A different approach is taken in [9], wherein peers

are selected based on inputs on nearness provided by CDNs (if

a CDN directs two peers to the same cache, they are probably

nearby).

Pricing and market mechanisms for P2P systems are of

significant interest, and work such as [19] uses ideas of currency exchange between peers that can be used to facilitate file transfers. The system we design uses prices between mTrackers that map to real-world costs of traffic exchange, but does not have currency exchanges between peers, which still use BitTorrent-style bilateral barter.

III. THE MULTITRACK SYSTEM

MultiTrack is a hybrid P2P network architecture similar to

BitTorrent [6], [20] in many ways, and we first review some

control elements of BitTorrent. In the BitTorrent architecture

a file is divided into multiple chunks, and there exists at least

one Tracker for each file that keeps track of peers that contain

the file in its entirety (such peers are called seeds) or some

chunks of it (such peers are called downloaders). A new peer

that wants to download a file needs to first locate a Tracker

corresponding to the file. Information about Trackers for a file

(among other information) is contained in .torrent files, which

are hosted at free servers. Thus, the peer downloads the .torrent

file, and locates a Tracker using this file.

When a peer sends a request to a Tracker corresponding

to the file it wants, the Tracker returns the addresses of a set

of peers (seeds and downloaders) that the new peer should


contact in order to download the file. The peer then connects

to a subset of the given peers and downloads chunks of the file

from them. While downloading the file, a peer sends updates

to the Tracker about its download status (number of chunks

uploaded and downloaded). Hence, a tracker knows about the

state of each peer that is present in its peer cloud (or swarm).

The MultiTrack architecture consists of BitTorrent-like

trackers, which we call mTrackers. We associate one or

more mTrackers to each ISP, with each mTracker controlling

access to its own peer cloud. Note that all these mTrackers

are identified with the same file. Unlike BitTorrent Trackers,

mTrackers are aware of each other and form an overlay

network among themselves. Each mTracker consists of two

different modules:

1) Admission control: Unlike the BitTorrent tracker which

has no control over admission decisions of peers, the

mTracker can decide whether or not to admit a particular

peer into the system. Once admitted, the peer is either

served locally or is forwarded to a different mTracker

based on the decision taken by the mTracker.

2) Traffic management: This module of the mTracker

decides whether to forward a new peer into

its own peer cloud (at relatively low cost, but possibly

poor delay performance) or to another mTracker (at

higher cost, but potentially higher performance).
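The two modules above can be sketched as a single request handler. This is only an illustrative skeleton: the class name `MTracker`, the fixed admission probability `admit_prob`, and the `split` dictionary are hypothetical stand-ins, since the actual admission rule is developed in Section V and the split probabilities are adapted by the dynamics of Section IV.

```python
import random


class MTracker:
    """Hypothetical skeleton of the two mTracker modules."""

    def __init__(self, name, admit_prob, split, seed=None):
        # split maps a destination mTracker name to the probability of
        # sending an admitted peer there; the probabilities must sum to 1.
        if abs(sum(split.values()) - 1.0) > 1e-9:
            raise ValueError("split probabilities must sum to 1")
        self.name = name
        self.admit_prob = admit_prob
        self.split = dict(split)
        self.rng = random.Random(seed)

    def handle_request(self, peer_id):
        """Return the mTracker that will serve peer_id, or None if rejected."""
        # 1) Admission control. A fixed admission probability is a placeholder
        #    assumption, not the rule used in the paper.
        if self.rng.random() >= self.admit_prob:
            return None
        # 2) Traffic management: pick the local peer cloud or a remote
        #    mTracker according to the current split probabilities.
        destinations = list(self.split)
        weights = [self.split[d] for d in destinations]
        return self.rng.choices(destinations, weights=weights, k=1)[0]


tracker = MTracker("mTracker2", admit_prob=0.9,
                   split={"mTracker2": 0.7, "mTracker1": 0.3}, seed=1)
decisions = [tracker.handle_request(peer_id) for peer_id in range(1000)]
```

Each entry of `decisions` is either `None` (peer rejected) or the name of the mTracker whose peer cloud will serve the peer.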

The rationale behind this architecture is as follows. At any

time, a peer cloud has a capacity associated with it, based

on the maximum upload bandwidth of a peer in the cloud

and the total number of chunks present at all the peers in the

cloud (seeds and downloaders). In general, a peer-cloud has

two phases of operation [11]—a transient phase where the

available capacity is less than the demand (in other words,

not enough peers with a copy of the file), and a steady state

phase, where the available capacity is greater than the capacity

required to satisfy demand. Thus, a peer cloud can be thought

of as a server with changing service capacity. We balance load

among different peer clouds located in different ISPs, taking

into account the transit cost associated with traffic exchange.

We assume time scale separation between the two modules of the mTracker, namely traffic management and admission control.

Our assumption is that the capacity of a P2P system remains

roughly constant over intervals of time, with capacity changes

seen at the end of these time periods. We divide system

dynamics into three time scales:

1) Large: The capacity of the peer cloud associated with

each mTracker changes at this time scale.

2) Medium: mTrackers take admission control decisions

at this time scale. They could increase or decrease the

number of admitted peers based on feedback from the

system. We will study dynamics at this time scale in

Section V.

3) Small: mTrackers split the demand that they see among

the different options (mTrackers visible to them) at this

time scale. Thus, they change the probability of sending

peers to their own peer-cloud or to other mTrackers at

this time scale. We study these dynamics in Section IV.

Note that a medium time unit comprises many small time

units and a large time unit comprises many medium time

units. The artifice of splitting dynamics into these time scales

allows us to design each control loop while assuming that

certain system parameters remain constant during the interval.

In the following sections, we present the design and analysis

of our different system components.

IV. MTRACKER: TRAFFIC MANAGEMENT

The objective of the mTracker’s Traffic Management module

is to split the demand that it sees among the different options

(other mTrackers, and its own peer cloud) that it has. Since

each mTracker is associated with a different ISP domain, it

would like to minimize the cost seen by that ISP, and yet

maintain a good delay performance for its users.

As mentioned in the last section, peer-clouds can be in

either transient or steady-state based on whether the demand

seen is greater than or less than the available capacity. An

mTracker in the transient mode would like to offload some

of its demand, while mTrackers in the steady-state mode can

accept load. Thus, each mTracker $j$ in the transient mode maintains a split probability vector $\hat{\vec{y}}_j = [\hat{y}_j^1 \dots \hat{y}_j^Q]$, where $Q$ is the total number of mTrackers, and some of the $y_j^i$ could be zero. We assume that the demand seen by mTracker $j$ is a Poisson process of rate $x_j$. Thus, splitting traffic according to $\hat{\vec{y}}_j$ would produce $Q$ Poisson processes, each with rate $x_j^i \triangleq y_j^i x_j$ $(i = 1,\dots,Q)$.

Now, each mTracker in the steady-state mode can accept traffic from mTrackers that are transient. It could, of course, prioritize or reserve capacity for its own traffic; we assume here that it does so, and the balance capacity available (in users served per unit time) of this steady state mTracker is $C_i$. Then the demand seen at each such mTracker $i$ is the sum of Poisson processes that arrive at it, whose rate is simply $\sum_{l=1}^{Q} x_l^i$. We assume that delay seen by each peer sent to mTracker $i$ is convex increasing in load, and for illustration use the M/M/1 delay function

$$\frac{1}{C_i - \sum_{l=1}^{Q} x_l^i}. \tag{1}$$

Note that, peers from different transient mTrackers are not allowed to communicate with each other at the steady state mTracker to which they are forwarded. Thus, a peer that is forwarded from one ISP domain to another is only allowed to communicate with peers located in that ISP domain.

Now, the steady state mTrackers are disinterested players in the system, and would like to minimize the total delay of the system. They could charge an additional price that would act as a congestion signal to mTrackers that forward traffic to them. Such a congestion price should reflect the ill-effect that increasing the load by one mTracker has on the others. What should such a price look like? Now, consider the expression

$$D(z_i) = \frac{1}{C_i - z_i}, \tag{2}$$


which is the general form of the delay seen by each user at mTracker $i$. The elasticity of delay with arrival rate $z_i$ is

$$\frac{\partial D(z_i)}{\partial z_i}\,\frac{z_i}{D(z_i)} = \frac{z_i}{C_i - z_i}. \tag{3}$$

The elasticity gives the fractional change in delay for a fractional change in load, and can be thought of as the cost of increasing load on the users. In other words, if the load is increased by any one mTracker, all the others would also be hurt by this quantity. Expressing the above in terms of delay (multiplying by total delay) to ensure that all units are in delay, the elasticity per unit rate per unit time at mTracker $i$ is just

$$\frac{\sum_{l=1}^{Q} x_l^i}{\left(C_i - \sum_{l=1}^{Q} x_l^i\right)^2}. \tag{4}$$

The above quantity represents the ill effect that increasing the load per unit time has on the delay experienced on all users at mTracker $i$. The delay cost (1) is the disutility for using the mTracker, while the congestion cost (4) is the disutility caused to others using the mTracker. The mTracker can charge this price to each mTracker that forwards peers to it.

Since mTrackers belong to different ISP domains, forwarding demand from one mTracker to the other is not free. We assume that the transit cost per unit rate of forwarding demand from mTracker $j$ to mTracker $i$ is $p_j^i$. Thus, the payoff of mTracker $j$ due to forwarding traffic to mTracker $i$ per unit rate per unit time is given by the sum of transit cost $p_j^i$ with the delay cost (1) and congestion price (4), which yields a total payoff per unit rate per unit time of

$$\frac{1}{C_i - \sum_{l=1}^{Q} x_l^i} + p_j^i + \frac{\sum_{l=1}^{Q} x_l^i}{\left(C_i - \sum_{l=1}^{Q} x_l^i\right)^2}. \tag{5}$$

The mTracker would like as small a payoff as possible.

In the next subsections we will develop a population game

model for our system, and show how rational dynamics when

coupled with the payoff function given above naturally results

in minimizing the total system cost (delay cost plus transit

cost). A good reference on population games is [21].
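As a small numerical illustration of how the delay cost (1), the congestion price (4) and the transit cost combine into the payoff (5), consider the following sketch. The demands, split fractions, capacity and transit costs are made-up values chosen only for the example, and the function names are hypothetical.

```python
def mm1_delay(capacity, load):
    """Delay cost (1): 1 / (C_i - load), defined only for load < capacity."""
    if not 0 <= load < capacity:
        raise ValueError("need 0 <= load < capacity")
    return 1.0 / (capacity - load)


def congestion_price(capacity, load):
    """Congestion price (4): load / (C_i - load)^2."""
    if not 0 <= load < capacity:
        raise ValueError("need 0 <= load < capacity")
    return load / (capacity - load) ** 2


def payoff(capacity, load, transit_cost):
    """Total payoff (5) per unit rate per unit time; smaller is better."""
    return (mm1_delay(capacity, load) + transit_cost
            + congestion_price(capacity, load))


# Hypothetical scenario: two transient mTrackers forward to mTracker i.
demand = {"j1": 4.0, "j2": 2.0}       # x_j, request arrival rates
split_to_i = {"j1": 0.5, "j2": 0.5}   # y_j^i, fraction forwarded to i
capacity_i = 10.0                     # C_i
transit = {"j1": 0.1, "j2": 0.3}      # p_j^i

load_i = sum(split_to_i[j] * demand[j] for j in demand)   # sum over l of x_l^i
payoffs = {j: payoff(capacity_i, load_i, transit[j]) for j in demand}
```

Both forwarding mTrackers see the same delay and congestion terms, because these depend only on the aggregate load at mTracker i; their payoffs differ only through the transit cost.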

A. MultiTrack Population Game

A population game $G$, with a set $\mathcal{Q} = \{1,\dots,Q\}$ of non-atomic populations of players is defined by the following entities:

1) a mass, $x_j$ $\forall j \in \mathcal{Q}$,

2) a strategy or action set, $\mathcal{S}_j = \{1,\dots,S_j\}$ $\forall j \in \mathcal{Q}$ and

3) a payoff, $F_j^i$ $\forall j \in \mathcal{Q}$ and $\forall i \in \mathcal{S}_j$.

By a non-atomic population, we mean that the contribution of each member of the population is infinitesimal.

In the MultiTrack Game each mTracker is a player and the options available to each mTracker are other mTrackers' peer cloud or its own peer cloud. Let $\vec{x} = [x_1,\dots,x_Q]$ be the total load vector of the system at the small time scale, where $x_j$ $\forall j \in \mathcal{Q}$ is the total arrival rate of new peer requests (or mass) at mTracker $j$. A strategy distribution of an mTracker $j \in \mathcal{Q}$ is a split of its load $x_j$ among different mTrackers including itself, represented as $\vec{x}_j = [x_j^1 \dots x_j^Q]$, where $\sum_{i=1}^{Q} x_j^i = x_j$. If a mTracker $j$ is not connected to mTracker $i$ (or if it does not want to use mTracker $i$), then the rate $x_j^i = 0$. We denote the vector of strategies being used by all the mTrackers as $X = [\vec{x}_1 \dots \vec{x}_Q]$. The vector $X$ represents the state of the system and it changes continuously with time. Let the space of all possible states of a system for a given load vector be denoted as $\mathcal{X}$, i.e. $X \in \mathcal{X}$.

The payoff (per unit rate per unit time) of forwarding requests from mTracker $j$ to $i$, when the state of the system is $X$ is denoted by $F_j^i(X) \in \mathbb{R}$ and is assumed to be continuous and differentiable. As developed above this payoff is

$$F_j^i(X) = \frac{1}{C_i - \sum_{l=1}^{Q} x_l^i} + p_j^i + \frac{\sum_{l=1}^{Q} x_l^i}{\left(C_i - \sum_{l=1}^{Q} x_l^i\right)^2}. \tag{6}$$

Recall that mTrackers want to keep payoff as small as possi-

ble.

A commonly used concept in non-cooperative games in the context of infinitesimal players, is the Wardrop equilibrium [14]. Consider any strategy distribution $\vec{x}_j = [x_j^1,\dots,x_j^{S_j}]$. There would be some elements which are non-zero and others which are zero. We call the strategies corresponding to the non-zero elements as the strategies used by population $j$.

Definition 1 A state $\hat{X}$ is a Wardrop equilibrium if for any population $j \in \mathcal{Q}$, all strategies being used by the members of $j$ yield the same marginal payoff to each member of $j$, whereas the marginal payoff that would be obtained by members of $j$ is higher for all strategies not used by population $j$.

In the context of our MultiTrack game the above definition of Wardrop equilibrium is characterized by the following relation:

$$F_j^r(\hat{X}) \le F_j^i(\hat{X}) \quad \forall\, r \in \hat{\mathcal{Q}}_j \text{ and } i \in \mathcal{Q}$$

where $\hat{\mathcal{Q}}_j \subset \mathcal{Q}$ is the set of all mTrackers used by population $j$ in a strategy distribution $\hat{\vec{x}}_j$.

The above concept refers to an equilibrium condition; the question arises as to how the system actually arrives at such a state. One model of population dynamics is Replicator Dynamics [12]. The rate of increase $\dot{x}_j^i / x_j^i$ of the strategy $i$ is a measure of its evolutionary success. Following ideas of Darwinism, we may express this success as the difference in fitness $F_j^i(X)$ of the strategy $i$ and the average fitness $\sum_{r=1}^{Q} x_j^r F_j^r(X) / x_j$ of the population $j$. Then we obtain

$$\frac{\dot{x}_j^i}{x_j^i} = \text{average fitness} - \text{fitness of } i.$$

Then the dynamics used to describe changes in the mass of population $j$ playing strategy $i$ is given by

$$\dot{x}_j^i = x_j^i \left( \frac{1}{x_j} \sum_{r=1}^{Q} x_j^r F_j^r(X) - F_j^i(X) \right). \tag{7}$$

The above expression implies that a population would increase the mass of a successful strategy and decrease the mass of a less successful one. It is called the replicator equation after the tenet "like begets like". Note that the total mass of the population $j$ is constant. We design our mTracker Traffic Management module around Replicator Dynamics (7).
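The replicator equation (7) can be explored numerically with a simple forward-Euler discretization. The sketch below is an illustration under assumptions, not the paper's implementation: the step size `dt`, the capacities, the transit prices and the starting split are all hypothetical values, and the state is stored as `X[j][i]`, the rate sent from mTracker j to mTracker i.

```python
def payoffs(X, capacity, price):
    """F[j][i] from (6) for state X, where X[j][i] is the rate sent from j to i."""
    n_dst = len(capacity)
    load = [sum(row[i] for row in X) for i in range(n_dst)]
    return [[1.0 / (capacity[i] - load[i]) + price[j][i]
             + load[i] / (capacity[i] - load[i]) ** 2
             for i in range(n_dst)]
            for j in range(len(X))]


def replicator_step(X, capacity, price, dt):
    """One forward-Euler step of the replicator equation (7)."""
    F = payoffs(X, capacity, price)
    new_X = []
    for row, f_row in zip(X, F):
        mass = sum(row)
        average = sum(x * f for x, f in zip(row, f_row)) / mass
        new_X.append([x + dt * x * (average - f) for x, f in zip(row, f_row)])
    return new_X


# Hypothetical example: one transient mTracker with demand 3.0 that can use
# its own small peer cloud (index 0, no transit cost) or a larger remote one
# (index 1, transit cost 0.2).
capacity = [4.0, 10.0]
price = [[0.0, 0.2]]
X = [[1.5, 1.5]]
for _ in range(20000):
    X = replicator_step(X, capacity, price, dt=0.01)
F_final = payoffs(X, capacity, price)
```

In this example the total mass stays at 3.0 throughout, load shifts toward the remote peer cloud, and the payoffs of the two used options become equal, which is the Wardrop condition for strategies in use.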

B. Convergence of mTracker dynamics

We define the total cost in the system to be the sum of the

total system delay plus the total transit cost. In other words,

we have weighted delay costs and transit costs equally when

determining their contribution to the system cost. We could,

of course, use any convex combination of the two without any

changes to the system design. Hence using the M/M/1 delay

model at each tracker and adding transit costs, the total system

cost when the system is in state X is given as:

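$$C(X) = \sum_{i=1}^{Q} \left( \frac{\sum_{r=1}^{Q} x_r^i}{C_i - \sum_{l=1}^{Q} x_l^i} + \sum_{r=1}^{Q} p_r^i x_r^i \right). \tag{8}$$

Note that the cost is convex and increasing in the load. We will show that the above expression acts as a Lyapunov function for the system.

Theorem 1: The system of mTrackers that follow replicator dynamics with payoffs given by (6) is globally asymptotically stable.

Proof: We prove the system stability using Lyapunov Theory with $C(X)$ defined in (8) as the Lyapunov function. From (6) and (8), $\frac{\partial C}{\partial x_j^i} = F_j^i(X)$, hence

$$\dot{C}(X) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} \frac{\partial C}{\partial x_j^i}\, \dot{x}_j^i = \sum_{i=1}^{Q} \sum_{j=1}^{Q} F_j^i(X)\, \dot{x}_j^i, \tag{9}$$

Now, let $\tilde{\mathcal{X}}$ be the set of states such that,

$$\dot{C}(X) = 0, \quad \forall\, X \in \tilde{\mathcal{X}} \tag{10}$$

From (9) it is evident that $\dot{C}(X) = 0$, if:

$$F_j^i(X) = 0 \quad \text{or} \quad \left( \dot{x}_j^i = 0 \right) \Rightarrow \frac{1}{x_j} \sum_{r=1}^{Q} x_j^r F_j^r = F_j^i \quad \forall\, i,j \in \mathcal{Q} \tag{11}$$

Thus, $\tilde{\mathcal{X}}$ is the set of equilibrium states of replicator dynamics. We will show that $\dot{C}(X) < 0$ $\forall\, X \notin \tilde{\mathcal{X}}$.

From (9) and (7) we have

$$\dot{C}(X) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} F_j^i x_j^i \left( \frac{1}{x_j} \sum_{r=1}^{Q} x_j^r F_j^r - F_j^i \right) \tag{12}$$

$$= \sum_{j=1}^{Q} x_j \left( \left( \sum_{i=1}^{Q} \frac{x_j^i}{x_j} F_j^i \right)^2 - \sum_{i=1}^{Q} \frac{x_j^i}{x_j} \left( F_j^i \right)^2 \right) \tag{13}$$

Since function $f(x) = x^2$ is convex and $\sum_{i=1}^{Q} \frac{x_j^i}{x_j} = 1$, from Jensen's inequality we have, $\forall\, X \notin \tilde{\mathcal{X}}$:

$$\left( \sum_{i=1}^{Q} \frac{x_j^i}{x_j} F_j^i \right)^2 - \sum_{i=1}^{Q} \frac{x_j^i}{x_j} \left( F_j^i \right)^2 < 0 \quad \forall\, j \in \mathcal{Q}$$

Thus, $\dot{C}(X) < 0$, $\forall\, X \notin \tilde{\mathcal{X}}$ and the system is globally asymptotically stable [22].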
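Two steps of the proof can be checked numerically: that the partial derivative of the cost (8) with respect to each rate equals the payoff (6), and that the cost decreases along replicator dynamics away from equilibrium. The following sketch does both for one arbitrary state; the capacities, prices and rates are hypothetical values, and the finite-difference step `h` is an assumption of the example.

```python
def total_cost(X, capacity, price):
    """System cost C(X) from (8): total M/M/1 delay plus total transit cost."""
    cost = 0.0
    for i, cap in enumerate(capacity):
        load = sum(row[i] for row in X)
        cost += load / (cap - load)
        cost += sum(price[j][i] * X[j][i] for j in range(len(X)))
    return cost


def payoff(X, capacity, price, j, i):
    """Payoff F_j^i(X) from (6)."""
    load = sum(row[i] for row in X)
    gap = capacity[i] - load
    return 1.0 / gap + price[j][i] + load / gap ** 2


def numeric_partial(X, capacity, price, j, i, h=1e-6):
    """Central finite-difference estimate of dC/dx_j^i."""
    up = [row[:] for row in X]
    down = [row[:] for row in X]
    up[j][i] += h
    down[j][i] -= h
    return (total_cost(up, capacity, price)
            - total_cost(down, capacity, price)) / (2 * h)


# Hypothetical state with two source and two destination mTrackers.
capacity = [5.0, 8.0]
price = [[0.0, 0.3], [0.2, 0.0]]
X = [[1.0, 2.0], [0.5, 1.5]]

max_gap = max(abs(numeric_partial(X, capacity, price, j, i)
                  - payoff(X, capacity, price, j, i))
              for j in range(2) for i in range(2))

# Rate of change of C along replicator dynamics: (9) with (7) substituted.
c_dot = 0.0
for j, row in enumerate(X):
    F = [payoff(X, capacity, price, j, i) for i in range(len(row))]
    average = sum(x * f for x, f in zip(row, F)) / sum(row)
    c_dot += sum(f * x * (average - f) for x, f in zip(row, F))
```

Here `max_gap` is at the level of finite-difference error, and `c_dot` is negative because the chosen state is not an equilibrium of the dynamics.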

While replicator dynamics is a simple model, it has a drawback: during the different iterations of replicator dynamics, if the value of $x_j^i$, the rate of forwarding requests from mTracker $j$ to mTracker $i$, becomes zero then it remains zero forever. Thus, a strategy could become extinct when replicator dynamics is used and its stationary point might not be a Wardrop equilibrium. To avoid this problem, we can use another kind of dynamics called Brown-von Neumann-Nash (BNN) Dynamics, which is described as:

$$\dot{x}_q^i = x_q \gamma_q^i - x_q^i \sum_{j=1}^{Q} \gamma_q^j \tag{14}$$

where,

$$\gamma_q^i = \max\left\{ F_q^i(X) - \frac{1}{x_q} \sum_{r=1}^{Q} x_q^r F_q^r(X),\; 0 \right\} \tag{15}$$

denotes the excess payoff to strategy $i$ relative to the average payoff in population $q$. We can show that the system of mTrackers that follow BNN dynamics is globally asymptotically stable in a manner similar to the proof of Theorem 1.
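The following sketch illustrates how BNN dynamics (14) can bring an unused option back into use, which replicator dynamics cannot do. It rests on several assumptions that should be read as such: a single population whose own split determines the loads, hypothetical capacities and transit costs, a forward-Euler step `dt`, and, most importantly, a sign convention. Because payoffs in MultiTrack are costs that mTrackers want to keep small, the sketch takes the excess of option i to be the average cost minus the cost of option i, so that cheaper-than-average options gain mass.

```python
def costs(x, capacity, price):
    """Payoff (6) of each option for a single population whose split is x."""
    return [1.0 / (c - xi) + p + xi / (c - xi) ** 2
            for xi, c, p in zip(x, capacity, price)]


def bnn_step(x, capacity, price, dt):
    """One forward-Euler step of BNN dynamics (14) for a single population."""
    F = costs(x, capacity, price)
    mass = sum(x)
    average = sum(xi * fi for xi, fi in zip(x, F)) / mass
    # Assumed sign convention: payoffs are costs, so the excess of option i
    # is how far its cost lies below the population average.
    gamma = [max(average - fi, 0.0) for fi in F]
    total_gamma = sum(gamma)
    return [xi + dt * (mass * gi - xi * total_gamma)
            for xi, gi in zip(x, gamma)]


capacity = [4.0, 10.0]   # own peer cloud, remote peer cloud (hypothetical)
price = [0.0, 0.2]       # transit cost of each option (hypothetical)
x = [3.0, 0.0]           # the remote option starts out unused
first = bnn_step(x, capacity, price, dt=0.01)
for _ in range(50000):
    x = bnn_step(x, capacity, price, dt=0.01)
F_final = costs(x, capacity, price)
```

After the very first step the remote option already carries positive mass, whereas under (7) a rate that starts at zero stays at zero. The total mass is conserved, and in this example the payoffs of the two options equalize.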

We have just shown that the total system cost acts as a

Lyapunov function for the system. It should not come as

a surprise then, that the cost-minimizing state is a Wardrop

equilibrium. We prove this formally in the next subsection.

C. Cost efficiency of mTrackers

In previous work on selfish routing (e.g. [23]), it was

shown that the Wardrop equilibrium does not result in efficient

system performance. This inefficiency is referred to as the

price of anarchy, and it is primarily due to selfish user-

strategies. However, work on population games [21] suggests

that carefully devised price signals would indeed result in

efficient equilibria. We show now that the Wardrop equilibrium

attained by mTrackers is efficient for the system as a whole.

The objective of our system is to minimize the total cost for a given load vector $\vec{x} = [x_1,\dots,x_Q]$. Here the total cost in the system is $C(X)$ and is defined in (8). This can be represented as the following constrained minimization problem:

$$\min_{X} \; C(X) \tag{16}$$

subject to,

$$\sum_{i=1}^{Q} x_j^i = x_j \quad \forall\, j \in \mathcal{Q} \tag{17}$$

$$x_j^i \ge 0 \tag{18}$$
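For intuition, the problem (16)-(18) can be solved directly in a tiny instance and compared against the Wardrop condition. The sketch below assumes a single mTracker splitting a demand of 3.0 between two options, with hypothetical capacities and transit costs; constraint (17) is enforced by parametrizing the split with one variable, and (18) by restricting that variable to the interval from zero to the demand. Ternary search is used only because the cost is convex in this one variable; it is a convenience of the example, not part of the paper's design.

```python
def total_cost(a, demand, capacity, price):
    """C(X) from (8) when rate a goes to option 0 and demand - a to option 1."""
    split = [a, demand - a]
    return sum(x / (c - x) + p * x for x, c, p in zip(split, capacity, price))


def payoff(x, c, p):
    """Payoff (6) of an option with load x, capacity c and transit cost p."""
    return 1.0 / (c - x) + p + x / (c - x) ** 2


def minimize_convex(f, lo, hi, iterations=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


demand = 3.0
capacity = [4.0, 10.0]   # hypothetical capacities C_i
price = [0.0, 0.2]       # hypothetical transit costs
a_star = minimize_convex(lambda a: total_cost(a, demand, capacity, price),
                         0.0, demand)
F_own = payoff(a_star, capacity[0], price[0])
F_remote = payoff(demand - a_star, capacity[1], price[1])
```

At the cost-minimizing split both options are in use and `F_own` and `F_remote` coincide up to numerical error, which is what the Wardrop condition requires of strategies in use.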

The Lagrange dual associated with the above is

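$$L(\lambda, X) = \max_{\lambda, h} \, \min_{X} \left( C(X) - \sum_{j=1}^{Q} \lambda_j \left( \sum_{i=1}^{Q} x_j^i - x_j \right) - \sum_{i=1}^{Q} \sum_{j=1}^{Q} h_j^i x_j^i \right) \tag{19}$$

where $h_j^i \ge 0$ and $\lambda_j$, $\forall\, i,j \in \mathcal{Q}$ are the dual variables. Now the above dual problem gives the following Karush-Kuhn-Tucker first order conditions:

$$\frac{\partial L}{\partial x_j^i}(\lambda, X^\star) = 0 \quad \forall\, i,j \in \mathcal{Q} \tag{20}$$

and

$$h_j^i \, x_j^{\star i} = 0 \quad \forall\, i,j \in \mathcal{Q} \tag{21}$$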