Publications (392) · 137.22 Total Impact
ABSTRACT: Random walk-based graph sampling methods have become increasingly popular and important for characterizing large-scale complex networks. While powerful, they are known to exhibit problems when the graph is loosely connected, which slows down the convergence of a random walk and can result in poor estimation accuracy. In this work, we observe that many graphs under study, called target graphs, usually do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and an affiliation graph, and the target graph becomes better connected when viewed from these three graphs as a whole, or what we call a hybrid social-affiliation network. This viewpoint brings extra benefits to the graph sampling framework, e.g., when directly sampling a target graph is difficult or inefficient, we can efficiently sample it with the assistance of the auxiliary and affiliation graphs. We propose three sampling methods on such a hybrid social-affiliation network to estimate target graph characteristics, and conduct extensive experiments on both synthetic and real datasets to demonstrate the effectiveness of these new sampling methods.
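The random-walk sampling idea underlying this line of work can be sketched in a few lines. The following is a toy illustration of a Metropolis-Hastings random walk (a standard technique in this literature, not the paper's hybrid social-affiliation samplers) on a small, deliberately loosely connected graph:

```python
import random
from collections import Counter

def mh_random_walk(graph, start, steps, rng=None):
    """Metropolis-Hastings random walk: a move from u to a uniformly
    chosen neighbor v is accepted with probability min(1, deg(u)/deg(v)),
    which makes the stationary distribution uniform over nodes rather
    than proportional to degree."""
    rng = rng or random.Random(0)
    samples, u = [], start
    for _ in range(steps):
        v = rng.choice(graph[u])
        if rng.random() < min(1.0, len(graph[u]) / len(graph[v])):
            u = v  # accept; otherwise stay at u (a rejected self-loop)
        samples.append(u)
    return samples

# Toy loosely connected graph: two small clusters joined by one bridge.
graph = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4],
    4: [3, 5], 5: [4],
}
counts = Counter(mh_random_walk(graph, 0, 20000))
```

On a loosely connected graph like this one, the walk still converges to the uniform distribution, but slowly; that slow mixing is exactly the problem the hybrid social-affiliation viewpoint is meant to alleviate.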

Article: Layered Percolation
ABSTRACT: We study the emergence of long-range connectivity in multilayer networks (also termed multiplex, composite, and overlay networks) obtained by merging the connectivity subgraphs of multiple percolating instances of an underlying backbone network. Multilayer networks have applications ranging from studying long-range connectivity in a communication or social network formed with hybrid technologies, a transportation network connecting the same cities via rail, road, and air, and outbreaks of flu epidemics involving multiple viral strains, to studying the temporal flow of information in dynamic networks, and potentially the conductivity properties of graphene-like stacked lattices. For a homogeneous multilayer network, formed by merging $M$ random site-percolating instances of the same graph $G$ with single-layer site-occupation probability $q$, we argue that when $q$ exceeds a threshold $q_c(M) = \Theta(1/\sqrt{M})$, a spanning cluster appears in the multilayer network. Using a configuration model approach, we find $q_c(M)$ exactly for random graphs with arbitrary degree distributions, which have many applications in mathematical sociology. For multilayer percolation in a general graph $G$, we show that $q_c/\sqrt{M} < q_c(M) < \sqrt{-\ln(1-p_c)}/\sqrt{M}, \forall M \in {\mathbb Z}^+$, where $q_c$ and $p_c$ are the site and bond percolation thresholds of $G$, respectively. We show a close connection between multilayer percolation and mixed (site-bond) percolation, since both provide a smooth bridge between pure-site and pure-bond percolation. Using this connection, we find excellent approximations and bounds on layered percolation thresholds for regular lattices, provide several exact results (via numerical simulations), and derive a specialized bound for the multilayer kagome lattice using a site-to-bond transformation technique.
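The merged-layer construction is straightforward to simulate. The sketch below (my own toy illustration on a square lattice with a BFS spanning check, not the paper's configuration-model analysis) shows the key point: a site is open in the multilayer network if it is open in at least one of the M layers, i.e. with probability 1 - (1 - q)^M:

```python
import random
from collections import deque

def spans(L, q, M, rng):
    """Merge M independent site-percolation layers on an L x L lattice:
    a site is open in the merged network with prob 1 - (1 - q)**M.
    Returns True if an open cluster spans from top row to bottom row."""
    p = 1 - (1 - q) ** M
    open_ = [[rng.random() < p for _ in range(L)] for _ in range(L)]
    seen = [[False] * L for _ in range(L)]
    dq = deque((0, j) for j in range(L) if open_[0][j])
    for i, j in dq:
        seen[i][j] = True
    while dq:  # BFS from the top row
        i, j = dq.popleft()
        if i == L - 1:
            return True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < L and 0 <= b < L and open_[a][b] and not seen[a][b]:
                seen[a][b] = True
                dq.append((a, b))
    return False
```

Sweeping q for increasing M in such a simulation is one way to observe the $q_c(M) = \Theta(1/\sqrt{M})$ scaling numerically.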
Conference Paper: Sampling node pairs over large graphs
ABSTRACT: Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs, so sampling is used. In this paper, we show that characterizing user pair relationships is a great challenge even for OSN service providers who possess the complete graph topology. The reason is that when sampling techniques (i.e., uniform vertex sampling (UVS) and random walk (RW)) are naively applied, they can introduce large biases, in particular for estimating the similarity distribution of user pairs under constraints such as the existence of mutual neighbors, which is important for applications such as identifying network homophily. Estimating statistics of user pairs is even more challenging in the absence of complete topology information, since an unbiased sampling technique such as UVS is usually not available, and exploring the OSN graph topology is expensive. To address these challenges, we present asymptotically unbiased sampling methods to characterize user pair properties based on UVS and RW techniques, respectively. We carry out an evaluation of our methods to show their accuracy and efficiency. Finally, we apply our methods to two Chinese OSNs, Douban and Xiami, and discover that significant homophily is present in both networks.
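The bias a naive random walk introduces, and the standard reweighting that removes it, can be shown in a toy example. This sketch (a generic degree-reweighted estimator, not the paper's user-pair methods) estimates the mean of a node function f: a plain RW samples nodes proportionally to degree, so weighting each sample by 1/degree restores an asymptotically unbiased estimate:

```python
import random

def rw_estimate(graph, start, steps, f, rng=None):
    """Reweighted random-walk estimator: a simple RW visits nodes with
    probability proportional to degree, so each sample is weighted by
    1/degree and the weighted average converges to the uniform mean of f."""
    rng = rng or random.Random(0)
    u, num, den = start, 0.0, 0.0
    for _ in range(steps):
        u = rng.choice(graph[u])
        w = 1.0 / len(graph[u])
        num += w * f(u)
        den += w
    return num / den

# Toy star graph where naive averaging would over-weight the hub node 0.
graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
est = rw_estimate(graph, 0, 50000, f=lambda v: 1.0 if v == 0 else 0.0)
# True fraction of nodes equal to the hub: 1/5 = 0.2.
```

An unweighted average of f over the same walk would return roughly 0.5 here, since the walk spends half its time at the hub.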
Conference Paper: End-host-based shortest path routing in dynamic networks: An online learning approach
ABSTRACT: We consider the problem of end-host-based shortest path routing in a network with unknown, time-varying link qualities. End-host-based routing is needed when internal nodes of the network do not have the scope or capability to provide globally optimal paths to given source-destination pairs, as can be the case in networks consisting of autonomous subnetworks or those with end-host-based routing restrictions. Assuming the source can probe links along selected paths, we formulate the problem as an online learning problem, where an existing solution achieves a performance loss (called regret) that is logarithmic in time with respect to (wrt) an offline algorithm that knows the link qualities. Current solutions assume coupled probing and routing; in contrast, we give a simple algorithm based on decoupled probing and routing, whose regret is only constant in time. We then extend our solution to support multipath probing and cooperative learning between multiple sources, where we show an inversely proportional decay in regret wrt the probing rate. We also show that without the decoupling, the regret grows at least logarithmically in time, thus establishing decoupling as critical for obtaining constant regret. Although our analysis assumes certain conditions (i.i.d.) on link qualities, our solution applies with straightforward amendments to much broader scenarios where these conditions are relaxed. The efficacy of the proposed solution is verified by trace-driven simulations.
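The decoupled probing-and-routing idea can be illustrated with a toy sketch (a hypothetical 4-node network with exponential link delays; this is a simplified illustration, not the paper's algorithm or regret analysis): probes refine empirical link estimates independently of which path is routed on, and routing simply runs a shortest-path computation over the current estimates.

```python
import heapq
import random

def dijkstra(nodes, est, src, dst):
    """Shortest path under the current link-cost estimates `est`."""
    dist = {v: float("inf") for v in nodes}
    prev, dist[src] = {}, 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, c in est[u].items():
            if d + c < dist[v]:
                dist[v], prev[v] = d + c, u
                heapq.heappush(pq, (d + c, v))
    path, v = [dst], dst
    while v != src:
        v = prev[v]
        path.append(v)
    return path[::-1]

# Hypothetical 4-node network; true mean link delays are unknown to
# the source and must be learned from probes.
true_mean = {0: {1: 1.0, 2: 5.0}, 1: {3: 1.0}, 2: {3: 1.0}, 3: {}}
nodes = list(true_mean)
sums = {u: {v: 0.0 for v in nbrs} for u, nbrs in true_mean.items()}
cnts = {u: {v: 0 for v in nbrs} for u, nbrs in true_mean.items()}
rng = random.Random(1)

for _ in range(2000):
    # Probing step, decoupled from routing: sample link qualities directly.
    for u, nbrs in true_mean.items():
        for v, m in nbrs.items():
            sums[u][v] += rng.expovariate(1.0 / m)  # delay with mean m
            cnts[u][v] += 1
# Routing step: because probing is decoupled, routing needs no
# exploration bonus; it just uses the empirical means.
est = {u: {v: sums[u][v] / cnts[u][v] for v in nbrs}
       for u, nbrs in true_mean.items()}
path = dijkstra(nodes, est, 0, 3)
```

The contrast with coupled schemes is that there, only links on the routed path are observed, which forces exploration into the routing decision itself and is what drives the logarithmic regret.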
Conference Paper: A performance analysis study of multipath routing in a hybrid network with mobile users
ABSTRACT: Mobile communication platforms of individual agents and ground vehicles, ships, and aircraft of both civil and military services often operate in highly dynamic conditions with constantly changing infrastructure and access to communication resources. Efficient techniques for rapid yet stable communication of such fleets with their control centres and between cooperating vehicles within the fleet are a challenging but important area of study, with the potential to facilitate the analysis and design of efficient and robust communication systems. Multipath extensions of data transmission protocols aim to take advantage of path diversity to achieve efficient and robust bandwidth allocation while maintaining stability. Such multipath resource pooling extensions of routing and congestion control intrinsically implement decentralisation with implicit resource sharing. In this paper, we build on recent theoretical work on fluid model approximations of multipath TCP and study their application to scenarios in which a convoy with two communication nodes (representing the convoy's head and tail) establishes channels with a set of radio/WiFi towers and a satellite relaying information to a remote destination; these channels have time-varying capacities which depend on the position and dynamics of the convoy. The paper studies the performance of a multipath TCP controller and demonstrates how path diversity can be implicitly utilised to spread flows across available paths. Furthermore, we study the patterns of subflows governed by dynamic control according to the motion of the convoy and investigate the trade-offs between resource utilisation and the speed of response by the subflows.
Conference Paper: Provisioning multi-tier cloud applications using statistical bounds on sojourn time
ABSTRACT: In this paper we present a simple and effective approach for resource provisioning to achieve a percentile bound on the end-to-end response time of a multi-tier application. We first model the multi-tier application as an open tandem network of M/G/1-PS queues and develop a method that produces a near-optimal application configuration, i.e., the number of servers at each tier, to meet the percentile bound in a homogeneous server environment using a single type of server. We then extend our solution to a K-server case, and our technique demonstrates good accuracy, independent of the variability of service times. Our approach demonstrates a provisioning error of no more than 3%, compared to a 140% worst-case provisioning error obtained by techniques based on an M/M/1-FCFS queue model. In addition, we extend our approach to handle a heterogeneous server environment, i.e., one with multiple types of servers. We find that fewer high-capacity servers are preferable for high-percentile provisioning. Finally, we extend our approach to account for the rental cost of each server type and compute a cost-efficient application configuration with savings of over 80%. We demonstrate the applicability of our approach in a real-world system by employing it to provision the two tiers of the Java implementation of TPC-W, a multi-tier transactional web benchmark that represents an e-commerce web application, i.e., an online bookstore.
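The flavor of percentile-based provisioning can be sketched with a deliberately simplified model (k parallel M/M/1-FCFS queues per tier with the load split evenly, whereas the paper uses M/G/1-PS; the arrival rates, service rates, and greedy server-adding rule below are all illustrative assumptions):

```python
import math

def tier_percentile(lam, mu, k, p):
    """p-th percentile of sojourn time at one tier modeled as k parallel
    M/M/1-FCFS queues with arrival rate lam split evenly; sojourn time is
    then exponential with rate mu - lam/k (requires mu > lam/k)."""
    rate = mu - lam / k
    return math.inf if rate <= 0 else -math.log(1 - p) / rate

def provision(lams, mus, p, sla):
    """Greedily add a server to whichever tier shrinks the summed
    per-tier percentile bound the most, until the end-to-end
    percentile bound meets the SLA."""
    # Smallest stable server count per tier.
    ks = [max(1, math.floor(l / m) + 1) for l, m in zip(lams, mus)]
    while sum(tier_percentile(l, m, k, p)
              for l, m, k in zip(lams, mus, ks)) > sla:
        gains = [tier_percentile(l, m, k, p) - tier_percentile(l, m, k + 1, p)
                 for l, m, k in zip(lams, mus, ks)]
        ks[gains.index(max(gains))] += 1
    return ks

# Hypothetical two-tier application: 95th-percentile response under 1s.
ks = provision([50.0, 30.0], [10.0, 10.0], p=0.95, sla=1.0)
```

Summing per-tier percentiles is conservative (the true end-to-end percentile is tighter), which is one reason a more careful tandem-queue analysis like the paper's pays off.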
Conference Paper: An energy transmission and distribution network using electric vehicles
ABSTRACT: Vehicle-to-grid provides a viable approach for feeding the battery energy stored in electric vehicles (EVs) back to the power grid. Meanwhile, since EVs are mobile, the energy in EVs can be easily transported from one place to another. Based on these two observations, we introduce a novel concept called the EV energy network for energy transmission and distribution using EVs. We present a concrete example to illustrate the usage of an EV energy network, and then study the optimization problem of how to deploy energy routers in an EV energy network. We prove that the problem is NP-hard and develop a greedy heuristic solution. Simulations using real-world data show that our method is efficient.
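For NP-hard placement problems of this kind, a greedy coverage heuristic is the textbook starting point. The sketch below is a generic greedy placement on a made-up instance (candidate sites, demand points, and budget are all hypothetical; the paper's actual heuristic and objective may differ):

```python
def greedy_place(candidates, demands, budget):
    """Generic greedy heuristic: repeatedly pick the candidate router
    location that covers the most still-uncovered demand points.
    `candidates` maps a location to the set of demand points it serves."""
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(candidates, key=lambda c: len(candidates[c] - covered),
                   default=None)
        if best is None or not (candidates[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Hypothetical instance: 3 candidate sites, 5 demand points, budget 2.
cands = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
sites, cov = greedy_place(cands, {1, 2, 3, 4, 5}, budget=2)
```

For submodular coverage objectives this greedy rule carries the classic (1 - 1/e) approximation guarantee, which is why it is a natural heuristic for NP-hard deployment problems.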
Conference Paper: Stochastic differential equations for power law behaviors
ABSTRACT: In this paper we present simple stochastic differential equations that lead to lower-tail and/or upper-tail power law behaviors. We also present a model with bidirectional Poisson counters that exhibits power law behavior near a critical point, which might be of interest in statistical physics.
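One classic mechanism by which Poisson-counter dynamics generate power laws, multiplicative growth killed by random resets, is easy to simulate (this is a standard illustration of the phenomenon, not necessarily the equations used in the paper; the growth factor and reset rate are arbitrary):

```python
import random

def simulate(steps, growth=1.05, reset_rate=0.02, rng=None):
    """Discretized multiplicative growth with Poisson-type resets:
    between resets x grows geometrically, and geometrically distributed
    lifetimes of geometric growth yield a Pareto-like (power-law) tail,
    here with tail exponent -ln(1 - reset_rate)/ln(growth) ~ 0.41."""
    rng = rng or random.Random(0)
    x, out = 1.0, []
    for _ in range(steps):
        if rng.random() < reset_rate:
            x = 1.0          # reset counter fires
        else:
            x *= growth      # growth counter fires
        out.append(x)
    return out

xs = simulate(200_000)
```

The tail exponent follows from P(X > growth**n) ~ (1 - reset_rate)**n; plotting the empirical complementary CDF of `xs` on log-log axes shows the straight-line tail.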
Conference Paper: Cooperative jamming to improve the connectivity of the 1D secrecy graph
ABSTRACT: Consider a one-dimensional wireless network with n nodes uniformly and independently distributed at random in an interval. In addition, m eavesdropper nodes are uniformly and independently distributed in the same interval. For a randomly selected source-destination pair, we consider the problem of securely delivering a message from the source to the destination, and we present achievable results on the number of eavesdropper nodes that can be tolerated by the network. Our constructions make use of cooperative jamming, in which nodes located close to the eavesdroppers generate artificial noise. For the one-dimensional network case, our results provide an improvement to the connectivity properties of the recently introduced secrecy graph, which is disconnected for any positive number of eavesdroppers without cooperative jamming. We consider cases of both known and unknown eavesdropper locations. For known eavesdropper locations, we show that a message can be securely delivered from the source to the destination with probability one as the number of nodes n goes to infinity, for any number of independent eavesdroppers m(n) satisfying m(n) = o(√n / log n). For unknown eavesdropper locations, we present a construction which can tolerate m(n) = o(n/log n) under the assumption of independent eavesdroppers, but which is fragile in the face of collaborating eavesdroppers.
ABSTRACT: The capability of nodes to broadcast their message to the entire wireless network when nodes employ cooperation is considered. We employ an asymptotic analysis using an extended random network setting and show that the broadcast performance strongly depends on the path loss exponent of the medium. In particular, as the size of the random network grows, the probability of broadcast in a one-dimensional network goes to zero for path loss exponents larger than one, and goes to a nonzero value for path loss exponents less than one. In two-dimensional networks, the same behavior is observed for path loss exponents above and below two, respectively.
ABSTRACT: Traffic burstiness is known to be undesirable for a router, as it increases the router's queue length and hence the queueing delays of data flows. This poses a security problem in which an attacker intentionally introduces traffic burstiness into routers. We consider a correlation attack, whose fundamental characteristic is to correlate multiple attack flows to generate synchronized small attack bursts, in an attempt to aggregate the bursts into a large burst at a target router. In this paper, we develop an analytical, fluid-based framework that models how the correlation attack disrupts router queues and how it can be mitigated. Using Poisson Counter Stochastic Differential Equations (PCSDEs), our framework captures the dynamics of a router queue for special cases and gives the closed-form average router queue length as a function of the inter-flow correlation. To mitigate the correlation attack, we apply our analytical framework to model different pacing schemes, including Markov ON-OFF pacing and rate limiting, which are respectively designed to break down the inter-flow correlation and suppress the peak rates of bursts. We verify that our fluid models conform to packet-level ns-2 simulation results.
Conference Paper: Characterizing continuoustime random walks on dynamic networks.
ABSTRACT: The workloads in modern chip multiprocessors (CMPs) are becoming increasingly diversified, creating different resource demands on the hardware substrate. It is necessary to allocate hardware resources based on the needs of the workloads in order to improve ...
ABSTRACT: In this paper, we investigate the benefits that accrue from the use of multiple paths by a session coupled with rate control over those paths. In particular, we study data transfers under two classes of multipath control: coordinated control, where the rates over the paths are determined as a function of all paths, and uncoordinated control, where the rates are determined independently over each path. We show that coordinated control exhibits desirable load balancing properties; for a homogeneous static random paths scenario, we show that the worst-case throughput performance of uncoordinated control behaves as if each user has but a single path (scaling like log(log(N))/log(N), where N is the system size, measured in number of resources), whereas coordinated control yields a worst-case throughput allocation bounded away from zero. We then allow users to change their set of paths and introduce the notion of a Nash equilibrium. We show that both coordinated and uncoordinated control lead to Nash equilibria corresponding to desirable welfare-maximizing states, provided in the latter case that the rate controllers over each path do not exhibit any round-trip time (RTT) bias (unlike TCP Reno). Finally, we show in the case of coordinated control that more paths are better, leading to greater welfare states and throughput capacity, and that simple path reselection policies that shift to paths with higher net benefit can achieve these states.
Conference Paper: Robust multipath routing in large wireless networks.
ABSTRACT: One of the challenges of wireless networks is to provide a reliable end-to-end path between two end hosts in the face of link and node outages. These can occur due to fluctuations in channel quality, node movement, or node failure. One mechanism that has been proposed is based on multipath routing, the idea being to establish two or more paths between the end hosts so that they always have a path between them with high probability in the face of outages. This naturally raises the question of how to discover these paths in an unknown, random wireless network to enable robust multipath routing. In order to answer this question, we model a random wireless network as a 2D spatial Poisson process. Based on the results on percolation highways in Franceschetti, et al. (1), we present accurate conditions that enable robust multipath routing. If the number of hops of a path between the end hosts is n, then there exists a path between them in a strip of width proportional to log n. More precisely, there exist C log n disjoint paths in a strip of width a(C, p) · log n, where p is the probability that characterizes the availability of an individual wireless communication link. We derive tight bounds for the function a(C, p). This provides a useful guideline for the establishment of multiple paths in a real wireless network, namely that the width should grow logarithmically in the number of hops on the path between the hosts.
Conference Paper: A new virtual indexing method for measuring host connection degrees.
ABSTRACT: We present a new virtual indexing method for estimating host connection degrees on high-speed links. It is based on the virtual connection degree sketch, in which a compact sketch of network traffic is built by generating associated virtual bitmaps for each host. Each virtual bitmap consists of a fixed number of bits selected randomly from a shared bit array by a new method for recording the traffic flows of the corresponding host. The shared bit array is efficiently utilized by all hosts, since every bit is shared by the virtual bitmaps of multiple hosts. To reduce the "noise" contaminating a host's virtual bitmaps due to sharing, we propose a new method to generate a "filtered" bitmap used to estimate the host connection degree. Furthermore, the method can be easily implemented in parallel and distributed processing environments. Experimental and testing results based on actual network traffic show that the new method is accurate and efficient.
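The basic primitive behind such bitmap sketches is linear counting: hash each distinct peer of a host into a bitmap and recover the degree from the fraction of zero bits. The sketch below shows only that primitive on a made-up host (no bit sharing or noise filtering, which are the paper's contributions):

```python
import math

def estimate_cardinality(items, m=1024, seed=0):
    """Linear-counting bitmap: hash each item into an m-bit bitmap and
    estimate the number of distinct items as -m * ln(V), where V is the
    fraction of bits still zero (this corrects for hash collisions)."""
    bits = [0] * m
    for it in items:
        bits[hash((seed, it)) % m] = 1
    v = bits.count(0) / m
    return math.inf if v == 0 else -m * math.log(v)

# A host that contacted 300 distinct peers, each seen several times:
# repeated flows set the same bits, so only distinct peers are counted.
peers = [f"peer{i}" for i in range(300)] * 3
est = estimate_cardinality(peers)
```

Storing one such bitmap per host is memory-hungry; drawing each host's virtual bitmap from a shared bit array, as the paper does, amortizes that cost at the price of the cross-host "noise" its filtered bitmap then removes.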
Conference Paper: Keynote speaker
ABSTRACT: Network measurements are extremely important for the purpose of managing and configuring a network. They are also essential as part of controlled experiments for the purpose of designing new protocols and architectures. Consequently, they are widely taken and used in current network operations and research. Unfortunately, most tools and studies have been developed or conducted in a mostly ad hoc manner. Quite often, these tools and studies miss or make inefficient use of information contained within the measurements, which leads to poor-quality, biased, and incorrect conclusions. Motivated by these observations, we argue in this talk for the need for a network measurement science that can deal in a principled way with the issues of measurement efficiency and measurement bias. To deal with measurement efficiency, we advocate the use of Fisher information during the design of measurement experiments and measurement tools. Briefly, Fisher information measures the amount of information that a single measurement provides to the computation of a statistic, such as the packet loss rate. We illustrate its application to the problem of estimating the flow size distribution based on packet sampling, a widely used technique for performing network measurements. In the context of measurement bias, we shift our attention to measurements leading to the characterization of graphs as commonly found in the Internet and online social networks. We review several studies where biased measurements have led to flawed (but widely believed) conclusions and then describe how such biases can be easily avoided.
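For the packet-loss-rate statistic mentioned in the talk, Fisher information has a simple closed form (the Bernoulli case; the numbers below are illustrative, not from the talk):

```python
def fisher_info_bernoulli(p, n):
    """Fisher information about a loss rate p carried by n independent
    per-packet loss observations: I(p) = n / (p * (1 - p)).  The
    Cramer-Rao bound says any unbiased estimator of p has variance
    at least 1 / I(p)."""
    return n / (p * (1 - p))

# With a 1% loss rate and 10,000 sampled packets, the estimator's
# variance is bounded below by p(1-p)/n, i.e. std. dev. ~ 0.001.
crlb = 1 / fisher_info_bernoulli(0.01, 10_000)
```

Comparing I(p) across candidate sampling designs, as the talk advocates, quantifies how much estimation accuracy each measured packet actually buys.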
Conference Paper: Multi-target tracking using proximity sensors
ABSTRACT: We consider the problem of tracking multiple moving targets in a continuous field using proximity sensors, which are binary sensors that can sense target presence by performing local energy detection subject to noise. Compared with more sophisticated sensors, proximity sensors have the advantages of lower cost and lower energy consumption, but also the disadvantage of being less accurate. In this paper, we propose a hybrid tracking scheme where coarse-scale tracking is first performed by proximity sensors to narrow down the areas of interest, and then fine-scale tracking is performed by high-end sensors to estimate the exact target locations, with our focus on the former. In contrast to classic multi-target tracking, which assumes a one-to-one association between measurements and targets, we show that proximity measurements do not have such an association and thus require a different objective. Formulating the coarse-scale tracking as a problem of tracking the histograms of targets in a cell-partitioned field, we develop an optimal solution and two approximate solutions via Bayesian Filtering (BF). In particular, one of our approximate solutions decouples the tracking of different targets and thus reduces the dimensionality of BF by relaxing the likelihood function, and the other further reduces the problem to a discrete space by quantizing the target mobility model and the relaxed likelihood function. Together with the optimal solution, they provide flexible trade-offs between accuracy and complexity. Simulations show that the proposed solutions can effectively track targets to the accuracy of a cell and thus reduce uncertainty for the fine-scale tracking.
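A minimal single-target version of the discrete Bayesian-filtering cycle conveys the coarse-scale idea (the paper tracks histograms of multiple targets; this sketch, with made-up motion and sensor parameters on a 1-D cell grid, shows only the predict/update loop with binary proximity readings):

```python
import random

def predict(belief, stay=0.6):
    """Motion update on a 1-D cell grid: the target stays in its cell
    with probability `stay`, else moves to an adjacent cell (mass is
    reflected at the grid boundary so the belief stays normalized)."""
    n, out = len(belief), [0.0] * len(belief)
    for i, b in enumerate(belief):
        out[i] += b * stay
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                out[j] += b * (1 - stay) / 2
            else:
                out[i] += b * (1 - stay) / 2  # reflect at the boundary
    return out

def update(belief, readings, p_detect=0.9, p_false=0.1):
    """Measurement update with one binary proximity sensor per cell:
    the sensor in the target's cell fires with prob p_detect, all
    others fire with the false-alarm prob p_false."""
    post = []
    for i, b in enumerate(belief):
        like = 1.0
        for j, r in enumerate(readings):
            p = p_detect if i == j else p_false
            like *= p if r else 1 - p
        post.append(b * like)
    z = sum(post)
    return [x / z for x in post]

# Simulate a static target in cell 4 of a 10-cell field.
rng = random.Random(2)
cells, target = 10, 4
belief = [1 / cells] * cells
for _ in range(15):
    readings = [1 if rng.random() < (0.9 if c == target else 0.1) else 0
                for c in range(cells)]
    belief = update(predict(belief), readings)
```

After a handful of steps the belief concentrates on the target's cell, which is exactly the "accuracy of a cell" the coarse-scale stage aims for before handing off to high-end sensors.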
ABSTRACT: We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we introduce a parameterized model of structural knowledge available to the adversary and quantify the success of attacks on individuals in anonymized networks. We show that the risks of these attacks vary based on network structure and size and provide theoretical results that explain the anonymity risk in random networks. We then propose a novel approach to anonymizing network data that models aggregate network structure and allows analysis to be performed by sampling from the model. The approach guarantees anonymity for entities in the network while allowing accurate estimates of a variety of network measures with relatively little bias.
ABSTRACT: Peer-to-peer swarming is one of the de facto solutions for distributed content dissemination in today’s Internet. By leveraging resources provided by clients, swarming systems reduce the load on and costs to publishers. However, there is a limit to how much cost savings can be gained from swarming; for example, for unpopular content, peers will always depend on the publisher in order to complete their downloads. In this paper, we investigate such dependence of peers on a publisher. For this purpose, we propose a new metric, namely swarm self-sustainability. A swarm is referred to as self-sustaining if all its blocks are collectively held by peers; the self-sustainability of a swarm is the fraction of time in which the swarm is self-sustaining. We pose the following question: how does the self-sustainability of a swarm vary as a function of content popularity, the service capacity of the users, and the size of the file? We present a model to answer the posed question. We then propose efficient solution methods to compute self-sustainability. The accuracy of our estimates is validated against simulations. Finally, we also provide closed-form expressions for the fraction of time that a given number of blocks is collectively held by peers.
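The core coverage question behind self-sustainability, whether the peers collectively hold every block, can be probed with a small Monte Carlo sketch (a static snapshot with uniformly chosen block sets; the peer counts, file size, and holdings below are illustrative assumptions, not the paper's dynamic model):

```python
import random

def coverage_probability(num_peers, blocks, blocks_per_peer,
                         trials=2000, seed=0):
    """Monte Carlo estimate: if each of `num_peers` peers holds
    `blocks_per_peer` distinct blocks chosen uniformly at random,
    how often do the peers collectively hold all `blocks` blocks
    (so that no download needs the publisher)?"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        held = set()
        for _ in range(num_peers):
            held.update(rng.sample(range(blocks), blocks_per_peer))
        hits += len(held) == blocks
    return hits / trials

# An unpopular swarm (3 peers) versus a popular one (60 peers),
# for a 100-block file with each peer holding 10 blocks.
p_small = coverage_probability(num_peers=3, blocks=100, blocks_per_peer=10)
p_large = coverage_probability(num_peers=60, blocks=100, blocks_per_peer=10)
```

The sharp contrast between the two cases mirrors the paper's point: for unpopular content the publisher is indispensable, while popular swarms are self-sustaining most of the time.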
Publication Stats
23k Citations
137.22 Total Impact Points
Institutions

1970–2013

University of Massachusetts Amherst
 • School of Computer Science
 • Department of Electrical and Computer Engineering
Amherst Center, Massachusetts, United States


2010

Pacific Northwest National Laboratory
Richland, Washington, United States


2002–2010

Microbiology Department at UMass Amherst
Amherst Center, Massachusetts, United States

Columbia University
 • Department of Electrical Engineering
New York City, New York, United States

Northrop Grumman
Falls Church, Virginia, United States


2008

University of Massachusetts Lowell
 • Department of Computer Science
Lowell, Massachusetts, United States


2006

Università degli Studi di Palermo
Palermo, Sicily, Italy


2003

University of Florida
Gainesville, Florida, United States


1999–2001

AT&T Labs
 • Research
Austin, Texas, United States


1994

National Chung Cheng University
 • Department of Computer Science and Information Engineering
Chiayi County, Taiwan


1992

Tilburg University
Tilburg, North Brabant, Netherlands


1990

The University of Western Ontario
London, Ontario, Canada
