R. Rastogi

Yahoo! Labs, Sunnyvale, California, United States

Publications (13), 0.69 Total impact

  • Source
    ABSTRACT: In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested instantiating certain intermediate aggregates which are then used to generate the final answers for input queries. In this work, we make a number of important contributions aimed at improving the execution and generation of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates, and also allows us to accurately estimate the number of evictions, (2) a comprehensive query execution cost model based on these estimates, (3) an efficient greedy heuristic for constructing good low-cost query plans, (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and Uniform, respectively, and (5) a detailed performance study with real-life IP flow data sets, which shows that our multiple aggregates computation techniques consistently outperform the best-known approach.
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on; 05/2011
  • Source
    ABSTRACT: Long-distance multi-hop wireless networks have been used in recent years to provide connectivity to rural areas. The salient features of such networks include TDMA channel access, nodes with multiple radios, and point-to-point long-distance wireless links established using high-gain directional antennas mounted on high towers. It has been demonstrated previously that in such network architectures, nodes can transmit concurrently on multiple radios, as well as receive concurrently on multiple radios. However, concurrent transmission on one radio, and reception on another radio causes interference. Under this scheduling constraint, given a set of source-destination demand rates, we consider the problem of satisfying the maximum fraction of each demand (also called the maximum concurrent flow problem). We give a novel joint routing and scheduling scheme for this problem, based on linear programming and graph coloring. We analyze our algorithm theoretically and prove that at least 50% of a satisfiable set of demands is satisfied by our algorithm for most practical networks (with maximum node degree at most 5).
    INFOCOM, 2010 Proceedings IEEE; 04/2010
  • Source
    ABSTRACT: In this paper, we develop a framework for achieving scalable and communication-efficient dissemination of content in pub/sub systems. To maximize communication sharing across subscriptions, our routing framework groups subscriptions based on similarity, and transmits content matching one or more subscriptions in a group over a single dissemination tree for the group. We develop a cost model that uses published content samples in conjunction with the knowledge of consumer subscriptions to estimate the communication cost of a set of routing trees for subscription groups. The problem of computing a communication-optimal set of routing trees is then formulated as an optimization problem that seeks to find trees with the minimum cost. It turns out that the problem of computing a minimum-cost tree for a subscription group is a new generalization of the well-known Steiner tree problem, and an interesting problem in its own right. We develop an approximation algorithm that uses low-stretch spanning trees to compute a tree whose communication cost is within a polylogarithmic factor of the optimum. We use this to compute trees for various subscription-grouping configurations generated using a greedy clustering strategy, and select the one with the lowest cost. Our experimental study demonstrates the effectiveness of our content-aware routing approach compared to traditional routing based on content-oblivious spanning trees.
    INFOCOM 2009, IEEE; 05/2009
  • Source
    K.V.M. Naidu, D. Panigrahi, R. Rastogi
    ABSTRACT: In this paper, we propose new "low-overhead" network monitoring techniques to detect violations of path-level QoS guarantees like end-to-end delay, loss, etc. Unlike existing path monitoring schemes, our approach does not calculate QoS parameters for all paths. Instead, it monitors QoS values for only a few paths, and exploits the fact that path anomalies are rare and anomalous states are well separated from normal operation, to rule out path QoS violations in most situations. We propose a heuristic to select a small subset of network paths to monitor while ensuring that no QoS violations are missed. Experiments with an ISP topology from the Rocketfuel data set show that our heuristic can deliver almost a 50% decrease in monitoring overhead compared to previous schemes.
    INFOCOM 2008. The 27th Conference on Computer Communications. IEEE; 05/2008
  • Source
    ABSTRACT: In this paper we present a new channel allocation scheme for IEEE 802.11 based mesh networks with point-to-point links, designed for rural areas. Our channel allocation scheme allows continuous full-duplex data transfer on every link in the network. Moreover, we do not require any synchronization across the links as the channel assignment prevents cross link interference. Our approach is simple. We consider any link in the network as made up of two directed edges. To each directed edge at a node, we assign a non-interfering IEEE 802.11 channel so that the set of channels assigned to the outgoing edges is disjoint from channels assigned to the incoming edges. Evaluation of this scheme in a testbed demonstrates throughput gains of 50-100%, and significantly lower end-to-end delays, over existing link scheduling/channel allocation protocols (such as 2P [11]) designed for point-to-point mesh networks. Formally speaking, this channel allocation scheme is equivalent to an edge-coloring problem that we call the directed edge coloring (DEC) problem. We establish a relationship between this coloring problem and the classical vertex coloring problem, and thus show that this problem is NP-hard. More precisely, we give an algorithm that, given a k-vertex coloring of a graph, can directed-edge-color it using $\xi(k)$ colors, where $\xi(k)$ is the smallest integer n such that $\binom{n}{\lfloor n/2 \rfloor} \geq k$.
    INFOCOM 2008. The 27th Conference on Computer Communications. IEEE; 05/2008
  • Source
    ABSTRACT: IEEE 802.11 WiFi equipment based wireless mesh networks have recently been proposed as an inexpensive approach to connect far-flung rural areas. Such networks are built using high-gain directional antennas that can establish long-distance wireless point-to-point links. Some nodes in the network (called gateway nodes) are directly connected to the wired internet, and the remaining nodes connect to the gateway(s) using one or more hops. The dominant cost of constructing such a mesh network is the cost of constructing antenna towers at nodes. The cost of a tower depends on its height, which in turn depends on the length of its links and the physical obstructions along those links. We investigate the problem of selecting which links should be established such that all nodes are connected, while the cost of constructing the antenna towers required to establish the selected links is minimized. We show that this problem is NP-hard and that a better than O(log n) approximation cannot be expected, where n is the number of vertices in the graph. We then present the first algorithm in the literature, for this problem, with provable performance bounds. More precisely, we present a greedy algorithm that is an O(log n) approximation algorithm for this problem. Finally, through simulations, we compare our approximation algorithm with both the optimal solution, and a naive heuristic.
    INFOCOM 2008. The 27th Conference on Computer Communications. IEEE; 05/2008
  • Source
    ABSTRACT: Cisco's NetFlow collector (NFC) is a powerful example of a real-world product that supports multiple aggregate queries over a continuous stream of IP flow records. NFC enables a plethora of network management tasks like traffic demands estimation, application traffic profiling, etc. In this paper, we investigate two computation sharing techniques for enabling streaming applications such as NFC to scale to hundreds of queries. Our first technique instantiates certain intermediate aggregates which are then used to generate the final answers for input queries. Our second technique coalesces the filter conditions of similar queries and uses the coalesced filter to pre-filter stream data input to these queries. Using these techniques, we propose a heuristic to compute a good query plan and perform extensive simulations to show that our heuristic delivers a factor of over 3 performance improvement compared to a naive approach.
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on; 05/2008
  • Source
    ABSTRACT: Detecting constraint violations in large-scale distributed systems has recently attracted plenty of attention from the research community due to its varied applications (security, network monitoring, etc.). Communication efficiency of these systems is a critical concern and determines their practicality. In this paper, we introduce a new set of methods called non-zero slack schemes to implement distributed SUM queries efficiently. We show, both analytically and empirically, that these methods can lead to a considerable reduction in the amount of communication. We propose three adaptive non-zero slack schemes that adapt to changing data distributions; our best scheme is a lightweight reactive scheme that probabilistically adjusts local constraints based on the occurrence of certain events (using only a periodic probability estimation). We conduct an extensive experimental study using real-life and synthetic data sets, and show that our non-zero slack schemes incur significantly less communication overhead than the state-of-the-art zero slack scheme (a savings of over 60%).
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on; 05/2008
  • Source
    ABSTRACT: Modern communication networks are vulnerable to attackers who send unsolicited messages to innocent users, wasting network resources and user time. Some examples of such attacks are spam emails, annoying tele-marketing phone calls, viral marketing in social networks, etc. Existing techniques to identify these attacks are tailored to certain specific domains (like email spam filtering), but are not applicable to a majority of other networks. We provide a generic abstraction of such attacks, called the Random Link Attack (RLA), that can be used to describe a large class of attacks in communication networks. In an RLA, the malicious user creates a set of false identities and uses them to communicate with a large, random set of innocent users. We mine the social networking graph extracted from user interactions in the communication network to find RLAs. To the best of our knowledge, this is the first attempt to conceptualize the attack definition, applicable to a variety of communication networks. In this paper, we formally define RLA and show that the problem of finding an RLA is NP-complete. We also provide two efficient heuristics to mine subgraphs satisfying the RLA property; the first (GREEDY) is based on greedy set-expansion, and the second (TRWALK) on randomized graph traversal. Our experiments with a real-life data set demonstrate the effectiveness of these algorithms.
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on; 05/2008
  • Source
    S. Agrawal, K.V.M. Naidu, R. Rastogi
    ABSTRACT: In this paper, we develop passive network tomography techniques for inferring link-level anomalies like excessive loss rates and delay from path-level measurements. Our approach involves placing a few passive monitoring devices on strategic links within the network, and then passively monitoring the performance of network paths that pass through those links. In order to keep the monitoring infrastructure and communication costs low, we focus on minimizing (1) the number of passive probe devices deployed, and (2) the set of monitored paths. For mesh topologies, we show that the above two minimization problems are NP-hard, and consequently, devise polynomial-time greedy algorithms that achieve a logarithmic approximation factor, which is the best possible for any algorithm. We also consider tree topologies typical of Enterprise networks, and show that while similar NP-hardness results hold, constant factor approximation algorithms are possible for such topologies.
    INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE; 06/2007
  • Source
    P. Dutta, S. Jaiswal, R. Rastogi
    ABSTRACT: IEEE 802.11 Wi-Fi equipment based wireless mesh networks have recently been proposed as an inexpensive approach to connect far-flung rural areas. Such networks are built using high-gain directional antennas that can establish long-distance point-to-point links. In recent work, a new MAC protocol named 2P has been proposed that is suited for the interference pattern within such a network. However, the 2P protocol requires the underlying graph (for each 802.11 channel) to be bipartite. Under the assumption that 2P is the MAC protocol used in the mesh network, we make the following contributions in this paper. Given K non-interfering 802.11 channels, we propose a simple cut-based algorithm to compute K bipartite subgraphs (on each of which the 2P protocol can be run separately). We establish the class of graphs that can thus be completely covered by K bipartite subgraphs. For the remaining set of graphs, we look into the "price" of routing all end-to-end demands over only the bipartite subgraphs. We analytically establish what fraction of the max flow of the original mesh graph can be routed over the bipartite subgraphs. Finally, we look into the problem of mismatch between the load on a link (as computed by max flow) and its effective capacity under a given channel allocation. We propose heuristics to cluster links with similar loads into the same bipartite graphs (channels) and, through comprehensive numerical simulations, show that our heuristics come very close to the best possible flow.
    INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE; 06/2007
  • Source
    ABSTRACT: VillageNet is a wireless mesh network that aims to provide low-cost broadband Internet access for rural regions. The cost of building the network is kept low by using off-the-shelf IEEE 802.11 equipment and optimizing the network topology to minimize cost. In this paper we describe the overall operation of VillageNet and discuss two fundamental problems in building such a network. Nodes in VillageNet communicate using long-distance point-to-point wireless links that are established using high-gain directional antennas. VillageNet uses the 2P MAC protocol [?], which is suited for the interference pattern within such a network. However, the 2P protocol requires the underlying mesh graph (for each 802.11 channel) to be bipartite. Thus, if K channels are available, then an important consideration is how to select K bipartite subgraphs to activate, such that the demands of the nodes are best met. We formally pose this problem and present some initial results. Second, we observe that the dominant cost of constructing such a mesh network is the cost of constructing antenna towers at nodes. The cost of a tower depends on its height, which in turn depends on the length of its links and the physical obstructions along those links. Thus, to minimize cost, we pose the problem of deciding which links should be established, such that all villages are connected and the cost of constructing antenna towers to establish the selected links is minimized.
    Communication Systems Software and Middleware, 2007. COMSWARE 2007. 2nd International Conference on; 02/2007
  • Partha Dutta, Sharad Jaiswal, Rajeev Rastogi
    ABSTRACT: VillageNet is a new wireless mesh networking technology that provides low-cost broadband Internet access for wide regions. It targets the rural market around the world, where large populations live but paying capacities are low. VillageNet offers a low-cost, high performance alternative to traditional wireline/cellular technologies that have prohibitively expensive deployment costs. VillageNet connects villages in a mesh using long-distance wireless links. The cost of building the network is kept low by using off-the-shelf Institute of Electrical and Electronics Engineers (IEEE) 802.11 equipment and optimizing the network topology to minimize cost. In this paper, we describe the overall operation and architecture of the VillageNet network. We also describe the various technical challenges surrounding channel allocation, link scheduling, and topology construction for these networks and present some initial results for these problems. Finally, we outline several interesting open issues for building rural wireless mesh networks. © 2007 Alcatel-Lucent.
    Bell Labs Technical Journal 01/2007; 12:119-131. · 0.69 Impact Factor
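
The directed edge coloring (DEC) abstract in the list above bounds the number of channels by a quantity ξ(k) for a graph with a k-vertex coloring. As a rough illustration only, the sketch below computes ξ(k) by brute force, reading the bound as "the smallest integer n whose central binomial coefficient C(n, ⌊n/2⌋) is at least k". That reading is an interpretation of the abstract's notation, and the function name `xi` is hypothetical; this is not the paper's algorithm, only the arithmetic behind the bound.

```python
from math import comb


def xi(k: int) -> int:
    """Smallest integer n with C(n, floor(n/2)) >= k.

    Assumption: the abstract's bound is the central binomial
    coefficient; `xi` is an illustrative name, not from the paper.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n = 0
    # C(n, n // 2) is non-decreasing in n, so a linear scan finds the minimum.
    while comb(n, n // 2) < k:
        n += 1
    return n


if __name__ == "__main__":
    for k in (2, 3, 4, 6, 7, 20):
        print(f"k = {k:2d} -> xi(k) = {xi(k)}")
```

Under this reading the channel count grows slowly with k: a graph that is 6-vertex-colorable would need only 4 channels, since C(4, 2) = 6.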

Publication Stats

120 Citations
0.69 Total Impact Points

Institutions

  • 2009
    • Yahoo! Labs
      Sunnyvale, California, United States
  • 2008
    • Massachusetts Institute of Technology
      Cambridge, Massachusetts, United States
    • Alcatel-Lucent
      Paris, Île-de-France, France
  • 2007
    • Stanford University
      Palo Alto, California, United States