Rajeev Motwani’s research while affiliated with Stanford University and other places


Publications (286)


The Sliding-Window Computation Model and Results
  • Chapter

July 2016 · 74 Reads · 13 Citations

Mayur Datar · Rajeev Motwani

We present some results related to small space computation over sliding windows in the data-stream model. Most research in the data-stream model, including results presented in some of the other chapters, assume that all data elements seen so far in the stream are equally important and synopses, statistics or models that are built should reflect the entire data set. However, for many applications this assumption is not true, particularly those that ascribe more importance to recent data items. One way to discount old data items and only consider recent ones for analysis is the sliding-window model: Data elements arrive at every instant; each data element expires after exactly N time steps; and, the portion of data that is relevant to gathering statistics or answering queries is the set of last N elements to arrive. The sliding window refers to the window of active data elements at a given time instant and window size refers to N. This chapter presents a general technique, called the Exponential Histogram (EH) technique, that can be used to solve a wide variety of problems in the sliding-window model; typically problems that require us to maintain statistics. We will showcase this technique through solutions to basic counting problems, as well as other applications.
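The basic-counting application described above can be sketched in a few lines. This is a simplified illustration of the Exponential Histogram idea, not the chapter's exact algorithm: class and parameter names are ours, and the merge threshold `k` is a plausible choice rather than the chapter's precise bound.

```python
import math

class ExponentialHistogram:
    """Approximate the number of 1s among the last N stream elements.

    Simplified EH sketch: buckets hold power-of-two counts of 1s; when a
    size has more than k buckets, the two oldest of that size are merged.
    Only the oldest bucket can straddle the window boundary, so counting
    half of it bounds the relative error.
    """

    def __init__(self, window_size, eps=0.1):
        self.N = window_size
        self.k = math.ceil(1 / eps) + 1   # max buckets allowed per size (our choice)
        self.buckets = []                 # (time of most recent 1, size), oldest first
        self.t = 0

    def add(self, bit):
        self.t += 1
        # Expire buckets whose most recent 1 has left the window.
        self.buckets = [(ts, s) for ts, s in self.buckets if ts > self.t - self.N]
        if bit != 1:
            return
        self.buckets.append((self.t, 1))
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.k:
                break
            i, j = idx[0], idx[1]                    # two oldest buckets of this size
            merged = (self.buckets[j][0], 2 * size)  # keep the newer timestamp
            self.buckets = [b for n, b in enumerate(self.buckets) if n not in (i, j)]
            self.buckets.insert(i, merged)           # sizes stay non-increasing, oldest first
            size *= 2

    def estimate(self):
        # The oldest bucket may partially cover expired elements; counting
        # half of it is the sole source of approximation error.
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        return total - self.buckets[0][1] // 2
```

For example, after feeding 200 ones into a window of size 100, `estimate()` returns a value close to the true in-window count of 100, while storing only a logarithmic number of buckets.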


STREAM: The Stanford Data Stream Management System

July 2016 · 138 Reads · 239 Citations

Shivnath Babu · [...] · Jennifer Widom

Traditional database management systems are best equipped to run one-time queries over finite stored data sets. However, many modern applications such as network monitoring, financial analysis, manufacturing, and sensor networks require long-running, or continuous, queries over continuous unbounded streams of data. In the STREAM project at Stanford, we are investigating data management and query processing for this class of applications. As part of the project we are building a general-purpose prototype Data Stream Management System (DSMS), also called STREAM, that supports a large class of declarative continuous queries over continuous streams and traditional stored data sets. The STREAM prototype targets environments where streams may be rapid, stream characteristics and query loads may vary over time, and system resources may be limited.
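The contrast between one-time and continuous queries can be made concrete with a toy analogue. STREAM itself exposes continuous queries declaratively (via its CQL language with window clauses); the generator below is only an illustrative Python stand-in, with names of our choosing, showing a query that emits a new answer for every arriving tuple instead of running once over a stored table.

```python
from collections import deque

def continuous_avg(stream, window=3):
    """Toy continuous query: for each arriving value, emit the average
    over a sliding window of the last `window` values. Unlike a one-time
    query, it produces an unbounded stream of answers."""
    buf = deque(maxlen=window)  # deque drops the oldest value automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)
```

For instance, `list(continuous_avg(iter([1, 2, 3, 4]), window=2))` yields `[1.0, 1.5, 2.5, 3.5]`, one answer per input tuple.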


Figure 1. Distributed Architecture for a Secure Database Service
Figure 7. Brute Force vs Hill Climbing
Figure 8. Hill Climbing Iterations
Figure 9. Performance Gain using Hill Climbing
Figure 10. Real World Example: Hill Climbing Iterations

Distributing data for secure database services
  • Conference Paper
  • Full-text available

March 2011 · 1,865 Reads · 39 Citations

The advent of database services has resulted in privacy concerns on the part of the client storing data with third-party database service providers. Previous approaches to enabling such a service have been based on data encryption, causing a large overhead in query processing. We propose a distributed architecture for secure database services as a solution to this problem, in which data is stored at multiple servers. The distributed architecture provides both privacy and fault tolerance to the client. In this paper we provide algorithms for (1) distributing data, where our results include hardness-of-approximation results and hence a heuristic greedy algorithm for the distribution problem, and (2) partitioning a client query into queries for the servers, via a bottom-up, state-based algorithm. Finally, the results from the servers are integrated to obtain the answer at the client. We provide an experimental validation and performance study of our algorithms.
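The flavor of the distribution problem can be sketched with a small greedy heuristic. This is not the paper's algorithm: the constraint model (sets of attributes that must never co-reside on one server) and the tie-breaking score are our own simplifications, meant only to make the objective concrete.

```python
def greedy_distribute(attributes, constraints, n_servers=2):
    """Hypothetical greedy sketch: place each attribute on the server
    where it completes the fewest privacy constraints. A constraint is a
    frozenset of attributes that must never all reside on one server.
    Returns a dict attr -> server index, or raises if every placement of
    some attribute would violate a constraint."""
    placement = {}
    for attr in attributes:
        best, best_score = None, None
        for s in range(n_servers):
            trial = {a for a, srv in placement.items() if srv == s} | {attr}
            # constraints this placement would fully violate
            viol = sum(1 for c in constraints if attr in c and c <= trial)
            # tie-breaker: how close this server gets to violating constraints
            near = sum(len(c & trial) for c in constraints if attr in c)
            score = (viol, near)
            if best_score is None or score < best_score:
                best, best_score = s, score
        if best_score[0] > 0:
            raise ValueError(f"greedy failed to place {attr} safely")
        placement[attr] = best
    return placement
```

With constraints {name, ssn} and {ssn, salary}, the greedy splits `ssn` away from both `name` and `salary`, so neither server alone can link an identity to a salary.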


Finding large cycles in Hamiltonian graphs

April 2010 · 37 Reads · 18 Citations · Discrete Applied Mathematics

We show how to find in Hamiltonian graphs a cycle of length n^(Ω(1/log log n)) = exp(Ω(log n / log log n)). This is a consequence of a more general result in which we show that if G has maximum degree d and has a cycle with k vertices (or a 3-cyclable minor H with k vertices), then we can find in O(n^3) time a cycle in G of length k^(Ω(1/log d)). From this we infer that if G has a cycle of length k, then one can find in O(n^3) time a cycle of length k^(Ω(1/(log(n/k) + log log n))), which implies the result for Hamiltonian graphs. Our results improve, for some values of k and d, a recent result of Gabow (2004) [11] showing that if G has a cycle of length k, then one can find in polynomial time a cycle in G of length . We finally show that if G has fixed Euler genus g and has a cycle with k vertices (or a 3-cyclable minor H with k vertices), then we can find in polynomial time a cycle in G of length f(g)·k^(Ω(1)), running in time O(n^2) for planar graphs.


A 1.43-Competitive Online Graph Edge Coloring Algorithm In The Random Order Arrival Model

January 2010 · 50 Reads · 13 Citations

A classic theorem by Vizing proves that if the maximum degree of a graph is Δ, then it is possible to color its edges, in polynomial time, using at most Δ+1 colors. However, this algorithm is offline, i.e., it assumes the whole graph is known in advance. A natural question then is how well we can do in the online setting, where the edges of the graph are revealed one by one, and we need to color each edge as soon as it is added to the graph. Online edge coloring has an important application in fast switch scheduling. Here, a natural model is that edges arrive online, but in a random permutation. Even in the random permutations model, the best analysis for any algorithm is factor 2, which comes from the simple greedy algorithm (which is factor 2 even in the worst case online model). The algorithm of Aggarwal et al. (1) provides a 1+o(1) factor algorithm, but for the case of multigraphs, when Δ = ω(n^2), where n is the number of vertices. In this paper, we show that for graphs with Δ = ω(log n), it is possible to color the graph with 1.43Δ + o(Δ) colors in the online random order model. Our algorithm is inspired by a 1.6 factor distributed offline algorithm of Panconesi and Srinivasan (9), which we extend by reusing colors online in multiple rounds.
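The factor-2 greedy baseline mentioned in the abstract is simple enough to sketch. The 1.43-competitive algorithm itself, with its multi-round color reuse, is not reproduced here; this is only the baseline it improves on, with function names of our choosing.

```python
from collections import defaultdict

def greedy_edge_coloring(edge_stream):
    """Online greedy edge coloring: as each edge (u, v) arrives, assign
    the smallest color not already used on an edge incident to u or v.
    Since each endpoint blocks at most Delta - 1 colors, this never
    needs more than 2*Delta - 1 colors, in any arrival order."""
    used = defaultdict(set)  # vertex -> set of colors on incident edges
    coloring = {}
    for u, v in edge_stream:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring
```

On the edge stream (0,1), (0,2), (0,3), (1,2) the greedy assigns colors 0, 1, 2, 2: a proper edge coloring using 3 colors, within the 2Δ-1 = 5 bound for Δ = 3.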


Pricing Strategies for Viral Marketing on Social Networks

December 2009 · 343 Reads · 111 Citations · Lecture Notes in Computer Science

We study the use of viral marketing strategies on social networks that seek to maximize revenue from the sale of a single product. We propose a model in which the decision of a buyer to buy the product is influenced by friends that own the product and the price at which the product is offered. The influence model we analyze is quite general, naturally extending both the Linear Threshold model and the Independent Cascade model, while also incorporating price information. We consider sales proceeding in a cascading manner through the network, i.e. a buyer is offered the product via recommendations from its neighbors who own the product. In this setting, the seller influences events by offering a cashback to recommenders and by setting prices (via coupons or discounts) for each buyer in the social network. This choice of prices for the buyers is termed the seller's strategy. Finding a seller strategy which maximizes the expected revenue in this setting turns out to be NP-hard. However, we propose a seller strategy that generates revenue guaranteed to be within a constant factor of the optimal strategy in a wide variety of models. The strategy is based on an influence-and-exploit idea, and it consists of finding the right trade-off at each time step between generating revenue from the current user and offering the product for free, using the influence generated from this free sale later in the process.


On Finding Narrow Passages with Probabilistic Roadmap Planners

September 2009 · 223 Reads · 194 Citations

This paper provides foundations for understanding the effect of narrow passages on the connectedness of probabilistic roadmaps. It also proposes a new random sampling scheme for finding such passages. An initial roadmap is built in a "dilated" free space that allows the robot to penetrate obstacles up to some distance. This roadmap is then modified by resampling around the links that do not lie in the true free space. Experiments show that this strategy allows relatively small roadmaps to reliably capture the connectivity of the free space.


On the graph turnpike problem

June 2009 · 141 Reads · 2 Citations · Information Processing Letters

We present results on the graph turnpike problem without distinctness, including its NP-completeness and an O(m + n log n) algorithm. The usual turnpike problem gives all pairwise distances but does not specify which pair of vertices each distance w_e corresponds to. Two other problems can be viewed as special cases of the graph turnpike problem: the bandwidth problem and the low-distortion graph embedding problem. In the NP-completeness reduction, the aim for the turnpike problem is to orient the edges with weights w_i in either direction so that when the whole cycle is traversed on the real line, it returns to a chosen starting point of the cycle. An instance of the turnpike problem, with or without distinctness, is uniquely mappable if there exists at most one solution up to translation and choice of orientation.


Anonymizing Unstructured Data

October 2008 · 364 Reads · 22 Citations

In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide O(k log k)- and O(1)-approximation algorithms for it. We demonstrate the applicability of our algorithms to the America Online query log dataset.
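The objective being optimized can be made concrete with a naive baseline. This is NOT the paper's approximation algorithm, only a crude suppression-based scheme of our own devising that achieves k-anonymity at (generally much higher) suppression cost.

```python
def naive_k_anonymize(records, k):
    """Naive baseline for k-anonymity over set-valued data: partition the
    records into groups of at least k and publish, for each group, the
    intersection of its members' item sets, suppressing everything else.
    Every published record then appears at least k times. The information
    suppressed this way is the cost the paper's algorithms minimize."""
    # Sort so that similar-looking sets tend to land in the same group.
    recs = sorted(records, key=lambda r: (len(r), sorted(r)))
    groups = [recs[i:i + k] for i in range(0, len(recs), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # fold an undersized tail into its neighbor
    out = []
    for g in groups:
        common = frozenset.intersection(*map(frozenset, g))
        out.extend([common] * len(g))
    return out
```

On four market-basket records with k = 2, each published record occurs at least twice, so no individual's item set is distinguishable within its group.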


A Survey of Query Auditing Techniques for Data Privacy

June 2008

·

87 Reads

·

28 Citations

This chapter is a survey of query auditing techniques for detecting and preventing disclosures in a database containing private data. Informally, auditing is the process of examining past actions to check whether they were in conformance with official policies. In the context of database systems with specific data disclosure policies, auditing is the process of examining queries that were answered in the past to determine whether answers to these queries could have been used by an individual to ascertain confidential information forbidden by the disclosure policies. Techniques used for detecting disclosures could potentially also be used or extended to prevent disclosures. In addition to the retroactive auditing mentioned above, researchers have therefore also studied an online variant of the auditing problem, wherein the task of an online auditor is to deny queries that could potentially cause a breach of privacy.
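The online-auditor setting can be sketched for the special case of SUM queries over real-valued fields, where a classical criterion applies: an individual value x_i is fully disclosed exactly when the unit vector e_i lies in the linear span of the answered query vectors. The class below is our own simplified illustration of that span test, not an algorithm from the survey; names and the deny-on-full-disclosure policy are assumptions.

```python
from fractions import Fraction

def _in_span(vec, basis):
    """Reduce vec against the basis rows (each later row has zeros at all
    earlier rows' pivots); return (residual, residual_is_zero)."""
    v = list(map(Fraction, vec))
    for row in basis:
        pivot = next(i for i, x in enumerate(row) if x != 0)
        if v[pivot] != 0:
            f = v[pivot] / row[pivot]
            v = [a - f * b for a, b in zip(v, row)]
    return v, all(x == 0 for x in v)

class OnlineSumAuditor:
    """Online auditor for SUM queries over real-valued fields x_1..x_n:
    a query (a subset indicator vector) is DENIED if answering it would
    place some unit vector e_i in the span of all answered queries, i.e.
    would let an attacker solve for one individual's exact value."""

    def __init__(self, n):
        self.n = n
        self.basis = []  # residuals of answered queries, in arrival order

    def ask(self, query_vec):
        residual, dependent = _in_span(query_vec, self.basis)
        trial = self.basis + ([] if dependent else [residual])
        for i in range(self.n):
            e = [Fraction(int(j == i)) for j in range(self.n)]
            if _in_span(e, trial)[1]:
                return "DENY"  # answering would fully disclose x_i
        self.basis = trial
        return "ANSWER"
```

For example, with three fields, answering x1+x2 and x2+x3 is safe, but x1+x3 must then be denied, since the three answers together would determine every individual value.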


Citations (89)


... Refer to the largest cost incurred at a processor as the makespan of the parallel processing. The load balancing problem is "equivalent to" makespan minimization [AMZ03], by setting the capacity of each processor as t(|Σ|,|G|) n , via PTIME reductions. The problem is intractable, but approximable. ...

Reference:

GRAPE: Parallel Graph Query Engine
The load rebalancing problem
  • Citing Conference Paper
  • January 2003

... Thus, they share a common history visualized in Figure 7. As depicted on the left, in the early 00s, fundamental concepts to handle unbounded data streams and continuous queries were introduced in seminal SPSs, e.g., STREAM [20], Aurora [10], Borealis [9], or TelegraphCQ [31]. These systems provide essential stream processing features but lack CEP requirements to specify the time and cause relationships between events [36,61], e.g., before 2016, most ASPSs did not provide windowing by event time [14], or dealt with out-of-order arrivals [26,60,69]. ...

STREAM: The Stanford Data Stream Management System
  • Citing Chapter
  • July 2016

... Theorem 3.1 in [4] ensures that the probability of such a detection making a Type I error does not exceed . Inspired by exponential histogram technology [4,7,8] which guarantees (1 + )-multiplicative approximation, we iterate in := { + 1 − 2 | ∈ Z ≥0 , + 1 − 2 > 0} and let = min ( − ). Two situations that may occur: ...

The Sliding-Window Computation Model and Results
  • Citing Chapter
  • July 2016

... To the best of our knowledge, this problem has not been addressed earlier. Existing solutions for tamper proofing audit trails [15], or privacy-preserving database access [9, 1], and authentic third-party data publication [4, 6, 11] are not applicable in this domain as discussed in the related work section. In this paper we propose scalable solutions for the privacy-preserving query result verification problem and develop a number of solutions that provide a tradeoff between the overhead for the owner, the efficiency of the verification, and the degree of exposure of the owner's database in order to prove the correctness of a query. ...

Enabling Privacy for the Paranoids
  • Citing Article
  • March 2004

... We denote this problem as k-HYPVC-PARTITE. This is an interesting problem in itself and its variants have been studied for applications related to databases such as distributed data mining [10], schema mapping discovery [11] and optimization of finite automata [17]. On bipartite graphs (k = 2), by König's Theorem computing the minimum vertex cover is equivalent to computing the maximum matching, which can be done efficiently. ...

Online Distributed Predicate Evaluation

... One of those areas was biased sampling functions, which do not sample the space uniformly but bias samples towards certain areas. Several approaches were developed during this time, like sampling near or on the surface of obstacles (31), sampling inside narrow passages (32,33), Gaussian sampling around current frontier states and obstacles (34), sampling restricted to workspace geometries (35) and workspace decompositions (36,37), sampling on the medial axis of the environment (38), utility-based sampling to connect separate regions of roadmaps to each other (39), and sampling in areas that are deemed difficult (1). The dynamic-domain RRT (40,41) extended tree nodes based on their estimated exploration ability. ...

On Finding Narrow Passages with Probabilistic Roadmap Planners

... Our problem is a special case of a projective clustering problem in which all the subspaces are of the same dimension. It is also known as the Hyperplane Cover Problem [9], the m-Hyperplane Center Problem [15] and the Slab Width Problem [7]. Up to some maximum tolerance, geometrically our problem is that of finding slabs of minimum width that cover all the points — thus the name bottleneck. In the case when the slabs are of zero width, the problem at hand is known as the k-line center problem, in which lines are used instead of slabs. ...

Sublinear Projective Clustering with Outliers
  • Citing Article
  • January 2005