
Rina Panigrahy- Researcher at Google Inc.
141 publications · 23,342 reads · 9,554 citations
Publications (141)
Large language models' significant advances in capabilities are accompanied by significant increases in inference costs. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed...
Standard decoding in a Transformer based language model is inherently sequential as we wait for a token's embedding to pass through all the layers in the network before starting the generation of the next token. In this work, we propose a new architecture StagFormer (Staggered Transformer), which staggers execution along the time axis and thereby...
Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training...
Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent to which fundamental search and reasoning capabilities emerged within LLMs remains a topic of ongoing debate. In this work, we study if causal language modeling can learn a comp...
It is well established that increasing scale in deep transformer networks leads to improved quality and performance. This increase in scale often comes with an increase in compute cost and inference latency. Consequently, research into methods which help realize the benefits of increased scale without leading to an increase in the compute cost beco...
One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network. By storing the bulk of the parameters in the external table, one can increase the capacity of the model without necessarily increasing the inference time. Two crucial questions in this set...
How do we provably represent images succinctly so that their essential latent attributes are correctly captured by the representation to as high a level of detail as possible? While today's deep networks (such as CNNs) produce image embeddings, they do not have any provable properties and seem to work in mysterious non-interpretable ways. In this work...
Deep and wide neural networks successfully fit very complex functions today, but dense models are starting to be prohibitively expensive for inference. To mitigate this, one promising research direction is networks that activate a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping...
Deep and wide neural networks successfully fit very complex functions today, but dense models are starting to be prohibitively expensive for inference. To mitigate this, one promising direction is networks that activate a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping of inputs...
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost...
Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning m...
Deep learning has shown tremendous success on a variety of problems. However, unlike the traditional computational paradigm, most neural networks do not have access to a memory, which might be hampering their ability to scale to large data structures such as graphs, lookup tables, and databases. We propose a neural architecture where sketch based memory is i...
Large neural network models have been successful in learning functions of importance in many branches of science, including physics, chemistry and biology. Recent theoretical work has shown explicit learning bounds for wide networks and kernel methods on some simple classes of functions, but not on more complex functions which arise in practice. We...
In this paper we study the learnability of deep random networks from both theoretical and practical points of view. On the theoretical front, we show that the learnability of random deep networks with sign activation drops exponentially with its depth. On the practical front, we find that the learnability drops sharply with depth even with the stat...
The rapid growth of the Internet has led to the widespread use of newer and richer models of online shopping and delivery services. The race to efficient large scale on-demand delivery has transformed such services into complex networks of shoppers (typically working in the stores), stores, and consumers. The efficiency of processing orders in stor...
Complications in lower extremity in diabetes mellitus patients are common and have become an increasingly significant public health problem worldwide. Foot ulceration, sepsis and amputation are known and feared by almost every diabetic person. Factors that affect development and healing of diabetic patients’ foot ulcers include the degree of metabo...
Architecture that provides a data structure to facilitate personalized ranking over recommended content (e.g., documents). The data structure approximates the social distance of the searching user to the content at query time. A graph is created of content recommended by members of the social network, where the nodes of the graph include content no...
Multiple data prediction strategies are received. Each data prediction strategy may predict a next data value in a sequence of data values with a corresponding confidence value. Rather than rely on a single prediction strategy, the predictions of each of the data prediction strategies are linearly combined to generate a single prediction that is mo...
User accounts in a social networking application are divided into highly-connected accounts and regular accounts. A mapping of the highly-connected accounts to their friends, and a mapping of accounts to documents endorsed by the users associated with the accounts are stored on index servers of a search engine. When a query is received by a front-e...
To facilitate the estimation of relatedness between nodes of a graph, implementations estimate relatedness between nodes in a graph by pre-computing for a subset of sample nodes (e.g., center nodes) a plurality of transition probabilities between each sample node and each of the other nodes in the graph, and then later when queried the implementati...
Sketches are generated for each node in a graph. For undirected graphs, each sketch for a node may include an indicator of a node from a seed set of nodes and the shortest distance between the node and the indicated node. When a request is received for the shortest distance between two nodes of the graph, the sketches for each of the two nodes are...
We study the question of learning a sparse multivariate polynomial over the real domain. In particular, for some unknown polynomial f(x) of degree-d and k monomials, we show how to reconstruct f, within error ε, given only a set of examples \(x_i\) drawn uniformly from the n-dimensional cube (or an n-dimensional Gaussian distribution), together with eva...
We study the effectiveness of learning low degree polynomials using neural networks by the gradient descent method. While neural networks have been shown to have great expressive power, and gradient descent has been widely used in practice for learning neural networks, few theoretical guarantees are known for such methods. In particular, it is well...
We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions. This problem can be viewed as a simplification of the deep learning problem where finding a factorization corresponds to finding edges in different layers and values of hidden units. We prove t...
With the explosive growth of social networks, many applications are increasingly harnessing the pulse of online crowds for a variety of tasks such as marketing, advertising, and opinion mining. An important example is the wisdom of crowd effect that has been well studied for such tasks when the crowd is non-interacting. However, these studies don't...
We consider the classical question of predicting binary sequences and study the optimal algorithms for obtaining the best possible regret and payoff functions for this problem. The question turns out to be also equivalent to the problem of optimal trade-offs between the regrets of two experts in an "experts problem", studied before by \cite{k...
Fractals are self-similar recursive structures that have been used in modeling several real world processes. In this work we study how "fractal-like" processes arise in a prediction game where an adversary is generating a sequence of bits and an algorithm is trying to predict them. We will see that under a certain formalization of the predictive pa...
Consider the classical problem of predicting the next bit in a sequence of bits. A standard performance measure is regret (loss in payoff) with respect to a set of experts. For example, if we measure performance with respect to two constant experts, one that always predicts 0's and another that always predicts 1's, it is well known that one can...
In this talk we will look at the classical prediction game where the adversary (or nature) is producing a sequence of bits and a prediction algorithm is trying to predict the future bit(s) from the past bits. This is like gambling on the future bits which involves the risk of making mistakes while shooting for profit from right predictions. Say the...
We give new lower bounds for randomized NNS data structures in the cell probe model based on robust metric expansion for two metric spaces: \(\ell_\infty\) and Earth Mover Distance (EMD) in high dimensions. In particular, our results imply stronger non-embeddability for these metric spaces into \(\ell_1\). The main components of our approach are a strengthening of the...
There is a vast supply of prior art that studies models for mental processes. Some studies in psychology and philosophy approach it from an inner perspective in terms of experiences and percepts. Others such as neurobiology or connectionist-machines approach it externally by viewing the mind as a complex circuit of neurons where each neuron is a primit...
Motivated by trends in popularity of products, we present a formal model for studying trends in our choice of products in terms of three parameters: (1) their innate utility; (2) individual boredom associated with repeated usage of an item; and (3) social influences associated with the preferences from other people. Different from previous work, in...
Previous research has suggested that people who are in the same social circle exhibit similar behaviors and tastes. The rise of social networks gives us insights into the social circles of web users, and recommendation services (including search engines, advertisement engines, and collaborative filtering engines) provide a motivation to adapt recom...
In this paper we introduce a new notion of distance between nodes in a graph that we refer to as robust connectivity. Robust connectivity between a pair of nodes u and v is parameterized by a threshold k and intuitively captures the number of paths between u and v of length at most k. Using this new notion of distances, we show that any black box a...
Schizophyllum commune is widely distributed in nature, but it rarely causes human infection. We have isolated this mould in a 46-year-old immunocompetent, non-diabetic patient with chronic sinusitis, previously treated with multiple antibiotics and topical steroid nasal drops with no response. Materials obtained from the nasal sinus during the...
This article focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sub-linear in the number of nodes n) and a small number of passes.
In the streaming model, we show how to perform several graph comput...
We study the space and time complexity of approximating distributions of l-step random walks in simple (possibly directed) graphs G. While very efficient algorithms for obtaining additive ε-approximations have been developed in the literature, no non-trivial results with multiplicative guarantees are known, and obtaining such approximations is the...
This paper examines two fundamental issues pertaining to virtual machine (VM) consolidation. Current virtualization management tools, both commercial and academic, enable multiple virtual machines to be consolidated into few servers so that other servers can be turned off, saving power. These tools determine effective strategies for VM placement w...
Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected without user interference. This paper presents CODE, a tool that automatically detects...
Can (scientific) knowledge be reliably preserved over the long term? We have today very efficient and reliable methods to encode, store and retrieve data in a storage medium that is fault tolerant against many types of failures. But does this guarantee -- or does it even seem likely -- that all knowledge can be preserved over thousands of years and...
The paper explores known results related to the problem of identifying if a given program terminates on all inputs -- this is a simple generalization of the halting problem. We will see how this problem is related to the notion of proof verifiers. We also see how verifying if a program is terminating involves reasoning through a tower of axiomatic...
We present a formal model for studying fashion trends, in terms of three parameters of fashionable items: (1) their innate utility; (2) individual boredom associated with repeated usage of an item; and (3) social influences associated with the preferences from other people. While there are several works that emphasize the effect of social influence...
Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is +1 if the prediction is correct and -1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right pred...
We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time. Since the standard shortest path algorithms are expensive, our approach moves the time-consuming shortest-path computation of...
Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of deidentifying records is to remove identifying fields such as social security number, name, etc. However, recent research has shown that a large fraction of the U.S. po...
In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$. We then look at various notions of expansion in this graph relating them to the...
Click through rates (CTR) offer useful user feedback that can be used to infer the relevance of search results for queries. However it is not very meaningful to look at the raw click through rate of a search result because the likelihood of a result being clicked depends not only on its relevance but also the position in which it is displayed. One...
We study the problem of designing a mechanism to rank items in forums by making use of the user reviews such as thumb and star ratings. We compare mechanisms where forum users rate individual posts and also mechanisms where the user is asked to perform a pairwise comparison and state which one is better. The main metric used to evaluate a mechani...
Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is $+1$ if the prediction is correct and $-1$ otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right...
Given n potential oil locations, where each has oil at a certain depth, we seek good trade-offs between the number of oil sources found and the total amount of drilling performed. The cost of exploring a location is proportional to the depth to which it is drilled. The algorithm has no clue about the depths of the oil sources at the different locat...
The study of hashing is closely related to the analysis of balls and bins; items are hashed to memory locations much as balls are thrown into bins. In particular, Azar et al. [2] considered putting each ball in the less-full of two random bins. This lowers the probability that a bin exceeds a certain load from exponentially small to doubly exponen...
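The two-choice rule mentioned in the abstract above can be illustrated with a small simulation. This is only a sketch under assumed names (`max_load`, `choices`, and `seed` are hypothetical, not taken from the paper): each ball samples `choices` bins uniformly at random and is placed in the least loaded of them, so `choices=1` gives plain random hashing and `choices=2` gives the less-full-of-two rule.

```python
import random


def max_load(n_balls, n_bins, choices, seed=0):
    """Throw n_balls into n_bins and return (maximum load, all loads).

    Each ball picks `choices` bins uniformly at random (with replacement)
    and goes into the least loaded of those candidates.
    """
    rng = random.Random(seed)
    loads = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        # Place the ball in the candidate bin that currently holds the fewest balls.
        target = min(candidates, key=lambda b: loads[b])
        loads[target] += 1
    return max(loads), loads


if __name__ == "__main__":
    one_choice, _ = max_load(10_000, 10_000, choices=1)
    two_choice, _ = max_load(10_000, 10_000, choices=2)
    print("max load with one choice:", one_choice)
    print("max load with two choices:", two_choice)
```

Running the script prints the maximum bin load under both rules for the same number of balls and bins, which is one way to observe the effect the abstract describes.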
Finding sparse cuts is an important tool for analyzing large graphs that arise in practice, such as the web graph, online social communities, and VLSI circuits. When dealing with such graphs having billions of nodes, it is often hard to visualize global partitions. While studies on sparse cuts have traditionally looked at cuts with respect to all t...
Network Intrusion Detection Systems (NIDS) monitor network traffic to detect attacks or unauthorized activities. Traditional NIDSes search for patterns that match typical network compromise or remote hacking attempts. However, newer networking applications require finding the frequently repeated strings in a packet stream for further investigation...
In this paper, we attempt to improve the effectiveness and the efficiency of query-dependent link-based ranking algorithms such as HITS, MAX and SALSA. All these ranking algorithms view the results of a query as nodes in the web graph, expand the result set to include neighboring nodes, and compute scores on the induced neighborhood graph. In pre...
As VLSI silicon technology continues its relentless advance and memory densities increase, the problem of soft errors--bit upsets caused by alpha particles or neutron hits--demands solutions. Error-correcting codes (ECCs) are routinely used on random-access memories (RAMs) to increase soft error tolerance--codewords (CWs) (ECC bits concatenated to...
The Nearest Codeword Problem (NCP) is a basic algorithmic question in the theory of error-correcting codes. Given a point \(v \in \mathbb{F}_2^n\) and a linear space \(L\subseteq \mathbb{F}_2^n\) of dimension k NCP asks to find a point l ∈ L that minimizes the (Hamming) distance from v. It is well-known that the nearest codeword problem is NP-hard....
This work investigates a geometric approach to proving cell probe lower bounds for data structure problems. We consider the approximate nearest neighbor search problem on the Boolean hypercube \((\{0,1\}^d, \|\cdot\|_1)\) with d = Θ(log n). We show that any (randomized) data structure for the problem, that answers c-approximate nearest neighbor search qu...
In this paper, we focus on characterizing spamming botnets by leveraging both spam payload and spam server traffic properties. Towards this goal, we developed a spam signature generation framework called AutoRE to detect botnet-based spam emails and botnet membership. AutoRE does not require pre-classified training data or white lists. Moreover, it...
Estimating frequency moments of data streams is a very well studied problem and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest into studying st...
In this study we propose sketching algorithms for computing similarities between hierarchical data. Specifically, we look at data objects that are represented using leaf-labeled trees denoting a set of elements at the leaves organized in a hierarchy. Such representations are richer alternatives to a set. For example, a document can be repre...
We present a complete audit cycle of the management of third/fourth degree perineal tears in the three Glasgow maternity hospitals measured against the recommendations of the Royal College of Obstetricians and Gynaecologists (RCOG) Guideline No. 29 (www.rcog.org.uk). Following an initial 6-month data collection period, shortcomings in the practice...
We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than two. Since the exact nearest neighbor sea...
We provide several new results for the trace reconstruction problem. In this setting, a binary string yields a collection of traces, where each trace is independently obtained by independently deleting each bit with a fixed probability δ. Each trace therefore consists of a random subsequence of the original sequence. Given the traces, we wish to rec...
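The trace model described in the abstract above (each bit deleted independently with probability δ) can be sketched as a small generator. The function names and the `seed` parameter are hypothetical, chosen for illustration; this shows only how traces are produced, not any reconstruction algorithm from the paper.

```python
import random


def make_trace(bits, delta, rng):
    """Return one trace: each bit is deleted independently with probability delta."""
    return [b for b in bits if rng.random() >= delta]


def make_traces(bits, delta, count, seed=0):
    """Return `count` independent traces of the same original bit string."""
    rng = random.Random(seed)
    return [make_trace(bits, delta, rng) for _ in range(count)]


def is_subsequence(trace, bits):
    """Check that `trace` can be obtained from `bits` by deletions only."""
    it = iter(bits)
    # `t in it` advances the iterator, so matches must occur in order.
    return all(t in it for t in trace)


if __name__ == "__main__":
    original = [1, 0, 1, 1, 0, 0, 1, 0]
    for trace in make_traces(original, delta=0.3, count=5):
        print(trace, is_subsequence(trace, original))
```

Every generated trace is a subsequence of the original string, which matches the abstract's remark that each trace is a random subsequence.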
Solid-state disks (SSDs) have the potential to revolutionize the storage system landscape. However, there is little published work about their internal organization or the design choices that SSD manufacturers face in pursuit of optimal performance. This paper presents a taxonomy of such design choices and analyzes the likely performance of vario...
New networking applications such as Network Intrusion Detection Systems (NIDS) require finding the frequently repeated strings in a packet stream for further investigation. The strategy of finding frequently repeated strings within a given time frame of the packet stream has been quite efficient to detect the polymorphic worms. A novel real-time wo...
We study the classic problem of estimating the sum of n variables. The traditional uniform sampling approach requires a linear number of samples to provide any non-trivial guarantees on the estimated sum. In this paper we consider various sampling methods besides uniform sampling, in particular sampling a variable with probability proportional...
Query optimization that involves expensive predicates has received considerable attention in the database community. Typically, the output to a database query is a set of tuples that satisfy certain conditions, and, with expensive predicates, these conditions may be computationally costly to verify. In the simplest case, when the query looks for th...
A partitioned TCAM-based search engine is presented that increases the packet forwarding rate multiple times over traditional TCAMs. The model works for IPv4 and IPv6 packet forwarding. Unlike the previous art, the improvement is achieved regardless of the incoming traffic pattern. Employing small and private memories that dynamically store popular...
We consider the problem of estimating the length of the shortest path from a vertex s to a vertex t in a DAG whose edge lengths are known only approximately but can be determined exactly at a cost. Initially, for each edge e, the length of e is known only to lie within an interval \([l_e, h_e]\); the estimation algorithm can pay \(w_e\) to find the exact lengt...
We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than three. Since the exact nearest neighbor...
We present an algorithm for finding frequent elements in a stream where the arrivals are not bursty. Depending on the amount of burstiness in the stream our algorithm detects elements with frequency at least t with space between \(\tilde O( F_1 / t^2)\) and \(\tilde O( F_2 / t^2)\) where \(F_1\) and \(F_2\) are the first and the second frequency moments of...
We consider the problem of finding the most frequent elements in the data stream model; this problem has a linear lower bound in terms of the input length. In this paper we obtain sharper space lower bounds for this problem, not in terms of the length of the input as is traditionally done, but in terms of the quantitative properties (in this case,...
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that c...
A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (...
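For orientation, the standard counting Bloom filter that the abstract above takes as its starting point can be sketched as follows. This is the plain counter-array version, not the d-left CBF the paper proposes; the class name, the default sizes, and the use of SHA-256 to derive hash positions are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter: an array of counters instead of bits."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Derive num_hashes counter indices by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        # Only meaningful for items that were previously added.
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.counters[p] > 0 for p in self._positions(item))


if __name__ == "__main__":
    cbf = CountingBloomFilter()
    cbf.add("flow-1")
    print("flow-1" in cbf)
    cbf.remove("flow-1")
    print("flow-1" in cbf)
```

Using counters rather than single bits is what makes deletion possible: removing an item decrements only the increments that item contributed, so other stored items remain present.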
Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social security number, name etc. However, recent research has shown that a large fraction of the US popu...
We consider the problem of estimating the size of a collection of documents using only a standard query interface. Our main idea is to construct an unbiased and low-variance estimator that can closely approximate the size of any set of documents defined by certain conditions, including that each document in the set must match at least one query f...
We analyze protocols for disseminating a collection of data blocks over a network of peers with a view towards BitTorrent and related peer-to-peer networks. Unlike previous work, we accurately model the distribution of the individual data blocks, a process which is critical to the parallelism that makes BitTorrent successful in practice. We als...
In this paper we relate the problem of finding structures related to perfect matchings in bipartite graphs to a stochastic process similar to throwing balls into bins. Given a bipartite graph with n nodes on each side, we view each node on the left as having balls that it can throw into nodes on the right (bins) to which it is adjacent. If each nod...
In this paper we propose a dictionary data structure for string search with errors where the query string may differ from the expected matching string by a few edits. This data structure can also be used to find the database string with the longest common prefix with few errors. Specifically, with a database of n random strings, each of length of...
Topic or feature extraction is often used as an important step in document classification and text mining. Topics are succinct representations of content in a document collection and hence are very effective when used as content identifiers in peer-to-peer systems and other large scale distributed content management systems. Effective topic extracti...
In recent work, the authors introduced a data structure with the same functionality as a counting Bloom filter (CBF) based on fingerprints and the d-left hashing technique. This paper describes dynamic bit reassignment, an approach that allows the size of the fingerprint to flexibly change with the load in each hash bucket, thereby reducing the pro...
Given a metric space $(X,d_X)$, $c\ge 1$, $r>0$, and $p,q\in [0,1]$, a distribution over mappings $h:X\to \mathbb N$ is called an $(r,cr,p,q)$-sensitive hash family if any two points in $X$ at distance at most $r$ are mapped by $h$ to the same value with probability at least $p$, and any two points at distance greater than $cr$ are mapped by $h$...
In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the...
A new approach for using block-selection scheme to increase the search throughput of multi-block TCAM-based network search engines is proposed. While the existing methods try to counter and forcibly balance the inherent bias of the Internet traffic, our method takes advantage of it. Our method improves flexibility of table management and gains scal...
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 69-70). by Rina Panigrahy. M.S.
This paper addresses the smallest grammar problem: What is the smallest context-free grammar that generates exactly one given string σ? This is a natural question about a fundamental object connected to many fields such as data compression, Kolmogorov complexity, pattern identification, and addition chains. Due to the problem's inherent complexity,...