Mikkel Thorup

University of Copenhagen · Department of Computer Science

About

Publications: 270
Reads: 25,314
Citations: 13,954
Citations since 2016: 4,311 (47 research items)
(Chart: citations per year, 2016-2022)

Publications (270)
Article
Full-text available
We describe an algorithm for solving an important geometric problem arising in computer-aided manufacturing. When cutting away a region from a solid piece of material—such as steel, wood, ceramics, or plastic—using a rough tool in a milling machine, sharp convex corners of the region cannot be done properly, but have to be left for finer tools that...
Preprint
Sketching is an important tool for dealing with high-dimensional vectors that are sparse (or well-approximated by a sparse vector), especially useful in distributed, parallel, and streaming settings. It is known that sketches can be made differentially private by adding noise according to the sensitivity of the sketch, and this has been used in pri...
Preprint
Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as $c$ characters from some alphabet $\Sigma$, we have $c$ fully random hash functions $h_0, \ldots, h_{c - 1} \colon \Sigma \to \{0, \ldots, 2^l - 1\}$, and a key $x = (x_0, \ldots, x_{c - 1})$ is hashed to $h(x) = h_0(x_0) \oplus \ldots \oplus h_...
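A minimal Python sketch of simple tabulation as defined above, assuming 32-bit integer keys split into $c = 4$ eight-bit characters (so $\Sigma = \{0,\dots,255\}$) and 32-bit hash values ($l = 32$). The tables are filled from Python's `random` module as a stand-in for the fully random functions $h_i$; the class name `SimpleTabulation` and its parameters are illustrative, not taken from the paper.

```python
import random


class SimpleTabulation:
    """Simple tabulation: view a key as c characters and XOR one random
    table entry per character position."""

    def __init__(self, c=4, char_bits=8, out_bits=32, seed=None):
        rng = random.Random(seed)
        self.c = c
        self.char_bits = char_bits
        self.char_mask = (1 << char_bits) - 1
        # Tables h_0, ..., h_{c-1}: each maps a character to a random out_bits-bit value.
        self.tables = [
            [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
            for _ in range(c)
        ]

    def __call__(self, x):
        h = 0
        for i in range(self.c):
            # x_i: the i-th character of the key, least significant first.
            # Bits above c * char_bits are ignored, so keys must fit in that width.
            x_i = (x >> (i * self.char_bits)) & self.char_mask
            h ^= self.tables[i][x_i]
        return h


h = SimpleTabulation(seed=1)
print(hex(h(0xDEADBEEF)))
```

Each evaluation costs $c$ table lookups and $c$ XORs, which is why the scheme is fast when the tables fit in cache.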
Preprint
Full-text available
We present a dynamic algorithm for maintaining the connected and 2-edge-connected components in an undirected graph subject to edge deletions. The algorithm is Monte-Carlo randomized and processes any sequence of edge deletions in $O(m + n \operatorname{polylog} n)$ total time. Interspersed with the deletions, it can answer queries to whether any t...
Preprint
We consider the numerical taxonomy problem of fitting a positive distance function ${D:{S\choose 2}\rightarrow \mathbb R_{>0}}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match those in $D$. A nice application is in evolutionary biology where the tree $T$ aims...
Preprint
Full-text available
In dynamic load balancing, we wish to distribute balls into bins in an environment where both balls and bins can be added and removed. We want to minimize the maximum load of any bin, but we also want to minimize the number of balls and bins affected when adding or removing a ball or a bin. We want a hashing-style solution where we, given the ID of a...
Preprint
Full-text available
We say that a random integer variable $X$ is \emph{monotone} if the modulus of the characteristic function of $X$ is decreasing on $[0,\pi]$. This is the case for many commonly encountered variables, e.g., Bernoulli, Poisson and geometric random variables. In this note, we provide estimates for the probability that the sum of independent monotone i...
Chapter
Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the ex...
Preprint
The classic way of computing a $k$-universal hash function is to use a random degree-$(k-1)$ polynomial over a prime field $\mathbb Z_p$. For a fast computation of the polynomial, the prime $p$ is often chosen as a Mersenne prime $p=2^b-1$. In this paper, we show that there are other nice advantages to using Mersenne primes. Our view is that the ou...
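The Mersenne-prime trick can be sketched in Python as follows, assuming $b = 61$ (so $p = 2^{61}-1$) and integer keys in $[0, p)$. Since $2^b \equiv 1 \pmod p$, a product can be reduced by adding its low $b$ bits to its high bits instead of dividing. The names `mod_mersenne` and `PolyHash` are hypothetical; in C the same folding would be applied to a 128-bit product, whereas here Python's big integers make the example illustrative of the arithmetic only.

```python
import random

B = 61
P = (1 << B) - 1  # Mersenne prime p = 2^61 - 1


def mod_mersenne(y):
    """Return y mod P for y >= 0 without division, using 2^B = 1 (mod P)."""
    while y >> B:                     # y >= 2^B: fold high bits onto low bits
        y = (y & P) + (y >> B)
    return 0 if y == P else y         # now y <= P, and P itself is 0 mod P


class PolyHash:
    """k-independent hashing via a random degree-(k-1) polynomial over Z_P."""

    def __init__(self, k, seed=None):
        rng = random.Random(seed)
        # Coefficients a_{k-1}, ..., a_0, stored in Horner order.
        self.coeffs = [rng.randrange(P) for _ in range(k)]

    def __call__(self, x):
        if not 0 <= x < P:
            raise ValueError("key must lie in [0, P)")
        h = 0
        for a in self.coeffs:
            # Horner step: h * x + a < P^2, reduced with the Mersenne trick.
            h = mod_mersenne(h * x + a)
        return h
```

With $k = 5$ this is the degree-4 polynomial used for 5-independent hashing. The output lies in $[0, p)$; mapping it to a smaller range is a separate step not shown here.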
Article
We say that a simple, closed curve γ in the plane has bounded convex curvature if for every point x on γ, there is an open unit disk Ux and εx>0 such that x∈∂Ux and Bεx(x)∩Ux⊂Int γ. We prove that the interior of every curve of bounded convex curvature contains an open unit disk.
Preprint
To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of this approach are small space algorithms for estimating the number of distinct elements in a stream, or estimatin...
Article
Recently, Kawarabayashi and Thorup presented the first deterministic edge-connectivity recognition algorithm in near-linear time. A crucial step in their algorithm uses the existence of vertex subsets of a simple graph G on n vertices whose contractions leave a multigraph with Õ(n∕δ) vertices and Õ(n) edges that preserves all non-trivial min-cuts o...
Preprint
Each vertex of an arbitrary simple graph on $n$ vertices chooses $k$ random incident edges. What is the expected number of edges in the original graph that connect different connected components of the sampled subgraph? We prove that the answer is $O(n/k)$, when $k\ge c\log n$, for some large enough $c$. We conjecture that the same holds for smalle...
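The quantity asked about can be estimated empirically with a short simulation. This Python sketch is not the paper's proof technique. It assumes each vertex samples $k$ distinct incident edges uniformly without replacement (taking all of them when its degree is at most $k$), merges the sampled edges with union-find, and counts original edges whose endpoints end up in different components. The function name is hypothetical.

```python
import random


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def crossing_edges_after_k_out(n, edges, k, seed=None):
    """Count edges of the simple graph (n, edges) that connect different
    components of the subgraph formed by each vertex's k sampled edges."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = list(range(n))
    for u in range(n):
        # Each vertex samples k incident edges (all of them if deg(u) <= k).
        chosen = adj[u] if len(adj[u]) <= k else rng.sample(adj[u], k)
        for v in chosen:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    # Original edges whose endpoints lie in different sampled components.
    return sum(1 for u, v in edges if find(parent, u) != find(parent, v))
```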
Preprint
We say that a simple, closed curve $\gamma$ in the plane has bounded convex curvature if for every point $x$ on $\gamma$, there is an open unit disk $U_x$ and $\varepsilon_x>0$ such that $x\in\partial U_x$ and $B_{\varepsilon_x}(x)\cap U_x\subset\text{Int}\;\gamma$. We prove that the interior of every curve of bounded convex curvature contains an o...
Preprint
We provide a simple new randomized contraction approach to the global minimum cut problem for simple undirected graphs. The contractions exploit 2-out edge sampling from each vertex rather than the standard uniform edge sampling. We demonstrate the power of our new approach by obtaining better algorithms for sequential, distributed, and parallel mo...
Article
We develop a new algorithm for the turnstile heavy hitters problem in general turnstile streams, the EXPANDERSKETCH, which finds the approximate top-k items in a universe of size n using the same asymptotic O(k log n) words of memory and O(log n) update time as the COUNTMIN and COUNTSKETCH, but requiring only O(k poly(log n)) time to answer queries...
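For comparison, here is a minimal Python sketch of COUNTMIN, one of the baselines the abstract names; it is not EXPANDERSKETCH. It assumes integer keys below $2^{61}-1$, uses a 2-universal hash $((a x + b) \bmod p) \bmod \text{width}$ per row, and answers a point query by taking the minimum over rows, so it never underestimates while all counts stay nonnegative. It only supports point queries: recovering the top-k items from it alone means querying candidates one at a time, whereas the abstract's EXPANDERSKETCH answers queries in O(k poly(log n)) time. Class and method names are hypothetical.

```python
import random


class CountMin:
    """COUNTMIN sketch with depth rows of width counters each."""

    P = (1 << 61) - 1  # prime modulus for the per-row 2-universal hashes

    def __init__(self, width, depth, seed=None):
        rng = random.Random(seed)
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row: h(x) = ((a*x + b) mod P) mod width.
        self.ab = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(depth)]

    def _bucket(self, row, x):
        a, b = self.ab[row]
        return ((a * x + b) % self.P) % self.width

    def update(self, x, delta=1):
        # Turnstile update x_i <- x_i + delta, applied to one counter per row.
        for r in range(len(self.rows)):
            self.rows[r][self._bucket(r, x)] += delta

    def estimate(self, x):
        # Minimum over rows; an overestimate when all counts are nonnegative.
        return min(self.rows[r][self._bucket(r, x)] for r in range(len(self.rows)))
```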
Preprint
Full-text available
Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $\mathcal{A}\times \mathcal{B}$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|\textbf{a}\cap \textbf{b}|/...
Preprint
Previous work on tabulation hashing of P\v{a}tra\c{s}cu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, but under some quite severe restrictions on the expected values of these sums. More precisely, the basic idea in tabulation hashing is to view a k...
Article
We describe a way of assigning labels to the vertices of any undirected graph on up to n vertices, each composed of n/2 + O(1) bits, such that given the labels of two vertices, and no other information regarding the graph, it is possible to decide whether or not the vertices are adjacent in the graph. This is optimal, up to an additive constant, an...
Article
We present a deterministic algorithm that computes the edge-connectivity of a graph in near-linear time. This is for a simple undirected unweighted graph G with n vertices and m edges. This is the first o(mn) time deterministic algorithm for the problem. Our algorithm is easily extended to find a concrete minimum edge-cut. In fact, we can construct...
Preprint
Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC '98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problems, in practice LSH data structures with suitably chosen parameters are used to solve the e...
Preprint
We consider the hashing of a set $X\subseteq U$ with $|X|=m$ using a simple tabulation hash function $h:U\to [n]=\{0,\dots,n-1\}$ and analyse the number of non-empty bins, that is, the size of $h(X)$. We show that the expected size of $h(X)$ matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The...
Preprint
Recently, Kawarabayashi and Thorup presented the first deterministic edge-connectivity recognition algorithm in near-linear time. A crucial step in their algorithm uses the existence of vertex subsets of a simple graph $G$ on $n$ vertices whose contractions leave a multigraph with $\tilde{O}(n/\delta)$ vertices and $\tilde{O}(n)$ edges that preserv...
Conference Paper
When deciding where to place access points in a wireless network, it is useful to model the signal propagation loss between a proposed antenna location and the areas it may cover. The indoor dominant path (IDP) model, introduced by Wölfle et al., is shown in the literature to have good validation and generalization error, is faster to compute than...
Conference Paper
We consider very natural "fence enclosure" problems studied by Capoyleas, Rote, and Woeginger and Arkin, Khuller, and Mitchell in the early 90s. Given a set S of n points in the plane, we aim at finding a set of closed curves such that (1) each point is enclosed by a curve and (2) the total length of the curves is minimized. We consider two main va...
Preprint
Full-text available
Suppose that we are to place $m$ balls into $n$ bins sequentially using the $d$-choice paradigm: For each ball we are given a choice of $d$ bins, according to $d$ hash functions $h_1,\dots,h_d$ and we place the ball in the least loaded of these bins breaking ties arbitrarily. Our interest is in the number of balls in the fullest bin after all $m$ b...
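A small Python simulation of the $d$-choice process described above. It assumes the $d$ choices per ball are drawn independently and uniformly with `random.Random`, i.e. fully random hashing rather than the concrete hash functions $h_1,\dots,h_d$ the paper analyzes, and it breaks ties by taking the first least-loaded candidate. The function name is hypothetical.

```python
import random


def d_choice_max_load(m, n, d, seed=None):
    """Place m balls into n bins, each into the least loaded of d random
    candidate bins, and return the load of the fullest bin."""
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(m):
        # d candidate bins; random.Random stands in for the hash functions.
        choices = [rng.randrange(n) for _ in range(d)]
        # min() returns the first least-loaded candidate, breaking ties.
        best = min(choices, key=lambda b: load[b])
        load[best] += 1
    return max(load)
```

For example, comparing `d_choice_max_load(10**5, 10**5, 1)` with `d_choice_max_load(10**5, 10**5, 2)` typically shows a noticeably smaller maximum load for $d = 2$.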
Article
We consider very natural "fence enclosure" problems studied by Capoyleas, Rote, and Woeginger and Arkin, Khuller, and Mitchell in the early 90s. Given a set $S$ of $n$ points in the plane, we aim at finding a set of closed curves such that (1) each point is enclosed by a curve and (2) the total length of the curves is minimized. We consider two mai...
Article
We present a deterministic incremental algorithm for exactly maintaining the size of a minimum cut with O(log³n log log²n) amortized time per edge insertion and O(1) query time. This result partially answers an open question posed by Thorup (2007). It also stays in sharp contrast to a polynomial conditional lower bound for the fully dynamic weighte...
Article
Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the performance analysis is often carried out under the abstract assumption that a truly random unit-cost hash function is used, without concern for which concrete hash function is employed. The concrete hash function may work fine on suffi...
Article
Full-text available
We present a deterministic fully-dynamic data structure for maintaining information about the bridges in a graph. We support updates in $\tilde{O}((\log n)^2)$ amortized time, and can find a bridge in the component of any given vertex, or a bridge separating any two given vertices, in $O(\log n / \log \log n)$ worst case time. Our bounds match the...
Article
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobris...
Article
We consider the Similarity Sketching problem: Given a universe $[u]= \{0,\ldots,u-1\}$ we want a random function $S$ mapping subsets $A\subseteq [u]$ into vectors $S(A)$ of size $t$, such that similarity is preserved. More precisely: Given sets $A,B\subseteq [u]$, define $X_i=[S(A)[i]= S(B)[i]]$ and $X=\sum_{i\in [t]}X_i$. We want to have $E[X]=t\c...
Article
Backwards analysis, first popularized by Seidel, is often the simplest, most elegant way of analyzing a randomized algorithm. It applies to incremental algorithms where elements are added incrementally, following some random permutation, e.g., incremental Delaunay triangulation of a point set, where points are added one by one, and where we always ma...
Article
Full-text available
We present a deterministic incremental algorithm for \textit{exactly} maintaining the size of a minimum cut with $\widetilde{O}(1)$ amortized time per edge insertion and $O(1)$ query time. This result partially answers an open question posed by Thorup [Combinatorica 2007]. It also stays in sharp contrast to a polynomial conditional lower-bound for...
Article
In turnstile $\ell_p$ $\varepsilon$-heavy hitters, one maintains a high-dimensional $x\in\mathbb{R}^n$ subject to $\texttt{update}(i,\Delta)$ causing $x_i\leftarrow x_i + \Delta$, where $i\in[n]$, $\Delta\in\mathbb{R}$. Upon receiving a query, the goal is to report a small list $L\subset[n]$, $|L| = O(1/\varepsilon^p)$, containing every "heavy hitt...
Article
We describe an algorithm for solving an important geometric problem arising in computer-aided manufacturing. When machining a pocket in a solid piece of material such as steel using a rough tool in a milling machine, sharp convex corners of the pocket cannot be done properly, but have to be left for finer tools that are more expensive to use. We wa...
Article
These lecture notes show that linear probing takes expected constant time if the hash function is 5-independent. This result was first proved by Pagh et al. [STOC'07,SICOMP'09]. The simple proof here is essentially taken from [Patrascu and Thorup ICALP'10]. The lecture is a nice illustration of the use of higher moments in data structures, and coul...
Article
Full-text available
We present a deterministic dynamic connectivity data structure for undirected graphs with worst-case update time $O(\sqrt{n}/w^{1/4})$ and constant query time, where $w = \Omega(\log n)$ is the word size. This bound improves on the previous best deterministic worst-case algorithm of Frederickson (STOC, 1983) and Eppstein, Galil, Italiano, and Nissen...
Article
We consider the following fundamental problems: (1) Constructing $k$-independent hash functions with a space-time tradeoff close to Siegel's lower bound. (2) Constructing representations of unbalanced expander graphs having small size and allowing fast computation of the neighbor function. It is not hard to show that these problems are intimately c...
Article
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. {\em Simple tabulation hashing\/} dates back to...
Article
These notes describe the most efficient hash functions currently known for hashing integers and strings. These modern hash functions are often an order of magnitude faster than those presented in standard textbooks. They are also simpler to implement, and hence a clear win in practice, but their analysis is harder. Some of the most practical hash...
Article
In the CONGEST model, a communications network is an undirected graph whose n nodes are processors and whose m edges are the communications links between processors. At any given time step, a message of size O(log n) may be sent by each node to each of its neighbours. We show for the synchronous model: If all nodes start in the same round, and each...
Article
In this paper we propose a hash function for $k$-partitioning a set into bins so that we get good concentration bounds when combining statistics from different bins. To understand this point, suppose we have a fully random hash function applied to a set $X$ of red and blue balls. We want to estimate the fraction $f$ of red balls. The idea of MinHas...
Article
Full-text available
We show how to represent a planar digraph in linear space so that distance queries can be answered in constant time. The data structure can be constructed in linear time. This representation of reachability is thus optimal in both time and space, and has optimal construction time. The previous best solution used $O(n\log n)$ space for constant quer...
Article
Full-text available
We present a deterministic near-linear time algorithm that computes the edge-connectivity and finds a minimum cut for a simple undirected unweighted graph G with n vertices and m edges. This is the first o(mn) time deterministic algorithm for the problem. In near-linear time we can also construct the classic cactus representation of all minimum cut...
Article
A random sampling function Sample:U->{0,1} for a key universe U is a distinguisher with probability p if for any given assignment of values v(x) to the keys x in U, including at least one non-zero v(x)!=0, the sampled sum sum{ v(x) | x in U and Sample(x) } is non-zero with probability at least p. Here the key values may come from any commutative mo...
Article
We present a data structure representing a dynamic set S of w-bit integers on a w-bit word RAM. With |S|=n and w > log n and space O(n), we support the following standard operations in O(log n / log w) time: - insert(x) sets S = S + {x}. - delete(x) sets S = S - {x}. - predecessor(x) returns max{y in S | y< x}. - successor(x) returns min{y in S | y...
Article
The power of two choices is a classic paradigm used for assigning $m$ balls to $n$ bins. When placing a ball we pick two bins according to some hash functions $h_0$ and $h_1$, and place the ball in the least full bin. It was shown by Azar et al.~[STOC'94] that for $m = O(n)$ with perfectly random hash functions this scheme yields a maximum load of...
Article
A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of P\v{a}tra\c{s}cu and Thorup [...
Article
Full-text available
We describe a way of assigning labels to the vertices of any undirected graph on up to $n$ vertices, each composed of $n/2+O(1)$ bits, such that given the labels of two vertices, and no other information regarding the graph, it is possible to decide whether or not the vertices are adjacent in the graph. This is optimal, up to an additive constant,...
Conference Paper
A random hash function h is ε-minwise if for any set S, |S| = n, and element x ∈ S, \(\Pr[h(x)=\min h(S)]=(1\pm\varepsilon )/n\). Minwise hash functions with low bias ε have widespread applications within similarity estimation. Hashing from a universe [u], the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA’13] makes c = O(1) lookups in tab...
Article
Recognizing 3-colorable graphs is one of the most famous NP-complete problems [Garey, Johnson, and Stockmeyer STOC'74]. The problem of coloring 3-colorable graphs in polynomial time with as few colors as possible has been intensively studied: $O(n^{1/2})$ colors [Wigderson STOC'82], $\tilde O(n^{2/5})$ colors [Blum STOC'89], $\tilde O(n^{3/8})$ colors [Blum FOCS'90], $O(n^{1/4})$...
Article
Simple tabulation dates back to Zobrist in 1970. Keys are viewed as c characters from some alphabet A. We initialize c tables h_0, ..., h_{c-1} mapping characters to random hash values. A key x=(x_0, ..., x_{c-1}) is hashed to h_0[x_0] xor...xor h_{c-1}[x_{c-1}]. The scheme is extremely fast when the character hash tables h_i are in cache. Simple t...
Conference Paper
Bottom-k sketches are an alternative to k×minwise sketches when using hashing to estimate the similarity of documents represented by shingles (or set similarity in general) in large-scale machine learning. They are faster to compute and have nicer theoretical properties. In the case of k×minwise hashing, the bias introduced by not truly random hash...
Article
Throughout the last decade, extensive deployment of popular intra-domain routing protocols such as Open Shortest Path First and Intermediate System–Intermediate System has drawn ever-increasing attention to Internet traffic engineering. This paper reviews optimization techniques that have been deployed for managing intra-domain routing in netwo...
Article
We consider bottom-k sampling for a set X, picking a sample Sk(X) consisting of the k elements that are smallest according to a given hash function h. With this sample we can estimate the relative size f=|Y|/|X| of any subset Y as |Sk(X) intersect Y|/k. A standard application is the estimation of the Jaccard similarity f=|A intersect B|/|A union B|...
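A Python sketch of bottom-k sampling and the estimator |S_k(X) ∩ Y|/k, assuming a 64-bit hash taken from the first 8 bytes of SHA-256 as a stand-in for h and |X| ≥ k. For Jaccard similarity it relies on two standard facts: the bottom-k sample of A ∪ B can be recovered from the bottom-k samples of A and B, and an element of that sample lies in A exactly when it lies in S_k(A). So each set only needs to be summarized by its sample. Function names are hypothetical.

```python
import hashlib


def h(x):
    """Stand-in 64-bit hash: first 8 bytes of SHA-256 of the key's string form."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:8], "big")


def bottom_k(X, k):
    """S_k(X): the k elements of X with the smallest hash values."""
    return set(sorted(X, key=h)[:k])


def estimate_fraction(X, Y, k):
    """Estimate f = |Y| / |X| for Y a subset of X as |S_k(X) ∩ Y| / k (assumes |X| >= k)."""
    return len(bottom_k(X, k) & Y) / k


def jaccard_from_sketches(SA, SB, k):
    """Estimate |A ∩ B| / |A ∪ B| from SA = S_k(A) and SB = S_k(B) alone."""
    # The k smallest hashes of A ∪ B are all among SA ∪ SB.
    SU = set(sorted(SA | SB, key=h)[:k])
    # x in SU lies in A iff x in SA (likewise for B), so SU ∩ A ∩ B = SU ∩ SA ∩ SB.
    # len(SU) equals k whenever |A ∪ B| >= k.
    return len(SU & SA & SB) / len(SU)
```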
Article
We survey recent results on parallel repetition theorems for computationally-sound interactive proofs (a.k.a. interactive arguments).
Article
Experts suggest that some pure result-based funding needs to be initiated to fund successful research projects. An x-year grant can be based on results from the last x years. This eliminates the issue of unpredictable research from a research foundation perspective. The researcher can at his own risk follow the craziest inspiration, but he or she ha...
Article
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC'07]. More precisely, we construct a 4-independent hash function yielding expected logarithmic search time. For $(1+\varepsilon)$-approximate minwise independence, we show that $\Omega(\log(1/\varepsilon))$-...
Article
We introduce a new tabulation-based hashing scheme called "twisted tabulation". It is essentially as simple and fast as simple tabulation, but has some powerful distributional properties illustrating its promise: (1) If we sample keys with arbitrary probabilities, then with high probability, the number of samples inside any subset is concentrated e...
Article
Distance oracles are data structures that provide fast (possibly approximate) answers to shortest-path and distance queries in graphs. The tradeoff between the space requirements and the query time of distance oracles is of particular interest and the main focus of this paper. Unless stated otherwise, we assume all graphs to be planar and undirecte...
Conference Paper
Given a weighted undirected graph, our basic goal is to represent all pairwise distances using much less than quadratic space, such that we can estimate the distance between query vertices in constant time. We will study the inherent trade-off between space of the representation and the stretch (multiplicative approximation disallowing underestimat...
Article
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Zobrist in 1970 who used it for game playing...
Article
We consider the problem of coloring a 3-colorable graph in polynomial time using as few colors as possible. We present a combinatorial algorithm getting down to $\tilde O(n^{4/11})$ colors. This is the first combinatorial improvement of Blum's $\tilde O(n^{3/8})$ bound from FOCS'90. Like Blum's algorithm, our new algorithm composes nicely w...
Article
In the framework of Wegman and Carter, a $k$-independent hash function maps any $k$ keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic $5$-independent hash function evaluates a degree 4 polynomial over a prime...
Conference Paper
We consider portable software implementations of hash tables with timeouts. The context is a high-volume stream of keyed items. When a new item arrives, we want to know if it has been seen recently in terms of a fixed lifespan. This problem has numerous applications as a front-end for Internet traffic processing where the key could be a selection of f...
Article
We present a new threshold phenomenon in data structure lower bounds where slightly reduced update times lead to exploding query times. Consider incremental connectivity, letting t_u be the time to insert an edge and t_q be the query time. For t_u = Omega(t_q), the problem is equivalent to the well-understood union-find problem: InsertEdge(s,t) can...
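The union-find structure referred to above, sketched in Python with union by size and path halving (a standard choice giving near-constant amortized time per operation). Here `insert_edge` plays the role of InsertEdge(s,t) and `connected` answers connectivity queries; the method names are illustrative, and this is the textbook structure, not the lower-bound construction of the paper.

```python
class UnionFind:
    """Incremental connectivity: edges can only be inserted."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def insert_edge(self, s, t):
        rs, rt = self.find(s), self.find(t)
        if rs == rt:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[rs] < self.size[rt]:
            rs, rt = rt, rs
        self.parent[rt] = rs
        self.size[rs] += self.size[rt]

    def connected(self, s, t):
        return self.find(s) == self.find(t)
```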
Article
Full-text available
We consider the minimum k-way cut problem for unweighted graphs with a size bound s on the number of cut edges allowed. Thus we seek to remove as few edges as possible so as to split a graph into k components, or report that this requires cutting more than s edges. We show that this problem is fixed-parameter tractable (FPT) in s. More precisely,...
Article
Full-text available
From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinking of the generic sample as a reservoir. We present an efficient reservoir sampling scheme, $\textno...
Article
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Zobrist in 1970 who used it for game playing...
Conference Paper
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [A. Pagh et al., SIAM J. Comput. 39, No. 3, 1107–1120 (2009; Zbl 1192.68204)]. For (1+ε)-approximate minwise independence, we show that Ω(lg(1/ε))-independent hash functions are required, matching an upper bound of [P....
Conference Paper
Full-text available
We describe a simple, but powerful local encoding technique, implying two surprising results: 1. We show how to represent a vector of n values from some alphabet S using ceiling(n * log2 |S|) bits, such that reading or writing any entry takes O(1) time. This demonstrates, for instance, an "equivalence" between decimal and binary computers, and has...
Conference Paper
Regular expression matching is a key task (and often the computational bottleneck) in a variety of software tools and applications. Examples include the standard grep and sed utilities, scripting languages such as perl, internet traffic analysis, XML querying, and protein searching. The basic definition of a regular expression is that we combine characters...
Conference Paper
Full-text available
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5...
Article
Full-text available
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n² log n) time, where n is the number of vertices (or states) and m is th...
Article
Full-text available
Many data sets occur as unaggregated data sets, where multiple data points are associated with each key. In the aggregate view of the data, the weight of a key is the sum of the weights of data points associated with the key. Examples are measurements of IP packet header streams, distributed data streams produced by events registered by sensor ne...
Conference Paper
Full-text available
Regular expression matching is a key task (and often the computational bottleneck) in a variety of widely used software tools and applications, for instance, the unix grep and sed commands, scripting languages such as awk and perl, programs for analyzing massive data streams, etc. We show how to solve this ubiquitous task in linear space and O(nm(l...
Conference Paper
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC'07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing...
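The probing procedure just described, as a minimal Python sketch. The hash function is passed in as a parameter because the papers listed here concern which hash function families (for example, 5-independent ones) guarantee expected constant time; this sketch does not pick one. It supports insertion and lookup only (deletion in linear probing needs extra care) and refuses to fill the last empty slot so that unsuccessful searches terminate. Class and method names are hypothetical.

```python
class LinearProbingTable:
    """Hash table storing all keys in a single array, resolved by linear probing."""

    def __init__(self, capacity, hash_fn):
        self.slots = [None] * capacity
        self.hash_fn = hash_fn
        self.size = 0

    def _probe(self, key):
        # Start at the hashed location, then scan consecutive slots
        # (wrapping around) until the key or an empty slot is found.
        i = self.hash_fn(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        i = self._probe(key)
        if self.slots[i] is None:
            # Always keep at least one empty slot so probing terminates.
            if self.size + 1 >= len(self.slots):
                raise RuntimeError("table full")
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return default if slot is None else slot[1]
```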
Conference Paper
From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinking of the generic sample as a reservoir. We present an efficient reservoir sampling scheme, $\varopt...
Conference Paper
Full-text available
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n² log n) time, where n is the number of vertices (or states) and m is th...
Conference Paper
Full-text available
Measurement, collection, and interpretation of network usage data commonly involves multiple stages of sampling and aggregation. Examples include sampling packets, aggregating them into flow statistics at a router, sampling and aggregation of usage records in a network data repository for reporting, query and archiving. Although unbiased estimates o...
Conference Paper
We present a simple and fast deterministic algorithm for the minimum k-way cut problem in a capacitated graph, that is, finding a set of edges with minimum total capacity whose removal splits the graph into at least k components. The algorithm packs O(mk³ log n) trees. Each new tree is a minimal spanning tree with respect to the edge utilizations,...
Article
Full-text available
Dynamic shortest path algorithms update the shortest paths to take into account a change in an edge weight. This paper describes a new technique that allows the reduction of heap sizes used by several dynamic shortest path algorithms. For unit weight change, the updates can be done without heaps. These reductions almost always reduce the computati...
Article
Full-text available
From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinking of the generic sample as a reservoir. We present a reservoir sampling scheme providing variance o...
Article
Full-text available
We consider the problem of preprocessing an edge-weighted directed graph G to answer queries that ask for the shortest distance from any given node x to any other node y avoiding an arbitrary failed node or link. We describe an oracle (i.e., a simple data structure) for such queries that can be stored in O(n² log n) space, and which allows queries t...

Projects

Project (1)
Archived project
Two streams of research were developed in parallel. (1) A theory of denotational models for Pascal-like programming languages based on set-theory, many-sorted algebras and a three-valued predicate calculus. That approach was an alternative to a model based on reflexive domains (by Dana Scott) and continuations. As a tool for defining denotations, syntax and semantics of concrete programming-languages, a metalanguage MetaSoft was proposed. (2) Given a denotational model (in our sense) of a programming language, one can define sound program-constructors, i.e. constructors which given correct components build correct resulting programs. That approach was based on a Hoare-like logic of total correctness with clean termination. A follow-up of that project started in 2018 under the name of Denotational Engineering.