Neal E. Young

University of California, Riverside | UCR · Department of Computer Science and Engineering

PhD

About

135
Publications
12,629
Reads
5,150
Citations
Citations since 2017
20 Research Items
1342 Citations
Additional affiliations
January 2004 - January 2020
University of California, Riverside
Position
  • Professor
September 1995 - September 1999
Dartmouth College
Position
  • Professor (Assistant)
September 1993 - February 1994
Princeton University
Position
  • Instructor


Publications (135)
Preprint
It is natural to generalize the $k$-Server problem by allowing each request to specify not only a point $p$, but also a subset $S$ of servers that may serve it. To attack this generalization, we focus on uniform and star metrics. For uniform metrics, the problem is equivalent to a generalization of Paging in which each request specifies not only a...
Article
Full-text available
Huang and Wong (Acta Inform 21(1):113–123, 1984) proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler (Optimal search trees using two-way key comparisons, PhD thesis...
Article
We present a simple O(n⁴)-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve t...
Article
Full-text available
Modern NoSQL database systems use log-structured merge (LSM) storage architectures to support high write throughput. LSM architectures aggregate writes in a mutable MemTable (stored in memory), which is regularly flushed to disk, creating a new immutable file called an SSTable. Some of the SSTables are chosen to be periodically merged—replaced with...
Preprint
Full-text available
Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location b...
Preprint
Full-text available
We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve...
Article
Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location b...
Preprint
Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google Bigtable and LevelDB (our focus). Previous theoretical work is based on worst-case analyses for uniform inputs --...
Preprint
We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a...
Preprint
This paper gives poly-logarithmic-round, distributed D-approximation algorithms for covering problems with submodular cost and monotone covering constraints (Submodular-cost Covering). The approximation ratio D is the maximum number of variables in any constraint. Special cases include Covering Mixed Integer Linear Programs (CMIP), and Weighted Ver...
Chapter
In this Web 2.0 era, there is an ever increasing number of customer reviews, which must be summarized to help consumers effortlessly make informed decisions. Previous work on reviews summarization has simplified the problem by assuming that aspects (e.g., “display”) are independent of each other and that the opinion for each aspect in a review is B...
Preprint
Huang and Wong [5] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [11, 12] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different...
Conference Paper
We consider the problem of political redistricting: given the locations of people in a geographical area (e.g. a US state), the goal is to decompose the area into subareas, called districts, so that the populations of the districts are as close as possible and the districts are "compact" and "contiguous," to use the terms referred to in most US sta...
Article
We propose a method for redistricting, decomposing a geographical area into subareas, called districts, so that the populations of the districts are as close as possible and the districts are compact and contiguous. Each district is the intersection of a polygon with the geographical area. The polygons are convex and the average number of sides per...
Conference Paper
In 1971, Knuth gave an \(O(n^2)\)-time algorithm for the classic problem of finding an optimal binary search tree. Knuth’s algorithm works only for search trees based on 3-way comparisons, but most modern computers support only 2-way comparisons (\(<\), \(\le \), \(=\), \(\ge \), and \(>\)). Until this paper, the problem of finding an optimal searc...
Article
Full-text available
The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods from a supplier to retailers. Over time, in response to demands at the retailers, the supplier ships orders, via a warehouse, to the retailers. The objective is to schedule these orders...
Article
Over the past decade, time series clustering has become an increasingly important research topic in data mining community. Most existing methods for time series clustering rely on distances calculated from the entire raw data using the Euclidean distance or Dynamic Time Warping distance as the distance measure. However, the presence of significant...
Article
Full-text available
We study the following problem: given a set of keys and access probabilities, find a minimum-cost binary search tree that uses only 2-way comparisons ($=, <, \le$) at each node. We give the first polynomial-time algorithm when both successful and unsuccessful queries are allowed, settling a long-standing open question. Our algorithm relies on a new...
Article
Full-text available
We describe nearly linear-time approximation algorithms for explicitly given mixed packing/covering and facility-location linear programs. The algorithms compute $(1+\epsilon)$-approximate solutions in time $O(N \log(N)/\epsilon^2)$, where $N$ is the number of non-zeros in the constraint matrix. We also describe parallel variants taking time $O(\te...
Article
Full-text available
We initiate the formal study of the online stack-compaction policies used by big-data NoSQL databases such as Google Bigtable, Hadoop HBase, and Apache Cassandra. We propose a deterministic policy, show that it is optimally competitive, benchmark it against Bigtable's default policy, and suggest five interesting open problems.
Article
Full-text available
Can one choose a good Huffman code on the fly, without knowing the underlying distribution? Online Slot Allocation (OSA) models this and similar problems: There are n slots, each with a known cost. There are n items. Requests for items are drawn i.i.d. from a fixed but hidden probability distribution p. After each request, if the item, i, was not p...
Article
Full-text available
We give a short proof that any comparison-based n^(1-epsilon)-approximation algorithm for the 1-dimensional Traveling Salesman Problem (TSP) requires Omega(n log n) comparisons.
Conference Paper
Full-text available
The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods over time from a supplier to retailers. Over time, in response to demands at the retailers, the supplier sends shipments, via a warehouse, to the retailers. The objective is to schedule shipments to min...
Article
Full-text available
The \emph{file caching} problem is defined as follows. Given a cache of size $k$ (a positive integer), the goal is to minimize the total retrieval cost for the given sequence of requests to files. A file $f$ has size $size(f)$ (a positive integer) and retrieval cost $cost(f)$ (a non-negative number) for bringing the file into the cache. A \emph{mis...
Article
Full-text available
Given a satisfiable 3-SAT formula, how hard is it to find an assignment to the variables that has Hamming distance at most n/2 to a satisfying assignment? More generally, consider any polynomial-time verifier for any NP-complete language. A d(n)-Hamming-approximation algorithm for the verifier is one that, given any member x of the language, output...
Article
Full-text available
Minimum-weight triangulation (MWT) is NP-hard. It has a polynomial-time constant-factor approximation algorithm, and a variety of effective polynomial-time heuristics that, for many instances, can find the exact MWT. Linear programs (LPs) for MWT are well-studied, but previously no connection was known between any LP and any approximation algorith...
Article
Full-text available
This paper gives poly-logarithmic-round, distributed δ-approximation algorithms for covering problems with submodular cost and monotone covering constraints (Submodular-cost Covering). The approximation ratio δ is the maximum number of variables in any constraint. Special cases include Covering Mixed Integer Linear Programs (CMIP), and Weighted Ver...
Conference Paper
Full-text available
Time series shapelets are small, local patterns in a time series that are highly predictive of a class and are thus very useful features for building classifiers and for certain visualization and summarization tasks. While shapelets were introduced only recently, they have already seen significant adoption and extension in the community. Despite th...
Article
Full-text available
We consider the problem of choosing Euclidean points to maximize the sum of their weighted pairwise distances, when each point is constrained to a ball centered at the origin. We derive a dual minimization problem and show strong duality holds (i.e., the resulting upper bound is tight) when some locally optimal configuration of points is affinely i...
Conference Paper
Full-text available
We present efficient distributed δ-approximation algorithms for fractional packing and maximum weighted b-matching in hypergraphs, where δ is the maximum number of packing constraints in which a variable appears (for maximum weighted b-matching, δ is the maximum edge degree; for graphs, δ = 2). (a) For δ = 2 the algorithm runs in O(log m) rounds in exp...
Chapter
Full-text available
This paper describes a greedy Δ-approximation algorithm for monotone covering, a generalization of many fundamental NP-hard covering problems. The approximation ratio Δ is the maximum number of variables on which any constraint depends. (For example, for vertex cover, Δ is 2.) The algor...
Article
Full-text available
With fully directional communications, nodes must track the positions of their neighbors so that communication with these neighbors is feasible when needed. Tracking process introduces an overhead, which increases with the number of discovered neighbors. The overhead can be reduced if nodes maintain only a subset of their neighbors; however, this m...
Conference Paper
Full-text available
The paper presents distributed and parallel δ-approximation algorithms for covering problems, where δ is the maximum number of variables on which any constraint depends (for example, δ = 2 for vertex cover). Specific results include the following. For weighted vertex cover, the first distributed 2-approximation algorithm taking O(log n) rounds and the first...
Conference Paper
Full-text available
This paper describes a simple greedy D-approximation algorithm for any covering problem whose objective function is submodular and non-decreasing, and whose feasible region can be expressed as the intersection of arbitrary (closed upwards) covering constraints, each of which constrains at most D variables of the problem. (A simple example is Vertex...
Article
Full-text available
In the k-median problem we are given sets of facilities and customers, and distances between them. For a given set F of facilities, the cost of serving a customer u is the minimum distance between u and a facility in F. The goal is to find a set F of k facilities that minimizes the sum, over all customers, of their service costs. Following the wor...
Article
Full-text available
We give an approximation algorithm for packing and covering linear programs (linear programs with non-negative coefficients). Given a constraint matrix with n non-zeros, r rows, and c columns, the algorithm computes feasible primal and dual solutions whose costs are within a factor of 1+eps of the optimal cost in time O((r+c)log(n)/eps^2 + n).
Conference Paper
Full-text available
We give an approximation algorithm for packing and covering linear programs (linear programs with non-negative coefficients). Given a constraint matrix with n non-zeros, r rows, and c columns, the algorithm (with high probability) computes feasible primal and dual solutions whose costs are within a factor of 1+ε of OPT (the opt...
Article
Full-text available
We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a...
Conference Paper
Full-text available
Dimension attributes in data warehouses are typically hierarchical, and a variety of OLAP applications (such as point-of-sales analysis and decision support) call for summarizing the measure attributes in fact tables along the hierarchies of these attributes. For example, the total sales at different stores can be summarized hierarchically by geogr...
Conference Paper
Full-text available
We start with definitions given by Plotkin, Shmoys, and Tardos [16]. Given A ∈ ℝ^{m×n}, b ∈ ℝ^m and a polytope P ⊆ ℝ^n, the fractional packing problem is to find an x ∈ P such that Ax ≤ b if such an x exists. An ε-approximate solution to this problem is an x ∈ P such that Ax ≤ (1+ε)b. An ε-relaxed decision procedure always finds an ε-approxima...
Conference Paper
Full-text available
Dimension attributes in data warehouses are typically hierarchical (e.g., geographic locations in sales data, URLs in Web traffic logs). OLAP tools are used to summarize the measure attributes (e.g., total sales) along a dimension hierarchy, and to characterize changes (e.g., trends and anomalies) in a hierarchical summary over time. When the number...
Conference Paper
Full-text available
With directional antennas, it is extremely important that a node maintains information with regards to the positions of its neighbors. This would allow the node to "track" the neighbors as they move; otherwise, a node will have to resort to either omnidirectional or circular directional transmissions (or receptions) fairly often. This can be overhe...
Conference Paper
We consider the following variant of Huffman coding in which the costs of the letters, rather than the probabilities of the words, are non-uniform: Given an alphabet of unequal-length letters, find a minimum-average-length prefix-free set of n codewords over the alphabet. We show new structural properties of such codes, leading to an O(n log² r) ti...
Conference Paper
Full-text available
Our objective in this paper is to design topology control algorithms such that (i) nodes have low degree and (ii) paths in the network have few hops. Low node degree is desirable in networks equipped with smart antennas and to reduce access contention. Short paths are desirable for minimizing communication delays and for better robustness to channe...
Conference Paper
Full-text available
The Reverse Greedy algorithm (RGreedy) for the k-median problem works as follows. It starts by placing facilities on all nodes. At each step, it removes a facility to minimize the resulting total distance from the customers to the remaining facilities. It stops when k facilities remain. We prove that, if the distance function is metric, then the ap...
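The steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes every node is both a customer and a potential facility, that `dist` is a nested mapping of pairwise distances, and that ties are broken by the order of `nodes`. The names `service_cost` and `reverse_greedy` are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def service_cost(dist: Dict[Hashable, Dict[Hashable, float]],
                 customers: List[Hashable],
                 facilities: Set[Hashable]) -> float:
    """Sum, over all customers, of the distance to the nearest open facility."""
    return sum(min(dist[u][f] for f in facilities) for u in customers)


def reverse_greedy(dist: Dict[Hashable, Dict[Hashable, float]],
                   nodes: List[Hashable],
                   k: int) -> Set[Hashable]:
    """Start with facilities on all nodes; repeatedly close the facility
    whose removal yields the smallest total service cost, until k remain."""
    if not 1 <= k <= len(nodes):
        raise ValueError("k must be between 1 and the number of nodes")
    facilities = set(nodes)
    while len(facilities) > k:
        # Candidates are scanned in the order of `nodes`, so ties go to
        # the earliest node (an assumption made for reproducibility).
        candidates = [f for f in nodes if f in facilities]
        to_close = min(
            candidates,
            key=lambda f: service_cost(dist, nodes, facilities - {f}),
        )
        facilities.remove(to_close)
    return facilities


# Four points on a line; distances are absolute differences.
points = [0, 1, 10, 11]
line_dist = {u: {v: abs(u - v) for v in points} for u in points}
chosen = reverse_greedy(line_dist, points, k=2)
```

Each step of this sketch re-evaluates the full service cost for every candidate, which is simple but far from the fastest way to run the procedure.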
Article
The Reverse Greedy algorithm (RGreedy) for the k-median problem works as follows. It starts by placing facilities on all nodes. At each step, it removes a facility to minimize the resulting total distance from the customers to the remaining facilities. It stops when k facilities remain. We prove that, if the distance function is metric, then the ap...
Conference Paper
Full-text available
Following Mettu and Plaxton [22, 21], we study oblivious algorithms for the k-medians problem. Such an algorithm produces an incremental sequence of facility sets. We give improved algorithms, including a (24+ε)-competitive deterministic polynomial algorithm and a 2e ≈ 5.44-competitive randomized non-polynomial algorithm. Our approach is similar to...
Article
Full-text available
The multiway-cut problem is, given a weighted graph and k >= 2 terminal nodes, to find a minimum-weight set of edges whose removal separates all the terminals. The problem is NP-hard, and even NP-hard to approximate within 1+delta for some small delta > 0. Calinescu, Karloff, and Rabani (1998) gave an algorithm with performance guarantee 3/2-1/k, b...
Article
Full-text available
Large surveys using multiobject spectrographs require automated methods for deciding how to efficiently point observations and how to assign targets to each pointing. The Sloan Digital Sky Survey (SDSS) will observe around 10^6 spectra from targets distributed over an area of about 10,000 deg^2, using a multiobject fiber spectrograph that can simulta...
Article
Full-text available
Consider the following file caching problem: in response to a sequence of requests for files, where each file has a specified size and retrieval cost, maintain a cache of files of total size at most some specified k so as to minimize the total retrieval cost. Specifically, when a requested file is not in the cache, bring it into the cache and pay...
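To make the cost model concrete, the following sketch simulates a cache of capacity k on a request sequence and totals the retrieval cost paid on misses. The eviction rule used here is plain least-recently-used (LRU), chosen only for illustration; it is an assumption and not the policy studied in the paper. The function name `lru_file_caching_cost` and the example files are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Iterable


def lru_file_caching_cost(requests: Iterable[Hashable],
                          size: Dict[Hashable, int],
                          cost: Dict[Hashable, float],
                          k: int) -> float:
    """Total retrieval cost paid on misses when evicting by LRU (assumed policy)."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()  # oldest entry first
    used = 0      # total size of files currently cached
    total = 0.0   # retrieval cost paid so far
    for f in requests:
        if f in cache:
            cache.move_to_end(f)  # hit: mark as most recently used
            continue
        if size[f] > k:
            raise ValueError(f"file {f!r} does not fit in a cache of size {k}")
        total += cost[f]          # miss: pay the retrieval cost
        while used + size[f] > k:
            victim, _ = cache.popitem(last=False)  # evict least recently used
            used -= size[victim]
        cache[f] = None
        used += size[f]
    return total


sizes = {"a": 2, "b": 1, "c": 2}
costs = {"a": 10, "b": 1, "c": 5}
paid = lru_file_caching_cost(["a", "b", "a", "c", "b", "a"], sizes, costs, k=3)
```

LRU ignores both size and retrieval cost when choosing what to evict, which is exactly the information the problem statement makes available to a better policy.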
Article
Full-text available
A generalization of the Seidel-Entringer-Arnold method for calculating the alternating permutation numbers (or secant-tangent numbers) leads to a new operation on integer sequences, the Boustrophedon transform.
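As a rough illustration, the transform can be computed with a triangle filled in alternating directions. The recurrence used below is the commonly cited one, T(k,0) = a_k and T(k,n) = T(k,n-1) + T(k-1,k-n), with output b_k = T(k,k); it is not spelled out in the abstract above, so treat it as an assumption. The function name `boustrophedon` is hypothetical.

```python
from typing import List, Sequence


def boustrophedon(a: Sequence[int]) -> List[int]:
    """Return b_0..b_{len(a)-1}, the Boustrophedon transform of a."""
    b: List[int] = []
    prev: List[int] = []  # row k-1 of the triangle
    for k, a_k in enumerate(a):
        row = [a_k]  # T(k, 0) = a_k
        for n in range(1, k + 1):
            # T(k, n) = T(k, n-1) + T(k-1, k-n): the previous row is read
            # in the opposite direction, which gives the zigzag fill order.
            row.append(row[n - 1] + prev[k - n])
        b.append(row[-1])  # b_k = T(k, k)
        prev = row
    return b


# Input 1, 0, 0, ... yields the alternating permutation numbers.
zigzag = boustrophedon([1, 0, 0, 0, 0, 0])
ones = boustrophedon([1, 1, 1, 1, 1])
```

Applied to the sequence 1, 0, 0, ..., the sketch produces 1, 1, 1, 2, 5, 16, the start of the secant-tangent numbers mentioned in the abstract.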
Article
The problem considered is the following. Given a graph with edge weights satisfying the triangle inequality, and a degree bound for each vertex, compute a low-weight spanning tree such that the degree of each vertex is at most its specified bound. The problem is NP-hard (it generalizes Traveling Salesman (TSP)). This paper describes a network-flow...
Article
Full-text available
This report presents notes from the first eight lectures of the class Many Models of Complexity taught by Laszlo Lovasz at Princeton University in the fall of 1990. The topic is evasiveness of graph properties: given a graph property, how many edges of the graph an algorithm must check in the worst case before it knows whether the property holds.
Article
Full-text available
Congestion control in the current Internet is accomplished mainly by TCP/IP. To understand the macroscopic network behavior that results from TCP/IP and similar end-to-end protocols, one main analytic technique is to show that the protocol maximizes some global objective function of the network traffic. Here we analyze a particular end-to-end,...
Article
Full-text available
The goal of the Sloan Digital Sky Survey is ``to map in detail one-quarter of the entire sky, determining the positions and absolute brightnesses of more than 100 million celestial objects''. The survey will be performed by taking ``snapshots'' through a large telescope. Each snapshot can capture up to 600 objects from a small circle of the sky. Th...
Article
Full-text available
Two common objectives for evaluating a schedule are the makespan, or schedule length, and the average completion time. This short note gives improved bounds on the existence of schedules that simultaneously optimize both criteria. In particular, for any rho > 0, there exists a schedule of makespan at most 1+rho times the minimum, with average comple...
Article
Full-text available
In this paper we introduce the notion of approximate data structures, in which a small amount of error is tolerated in the output. Approximate data structures trade error of approximation for faster operation, leading to theoretical and practical speedups for a wide variety of algorithms. We give approximate variants of the van Emde Boas data struc...
Article
Full-text available
Von Neumann's Min-Max Theorem guarantees that each player of a zero-sum matrix game has an optimal mixed strategy. This paper gives an elementary proof that each player has a near-optimal mixed strategy that chooses uniformly at random from a multiset of pure strategies of size logarithmic in the number of pure strategies available to the opponent....
Article
Full-text available
The parametric shortest path problem is to find the shortest paths in a graph where the edge costs are of the form w_ij+lambda, where each w_ij is constant and lambda is a parameter that varies. The problem is to find shortest path trees for every possible value of lambda. The minimum-balance problem is to find a ``weighting'' of the vertices so that...
Article
Full-text available
Pattern-matching-based document-compression systems (e.g. for faxing) rely on finding a small set of patterns that can be used to represent all of the ink in the document. Finding an optimal set of patterns is NP-hard; previous compression schemes have resorted to heuristics. This paper describes an extension of the cross-entropy approach, used pre...
Article
Given matrices A and B and vectors a, b, c and d, all with non-negative entries, we consider the problem of computing min{c^T x : x ∈ ℤ^n_+, Ax⩾a, Bx⩽b, x⩽d}. We give a bicriteria-approximation algorithm that, given ε∈(0,1], finds a solution of cost O(ln(m)/ε²) times optimal, meeting the covering constraints (Ax⩾a) and multiplicity constraints (x⩽d), and satisfying Bx⩽(1+ε)b+β, where β...
Article
Full-text available
We give a polynomial-time approximation scheme for the generalization of Huffman Coding in which codeword letters have non-uniform costs (as in Morse code, where the dash is twice as long as the dot). The algorithm computes a (1+epsilon)-approximate solution in time O(n + f(epsilon) log^3 n), where n is the input size.
Conference Paper
Full-text available
Congestion control in the current Internet is accomplished mainly by TCP/IP. To understand the macroscopic network behavior that results from TCP/IP and similar end-to-end protocols, one main analytic technique is to show that the protocol maximizes some global objective function of the network traffic. We analyze a particular end-to-end MIMD (...
Article
Full-text available
This paper gives a simple linear-time algorithm that, given a weighted digraph, finds a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous trade-off: given the two trees and epsilon > 0, the algorithm returns a spanning tree in which the distance between any vertex and...
Article
Full-text available
In this paper we give a natural probability distribution of fractional packing instances such that, for an instance chosen at random, with probability 1 − o(1) any Dantzig-Wolfe-type ε-relaxed procedure must make at
Article
Full-text available
In the standard Huffman coding problem, one is given a set of words and for each word a positive frequency. The goal is to encode each word w as a codeword c(w) over a given alphabet. The encoding must be prefix free (no codeword is a prefix of any other) and should minimize the weighted average codeword size Σ_w freq(w) |c(w)|. T...
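For the standard problem described above, the classic greedy construction can be sketched as follows. The sketch assumes a binary alphabet with equal letter costs, which is the textbook setting rather than the variant the paper goes on to study, and it reports the unnormalized sum of freq(w) times |c(w)|. The names `huffman_code`, `weighted_length`, and `is_prefix_free` are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code(freq: Dict[str, float]) -> Dict[str, str]:
    """Build a binary prefix-free code by repeatedly merging the two
    least frequent subtrees (the standard Huffman procedure)."""
    if not freq:
        return {}
    if len(freq) == 1:
        return {w: "0" for w in freq}
    tie = count()  # unique tiebreaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def weighted_length(freq: Dict[str, float], code: Dict[str, str]) -> float:
    """Sum over words w of freq(w) * |c(w)|."""
    return sum(freq[w] * len(code[w]) for w in freq)


def is_prefix_free(code: Dict[str, str]) -> bool:
    """In sorted order, any prefix is immediately followed by a word it prefixes."""
    words = sorted(code.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


freqs = {"a": 5, "b": 2, "c": 1, "d": 1}
code = huffman_code(freqs)
total = weighted_length(freqs, code)
```

Dividing `total` by the sum of the frequencies gives the weighted average codeword size in the sense of the abstract.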
Conference Paper
Full-text available
We describe sequential and parallel algorithms that approximately solve linear programs with no negative coefficients (aka mixed packing and covering problems). For explicitly given problems, our fastest sequential algorithm returns a solution satisfying all constraints within a 1±ε factor in O(md log(m)/ε²) time, where m is the...
Article
Full-text available
It is well known that every 2-edge-connected graph can be oriented so that the resulting
Article
Full-text available
We report on implementation and a modest experimental evaluation of a recently introduced
Article
Full-text available
We study the general (non-metric) facility-location and weighted k-medians problems, as well as the fractional facility-location and unweighted k-medians problems. We describe a natural randomized rounding scheme and use it to derive approximation algorithms for all of these problems. For facility location and weighted k-medians, the respective alg...
Article
Full-text available
Randomized rounding is a standard method, based on the probabilistic method, for designing combinatorial approximation algorithms. In Raghavan's seminal paper introducing the method (1988), he writes: "The time taken to solve the linear program relaxations of the integer programs dominates the net running time theoretically (and, most likely, in pr...
Article
Full-text available
In this problem, the input is a sequence of requests for files, given on-line (one at a time). Each file has a non-negative size and a non-negative retrieval cost. The problem is to decide which files to keep in a fixed-size cache so as to minimize the sum of the retrieval costs for files that are not in the cache when requested. The problem arises...
Article
Full-text available
Weighted caching is a generalization of paging in which the cost to
Article
Full-text available
This paper gives a simple linear-time algorithm that, given a weighted digraph, finds a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous trade-off: given the two trees and epsilon > 0, the algorithm returns a spanning tree in which the distance between any vertex and...
Article
Full-text available
We give an efficient deterministic parallel approximation algorithm for the minimum-weight
Article
Full-text available
The Sloan Digital Sky Survey (SDSS) will observe around 10^6 spectra from targets distributed over an area of about 10,000 square degrees, using a multi-object fiber spectrograph which can simultaneously observe 640 objects in a circular field-of-view (referred to as a ``tile'') 1.49 degrees in radius. No two fibers can be placed closer than 55'' d...
Article
Full-text available
The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives a proof that, for graphs where each directed cycle has at most three edges, the MEG problem is equivalent to maximum bipartite matching, and ther...