Josh Alman

Josh Alman
Harvard University | Harvard · Area of Computer Science

About

44
Publications
1,694
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
694
Citations

Publications

Publications (44)
Preprint
Full-text available
Generalizing work of K\"unnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of transition costs between nodes in the grid. This captures many classical problems which are solved...
Preprint
Full-text available
In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V \in [-B,B]^{n \times d}$, and the goal is to construct the matrix $\mathrm{Att}(Q,K,V) := \mathr...
Preprint
Full-text available
Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only c...
Preprint
We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size $N$. For the WHT, our new algorithm has an operation count of $\frac{23}{24}N \log N + O(N)$. To our knowledge, this gives the first improvement on the $N \log N$ operation count...
Preprint
We give new, smaller constructions of constant-depth linear circuits for computing any matrix which is the Kronecker power of a fixed matrix. A standard argument (e.g., the mixed product property of Kronecker products, or a generalization of the Fast Walsh-Hadamard transform) shows that any such $N \times N$ matrix has a depth-2 circuit of size $O(...
Preprint
We use lookup tables to design faster algorithms for important algebraic problems over finite fields. These faster algorithms, which only use arithmetic operations and lookup table operations, may help to explain the difficulty of determining the complexities of these important problems. Our results over a constant-sized finite field are as follows...
Preprint
For any real numbers $B \ge 1$ and $\delta \in (0, 1)$ and function $f: [0, B] \rightarrow \mathbb{R}$, let $d_{B; \delta} (f) \in \mathbb{Z}_{> 0}$ denote the minimum degree of a polynomial $p(x)$ satisfying $\sup_{x \in [0, B]} \big| p(x) - f(x) \big| < \delta$. In this paper, we provide precise asymptotics for $d_{B; \delta} (e^{-x})$ and $d_{B;...
Preprint
We design the first efficient sensitivity oracles and dynamic algorithms for a variety of parameterized problems. Our main approach is to modify the algebraic coding technique from static parameterized algorithm design, which had not previously been used in a dynamic context. We particularly build off of the `extensor coding' method of Brand, Dell...
Preprint
For a matrix $M$ and a positive integer $r$, the rank $r$ rigidity of $M$ is the smallest number of entries of $M$ which one must change to make its rank at most $r$. There are many known applications of rigidity lower bounds to a variety of areas in complexity theory, but fewer known applications of rigidity upper bounds. In this paper, we use rig...
Preprint
Full-text available
For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$ graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the stu...
Preprint
The complexity of matrix multiplication is measured in terms of $\omega$, the smallest real number such that two $n\times n$ matrices can be multiplied using $O(n^{\omega+\epsilon})$ field operations for all $\epsilon>0$; the best bound until now is $\omega<2.37287$ [Le Gall'14]. All bounds on $\omega$ since 1986 have been obtained using the so-cal...
Article
Fixed-parameter algorithms and kernelization are two powerful methods to solve NP-hard problems. Yet so far those algorithms have been largely restricted to static inputs. In this article, we provide fixed-parameter algorithms and kernelizations for fundamental NP-hard problems with dynamic inputs. We consider a variety of parameterized graph and h...
Article
Josh Alman and Virginia Vassilevska Williams. A graph G on n nodes is an Orthogonal Vectors (OV) graph of dimension d if there are vectors v1, . . ., vn ∈ {0, 1}d such that nodes i and j are adjacent in G if and only if hvi, vji = 0 over Z. In this paper, we study a number of basic graph algorithm problems, except where one is given as input the ve...
Chapter
In predicate encryption for a function f, an authority can create ciphertexts and secret keys which are associated with ‘attributes’. A user with decryption key \(K_y\) corresponding to attribute y can decrypt a ciphertext \(CT_x\) corresponding to a message m and attribute x if and only if \(f(x,y)=0\). Furthermore, the attribute x remains hidden...
Preprint
In this paper, we present a new algorithm for maintaining linear sketches in turnstile streams with faster update time. As an application, we show that $\log n$ \texttt{Count} sketches or \texttt{CountMin} sketches with a constant number of columns (i.e., buckets) can be implicitly maintained in \emph{worst-case} $O(\log^{0.582} n)$ update time usi...
Preprint
In a recent work, Alman and Vassilevska Williams [FOCS 2018, arXiv:1810.08671 [cs.CC]] proved limitations on designing matrix multiplication algorithms using the Galactic method applied to many tensors of interest, including the family of Coppersmith-Winograd tensors. In this note, we extend all their lower bounds to the more powerful Universal met...
Preprint
We study the known techniques for designing Matrix Multiplication algorithms. The two main approaches are the Laser method of Strassen, and the Group theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method based on mon...
Preprint
The Light Bulb Problem is one of the most basic problems in data analysis. One is given as input $n$ vectors in $\{-1,1\}^d$, which are all independently and uniformly random, except for a planted pair of vectors with inner product at least $\rho \cdot d$ for some constant $\rho > 0$. The task is to find the planted pair. The most straightforward a...
Article
2018 IEEE. We study the known techniques for designing Matrix Multiplication algorithms. The two main approaches are the Laser method of Strassen, and the Group theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method b...
Conference Paper
In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a clo...
Article
We consider the techniques behind the current best algorithms for matrix multiplication. Our results are threefold. (1) We provide a unifying framework, showing that all known matrix multiplication running times since 1986 can be achieved from a single very natural tensor - the structural tensor $T_q$ of addition modulo an integer $q$. (2) We show...
Conference Paper
Fixed-parameter algorithms and kernelization are two powerful methods to solve $\mathsf{NP}$-hard problems. Yet, so far those algorithms have been largely restricted to static inputs. In this paper we provide fixed-parameter algorithms and kernelizations for fundamental $\mathsf{NP}$-hard problems with dynamic inputs. We consider a variety of param...
Conference Paper
We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measure the extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrate several connections with matrix rigidity, communication complexity, and circuit lower bounds. The most interesting outcomes are: The Walsh-Hadamard T...
Article
In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players Bob his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result it presents Bob with the next piece. This model has a closer and more natu...
Article
We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measures the extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrate several connections with matrix rigidity, communication complexity, and circuit lower bounds, including: The Walsh-Hadamard Transform is Not Very Ri...
Article
We design new polynomials for representing threshold functions in three different regimes: probabilistic polynomials of low degree, which need far less randomness than previous constructions, polynomial threshold functions (PTFs) with "nice" threshold behavior and degree almost as low as the probabilistic polynomials, and a new notion of probabilis...
Article
Full-text available
In this paper, we undertake a systematic study of recurrences x_{m+n}x_{m} = P(x_{m+1}, ..., x_{m+n-1}) which exhibit the Laurent phenomenon. Some of the most famous among these sequences come from the Somos and the Gale-Robinson recurrences. Our approach is based on finding period 1 seeds of Laurent phenomenon algebras of Lam-Pylyavskyy. We comple...
Article
Full-text available
We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/\epsilon)})$ and error at most $\epsilon$. The degree dependence on $n$ and $\epsilon$ is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY...
Article
Following de Verdière–Gitler–Vertigan and Curtis–Ingerman–Morrow, we prove a host of new results on circular planar electrical networks. We first construct a poset of electrical networks with n boundary vertices, and prove that it is graded by number of edges of critical representatives. We then answer various enumerative questions related to , ada...
Article
Full-text available
Curtis-Ingerman-Morrow characterize response matrices for circular planar electrical networks as symmetric square matrices with row sums zero and non-negative circular minors. In this paper, we study this positivity phenomenon more closely, from both algebraic and combinatorial perspectives. Extending work of Postnikov, we introduce electrical posi...
Article
Full-text available
Following de Verdi\`{e}re-Gitler-Vertigan and Curtis-Ingerman-Morrow, we prove a host of new results on circular planar electrical networks. We introduce a poset EP_{n} of electrical networks with n boundary vertices, giving two equivalent characterizations, one combinatorial and the other topological. We then investigate various properties of the...

Network

Cited By