Daniel Mckenzie

University of California, Los Angeles | UCLA · Department of Mathematics

PhD

26
Publications
2,941
Reads
120
Citations
Education
August 2013 - May 2019
University of Georgia
Field of study
  • Mathematics

Publications (26)
Preprint
Full-text available
Recent work has suggested that certain neural network architectures, particularly recurrent neural networks (RNNs) and implicit neural networks (INNs), are capable of logical extrapolation. That is, one may train such a network on easy instances of a specific task and then apply it successfully to more difficult instances of the same task. In this pa...
Preprint
Full-text available
We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, in which they are induced by geodesi...
Article
The standard simplex in $\mathbb{R}^{n}$, also known as the probability simplex, is the set of nonnegative vectors whose entries sum up to 1. It frequently appears as a constraint in optimization problems that arise in machine learning, statistics, data science, operations research and beyond. We convert the standard simplex to the unit sphere and...
Preprint
Full-text available
Although deep neural networks have achieved super-human performance on many classification tasks, they often exhibit a worrying lack of robustness towards adversarially generated examples. Thus, considerable effort has been invested into reformulating Empirical Risk Minimization (ERM) into an adversarially robust framework. Recently, attention has...
Preprint
Full-text available
In many practical settings, a combinatorial problem must be repeatedly solved with similar, but distinct parameters w. Yet, w is not directly observed; only contextual data d that correlates with w is available. It is tempting to use a neural network to predict w given d, but training such a model requires reconciling the discrete nature of combina...
Preprint
Full-text available
Comparison-Based Optimization (CBO) is an optimization paradigm that assumes only very limited access to the objective function f(x). Despite the growing relevance of CBO to real-world applications, this field has received little attention as compared to the adjacent field of Zeroth-Order Optimization (ZOO). In this work we propose a relatively sim...
Article
We study zeroth-order optimization for convex functions where we further assume that function evaluations are unavailable. Instead, one only has access to a comparison oracle, which, given two points x and y, returns a single bit of information indicating which point has larger function value, f(x) or f(y). By treating the gradient as an unknown sign...
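The comparison oracle described in this abstract can be illustrated with a minimal sketch. The names `make_comparison_oracle` and `sphere`, the ±1 encoding of the single bit, and the tie-breaking rule are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable, Sequence


def make_comparison_oracle(
    f: Callable[[Sequence[float]], float],
) -> Callable[[Sequence[float], Sequence[float]], int]:
    """Wrap f so that callers learn only which of two points has the larger value."""

    def oracle(x: Sequence[float], y: Sequence[float]) -> int:
        # +1 if f(x) > f(y), otherwise -1 (ties map to -1 by assumption).
        # The function values themselves are never returned.
        return 1 if f(x) > f(y) else -1

    return oracle


def sphere(x: Sequence[float]) -> float:
    """Hypothetical convex test objective: the squared Euclidean norm."""
    return sum(v * v for v in x)


oracle = make_comparison_oracle(sphere)
bit_far_vs_near = oracle([2.0, 0.0], [1.0, 0.0])  # first point has the larger value
bit_near_vs_far = oracle([1.0, 0.0], [2.0, 0.0])  # second point has the larger value
```

An optimizer in this setting would only ever call `oracle`, never `sphere` directly.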
Article
A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed point equation to compute inferences. Solving for the fixed point varies in complexity, depending on provided data and an error tolerance. Importantly, implicit networks may be trained wit...
Preprint
Full-text available
We show how to convert the problem of minimizing a convex function over the standard probability simplex to that of minimizing a nonconvex function over the unit sphere. We prove the landscape of this nonconvex problem is benign, i.e. every stationary point is either a strict saddle or a global minimizer. We exploit the Riemannian manifold structur...
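The visible part of this abstract does not say which map links the sphere to the simplex. One natural choice, assumed here purely for illustration, is entrywise squaring: if z has unit Euclidean norm, then the vector of squared entries is nonnegative and sums to 1. The helper names below are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(z: Sequence[float]) -> List[float]:
    """Scale a nonzero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def sphere_to_simplex(z: Sequence[float]) -> List[float]:
    """Assumed parametrization: square each entry of a unit vector."""
    return [v * v for v in z]


z = normalize([3.0, -4.0, 12.0])  # a point on the unit sphere in R^3
x = sphere_to_simplex(z)          # nonnegative entries that sum to 1
```

Under this assumption, a function g on the simplex could be minimized by instead minimizing z ↦ g(sphere_to_simplex(z)) over unit vectors z.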
Preprint
We propose a new line-search method, coined Curvature-Aware Random Search (CARS), for derivative-free optimization. CARS exploits approximate curvature information to estimate the optimal step-size given a search direction. We prove that for strongly convex objective functions, CARS converges linearly if the search direction is drawn from a distrib...
Preprint
Full-text available
Systems of interacting agents can often be modeled as contextual games, where the context encodes additional information, beyond the control of any agent (e.g. weather for traffic and fiscal policy for market economies). In such systems, the most likely outcome is given by a Nash equilibrium. In many practical settings, only game equilibria are obs...
Preprint
Full-text available
A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed point equation to compute inferences. Solving for the fixed point varies in complexity, depending on provided data and an error tolerance. Importantly, implicit networks may be trained wit...
Preprint
We consider the zeroth-order optimization problem in the huge-scale setting, where the dimension of the problem is so large that performing even basic vector operations on the decision variables is infeasible. In this paper, we propose a novel algorithm, coined ZO-BCD, that exhibits favorable overall query complexity and has a much smaller per-iter...
Preprint
Full-text available
New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate...
Preprint
Full-text available
We present a preliminary study of a knowledge graph created from season one of the television show Veronica Mars, which follows the eponymous young private investigator as she attempts to solve the murder of her best friend Lilly Kane. We discuss various techniques for mining the knowledge graph for clues and potential suspects. We also discuss bes...
Preprint
We study derivative-free optimization for convex functions where we further assume that function evaluations are unavailable. Instead, one only has access to a comparison oracle, which, given two points $x$ and $y$, returns a single bit of information indicating which point has larger function value, $f(x)$ or $f(y)$, with some probability of b...
Preprint
Full-text available
We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$pt...
Preprint
We study the use of power weighted shortest path distance functions for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for compu...
Preprint
In this note we investigate under which conditions the dual of the flow polytope (henceforth referred to as the 'dual flow polytope') of a quiver is k-neighborly, for generic weights near the canonical weight. We provide a lower bound on k, depending only on the edge connectivity of the underlying graph, for such weights. In the case where the cano...
Preprint
Full-text available
We use techniques from compressive sensing to design a local clustering algorithm by treating the cluster indicator vector as a sparse solution to a linear system whose coefficient matrix is the graph Laplacian. If the graph is drawn from the Stochastic Block Model we are able to prove that the fraction of misclassified vertices goes to zero as the...
Article
Full-text available
The community detection problem for graphs asks one to partition the n vertices V of a graph G into k communities, or clusters, such that there are many intracluster edges and few intercluster edges. Of course this is equivalent to finding a permutation matrix P such that, if A denotes the adjacency matrix of G, then PAP^T is approximately block di...
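The permutation view of community detection in this abstract can be sketched on a toy graph, reading the truncated phrase as "block diagonal" (an assumption). The graph, the vertex ordering, and all helper names are hypothetical, and the example has no intercluster edges, so the reordered matrix is exactly rather than approximately block structured.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def permutation_matrix(order: List[int]) -> Matrix:
    """Row i has a 1 in column order[i], so (P A P^T)[i][j] = A[order[i]][order[j]]."""
    n = len(order)
    return [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]


# Toy graph on 4 vertices with communities {0, 2} and {1, 3}; edges 0-2 and 1-3.
A = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

# Listing the vertices community by community groups each cluster's edges together.
P = permutation_matrix([0, 2, 1, 3])
B = matmul(matmul(P, A), transpose(P))
```

Here `B` has two 2x2 diagonal blocks holding all the edges and zero off-diagonal blocks; with a few intercluster edges, the off-diagonal blocks would be sparse rather than empty.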