Maziar Sanjabi
University of Minnesota Twin Cities | UMN · Department of Electrical and Computer Engineering

Doctor of Philosophy

About

50 Publications · 11,259 Reads
2,119 Citations · Since 2017: 37 research items, 1,792 citations
[Citations-per-year chart, 2017–2023, axis 0–500]

Publications (50)
Preprint
Full-text available
Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $\epsilon$ needs to be to protect against certain privacy risks in practice is still not well-understood. In this work, we study data reconstruction attacks for discrete data and a...
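The abstract above asks how small $\epsilon$ must be to block reconstruction of discrete data. As a hedged illustration of what an $\epsilon$-DP mechanism for discrete data looks like (not the paper's own attack or mechanism), here is classic randomized response for a single bit:

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """epsilon-DP randomized response: report the true bit with
    probability e^eps / (e^eps + 1), otherwise report its flip."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_true else 1 - bit

# Smaller epsilon pushes the report toward a fair coin flip, which is
# exactly what makes reconstructing the true value harder.
rng = random.Random(0)
reports = [randomized_response(1, 1.0, rng) for _ in range(100_000)]
print(sum(reports) / len(reports))  # near e/(e+1), about 0.73
```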
Preprint
Full-text available
An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to the fact that client devices have different system capabilities. A considerable number of federated optimization m...
Preprint
Personalized text generation has broad industrial applications, such as explanation generation for recommendations, conversational systems, etc. Personalized text generators are usually trained on user written text, e.g., reviews collected on e-commerce platforms. However, due to historical, social, or behavioral reasons, there may exist bias that...
Preprint
Free-text rationales aim to explain neural language model (LM) behavior more flexibly and intuitively via natural language. To ensure rationale quality, it is important to have metrics for measuring rationales' faithfulness (reflects LM's actual behavior) and plausibility (convincing to humans). All existing free-text rationale metrics are based on...
Preprint
Full-text available
An oft-cited challenge of federated learning is the presence of data heterogeneity -- the data at different clients may follow very different distributions. Several federated optimization methods have been proposed to address these challenges. In the literature, empirical evaluations usually start federated training from a random initialization. Ho...
Preprint
Full-text available
Neural language models' (NLMs') reasoning processes are notoriously hard to explain. Recently, there has been much progress in automatically generating machine rationales of NLM behavior, but less in utilizing the rationales to improve NLM behavior. For the latter, explanation regularization (ER) aims to improve NLM generalization by pushing the ma...
Preprint
Full-text available
We propose an autoregressive entity linking model, that is trained with two auxiliary tasks, and learns to re-rank generated samples at inference time. Our proposed novelties address two weaknesses in the literature. First, a recent method proposes to learn mention detection and then entity candidate selection, but relies on predefined sets of cand...
Preprint
Full-text available
Self-supervised learning methods have shown impressive results in downstream classification tasks. However, there is limited work in understanding their failure models and interpreting the learned representations of these models. In this paper, we tackle these issues and study the representation space of self-supervised models by understanding the...
Preprint
While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information may be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels...
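The worst-group objective mentioned above has a simple form: the maximum over groups of that group's average loss. A minimal sketch (illustrative names, not the paper's code):

```python
import numpy as np

def worst_group_loss(losses, groups):
    """Worst-group objective: the maximum over groups of that group's
    average loss -- the quantity group-robust methods try to control."""
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    return max(losses[groups == g].mean() for g in np.unique(groups))

# Group 1 has the higher average loss (1.0), so it dominates the objective.
print(worst_group_loss([0.1, 0.2, 0.9, 1.1], [0, 0, 1, 1]))
```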
Preprint
An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the text inputs that most influenced the output. Ideally, rationale extraction should be faithful (reflects LM's behavior), plausible (makes sense to humans), data-efficient, and fast, without sacrificing the LM's task performance. Prior r...
Preprint
Full-text available
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk...
Conference Paper
Full-text available
Empirical risk minimization (ERM) is typically designed to perform well on the average loss, which can result in estimators that are sensitive to outliers, generalize poorly, or treat subgroups unfairly. While many methods aim to address these problems individually, in this work, we explore them through a unified framework-tilted empirical risk min...
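The tilted objective behind this line of work interpolates between the average loss and the max loss via a scalar temperature t. A minimal sketch, assuming the standard log-sum-exp form of the t-tilted loss:

```python
import numpy as np

def tilted_loss(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).
    t -> 0 recovers the average loss; large t approaches the max loss."""
    m = t * np.asarray(losses, dtype=float)
    # log-sum-exp trick for numerical stability
    return (np.max(m) + np.log(np.mean(np.exp(m - np.max(m))))) / t

losses = [0.1, 0.2, 3.0]
print(tilted_loss(losses, 1e-6))  # near the mean, 1.1
print(tilted_loss(losses, 100.0))  # near the max, 3.0
```

Negative t would instead downweight large losses, which is how the same objective can trade off robustness to outliers against fairness to hard examples.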
Preprint
Large neural networks are impractical to deploy on mobile devices due to their heavy computational cost and slow inference. Knowledge distillation (KD) is a technique to reduce the model size while retaining performance by transferring knowledge from a large "teacher" model to a smaller "student" model. However, KD on multimodal datasets such as vi...
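The KD term transferring knowledge from teacher to student is typically a KL divergence between temperature-softened output distributions. A hedged sketch of that standard (Hinton-style) loss, not the paper's multimodal variant:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-T softened distributions,
    scaled by T**2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T ** 2 * np.sum(p * (np.log(p) - np.log(q)))

print(distillation_loss([2.0, 0.5, 0.1], [2.0, 0.5, 0.1]))  # 0.0 for identical logits
```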
Preprint
Quantization of the parameters of machine learning models, such as deep neural networks, requires solving constrained optimization problems, where the constraint set is formed by the Cartesian product of many simple discrete sets. For such optimization problems, we study the performance of the Alternating Direction Method of Multipliers for Quantiz...
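The key subproblem in such a Cartesian-product constraint set is the Euclidean projection, which decomposes into rounding each coordinate to its nearest allowed level. A sketch of that projection step (illustrative only; the full ADMM loop is not shown):

```python
import numpy as np

def project_to_levels(w, levels):
    """Euclidean projection onto a Cartesian product of finite sets:
    each coordinate is independently rounded to its nearest level."""
    w = np.asarray(w, dtype=float)
    levels = np.asarray(levels, dtype=float)
    idx = np.argmin(np.abs(w[..., None] - levels), axis=-1)
    return levels[idx]

# Ternary quantization of a small weight vector.
print(project_to_levels([0.8, -0.3, 0.1], [-1.0, 0.0, 1.0]))  # [1. 0. 0.]
```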
Article
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem that is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument that leads to a small objective value even for the worst-case function in the given...
Preprint
Full-text available
Empirical risk minimization (ERM) is typically designed to perform well on the average loss, which can result in estimators that are sensitive to outliers, generalize poorly, or treat subgroups unfairly. While many methods aim to address these problems individually, in this work, we explore them through a unified framework---tilted empirical risk m...
Preprint
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst case function in the given class. Min-max o...
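In the convex-concave regime the abstract refers to, simultaneous gradient descent-ascent already converges to the equilibrium. A self-contained toy sketch on one convex-concave game (my example, not the paper's):

```python
def gda(lr=0.1, steps=500):
    """Simultaneous gradient descent-ascent on the convex-concave game
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose saddle point is (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        gx = x + y  # df/dx
        gy = x - y  # df/dy
        x, y = x - lr * gx, y + lr * gy  # descend in x, ascend in y
    return x, y

x, y = gda()
print(x, y)  # both near 0
```

In the non-convex regime studied in the paper, no such global guarantee exists, which is why only approximate first-order stationary points are targeted.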
Conference Paper
Full-text available
Federated learning involves training statistical models in massive, heterogeneous networks. Naively minimizing an aggregate loss function in such a network may disproportionately advantage or disadvantage some of the devices. In this work, we propose q-Fair Federated Learning (q-FFL), a novel optimization objective inspired by fair resource allocat...
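The q-FFL objective reweights clients so that high-loss clients count more as q grows. A hedged sketch of that objective (assuming the sum_k p_k * F_k^(q+1) / (q+1) form; evaluation only, no training loop):

```python
import numpy as np

def qffl_objective(client_losses, client_weights, q):
    """q-FFL objective: sum_k p_k * F_k^(q+1) / (q+1).
    q = 0 recovers the usual weighted-average loss; larger q pushes the
    objective toward the clients with the highest loss."""
    F = np.asarray(client_losses, dtype=float)
    p = np.asarray(client_weights, dtype=float)
    return np.sum(p * F ** (q + 1)) / (q + 1)

losses, weights = [0.5, 2.0], [0.5, 0.5]
print(qffl_objective(losses, weights, 0))  # plain average: 1.25
print(qffl_objective(losses, weights, 2))  # dominated by the 2.0-loss client
```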
Preprint
Full-text available
Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when le...
Preprint
We study the optimization problem for decomposing $d$ dimensional fourth-order Tensors with $k$ non-orthogonal components. We derive deterministic conditions under which such a problem does not have spurious local minima. In particular, we show that if $\kappa = \frac{\lambda_{max}}{\lambda_{min}} < \frac{5}{4}$, and incoherence coefficien...
Article
Full-text available
Achieving highly detailed terrain models spanning vast areas is crucial to modern computer graphics. The pipeline for obtaining such terrains is via amplification of a low-resolution terrain to refine the details given a desired theme, which is a time-consuming and labor-intensive process. Recently, data-driven methods, such as the sparse construct...
Preprint
Full-text available
Achieving highly detailed terrain models spanning vast areas is crucial to modern computer graphics. The pipeline for obtaining such terrains is via amplification of a low-resolution terrain to refine the details given a desired theme, which is a time-consuming and labor-intensive process. Recently, data-driven methods, such as the sparse construct...
Preprint
Federated learning involves training statistical models in massive, heterogeneous networks. Naively minimizing an aggregate loss function in such a network may disproportionately advantage or disadvantage some of the devices. In this work, we propose q-Fair Federated Learning (q-FFL), a novel optimization objective inspired by resource allocation i...
Preprint
In recent years, Generative Adversarial Networks (GANs) have drawn a lot of attention for learning the underlying distribution of data in various applications. Despite their wide applicability, training GANs is notoriously difficult. This difficulty is due to the min-max nature of the resulting optimization problem and the lack of proper tools of...
Preprint
Recent applications that arise in machine learning have sparked significant interest in solving min-max saddle point games. This problem has been extensively studied in the convex-concave regime for which a global equilibrium solution can be computed efficiently. In this paper, we study the problem in the non-convex regime and show that an \varepsil...
Preprint
Full-text available
Federated learning involves training machine learning models in massively distributed networks. While Federated Averaging (FedAvg) is the leading optimization method for training non-convex models in this setting, its behavior is not well understood in realistic federated settings when learning across statistically heterogeneous devices, i.e., wher...
Preprint
Full-text available
The burgeoning field of federated learning involves training machine learning models in massively distributed networks, and requires the development of novel distributed optimization techniques. Federated averaging (FedAvg) is the leading optimization method for training non-convex models in this setting, exhibiting impressive empirical performanc...
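The FedAvg scheme analyzed in these papers is simple to state: each client runs a few local gradient steps from the current global model, and the server averages the results weighted by local dataset size. A toy least-squares sketch (illustrative, not the papers' experimental setup):

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One FedAvg round: local SGD on each client's least-squares loss,
    then a data-size-weighted average of the resulting models."""
    updates, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)  # local least-squares gradient
            w = w - lr * grad
        updates.append(w)
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=sizes)

# Two toy clients whose data share the same optimum w* = [2.0].
clients = [(np.array([[1.0], [1.0]]), np.array([2.0, 2.0])),
           (np.array([[2.0]]), np.array([4.0]))]
w = np.zeros(1)
for _ in range(30):
    w = fedavg_round(w, clients)
print(w)  # approaches [2.0]
```

When client optima disagree (the statistically heterogeneous case the abstracts emphasize), the averaged model can drift away from any single stationary point, which is the behavior these analyses characterize.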
Preprint
In this short note, we consider the problem of solving a min-max zero-sum game. This problem has been extensively studied in the convex-concave regime where the global solution can be computed efficiently. Recently, there have also been developments for finding the first order stationary points of the game when one of the player's objective is conc...
Article
Generative Adversarial Networks (GANs) are one of the most practical methods for learning data distributions. A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions. Unfortunately, minimizing the Wasserstein distance between the data distribution and the generative model distribution is a...
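For intuition about the metric the WGAN formulation minimizes: in one dimension, the empirical 1-Wasserstein distance between equal-size samples has a closed form via sorting. A small sketch (my illustration, unrelated to the paper's algorithm):

```python
import numpy as np

def w1_empirical(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    after sorting, the optimal coupling is the identity pairing."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return np.mean(np.abs(a - b))

print(w1_empirical([0.0, 1.0], [2.0, 3.0]))  # 2.0
```

In higher dimensions no such closed form exists, which is one reason minimizing the Wasserstein distance against a generative model is hard in practice.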
Article
Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical s...
Conference Paper
Recent years have seen a revival of interest in the Alternating Direction Method of Multipliers (ADMM), due to its simplicity, versatility, and scalability. As a first order method for general convex problems, the rate of convergence of ADMM is O(1/k) [4, 25]. Given the scale of modern data mining problems, an algorithm with similar properties as A...
Article
To cope with the growing demand for wireless data and to extend service coverage, future fifth-generation (5G) networks will increasingly rely on the use of low-power nodes to support massive connectivity in a diverse set of applications and services. To this end, virtualized and mass-scale cloud architectures are proposed as promising technologies...
Article
In this paper we consider the problem of partial coordinated transmission in the downlink of a wireless heterogeneous network (HetNet). The partial coordination is crucial in trading off system performance with backhaul overhead, and it is achieved through jointly optimizing the base station (BS) clustering and the downlink beamformers. Unlike many...
Article
Full-text available
To cope with the growing demand for wireless data and to extend service coverage, future 5G networks will increasingly rely on the use of low-power nodes to support massive connectivity in a diverse set of applications and services [1]. To this end, virtualized and mass-scale cloud architectures are proposed as promising technologies for 5G in whic...
Article
Consider a downlink MIMO heterogeneous wireless network with multiple cells, each containing many mobile users and a number of base stations with varying capabilities. A central task in the management of such a network is to assign each user to a base station and design a linear transmit strategy to ensure a satisfactory level of network performanc...
Article
Consider the problem of minimizing the sum of two convex functions, one being smooth and the other non-smooth. In this paper, we introduce a general class of approximate proximal splitting (APS) methods for solving such minimization problems. Methods in the APS class include many well-known algorithms such as the proximal splitting method (PSM), th...
Article
Consider the problem of minimizing the expected value of a cost function parameterized by a random variable. The classical sample average approximation (SAA) method for solving this problem requires minimization of an ensemble average of the objective at each step, which can be expensive. In this paper, we propose a stochastic successive upper-boun...
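The SAA baseline this paper improves on can be sketched in a few lines: replace the expectation with an empirical average over drawn samples, then minimize that average. A minimal version (grid search over a scalar decision variable; illustrative names only):

```python
import numpy as np

def saa_minimize(cost, sampler, n_samples=1000, grid=None, seed=0):
    """Sample average approximation: approximate E_xi[cost(x, xi)] by an
    empirical average over n_samples draws, then minimize over a grid."""
    rng = np.random.default_rng(seed)
    xis = sampler(rng, n_samples)
    grid = np.linspace(-5.0, 5.0, 1001) if grid is None else grid
    avg = [np.mean(cost(x, xis)) for x in grid]
    return grid[int(np.argmin(avg))]

# E[(x - xi)^2] with xi ~ N(1, 1) is minimized at x = E[xi] = 1.
xhat = saa_minimize(lambda x, xi: (x - xi) ** 2,
                    lambda rng, n: rng.normal(1.0, 1.0, n))
print(xhat)  # close to 1
```

The expense the abstract points to is visible even here: every candidate x requires averaging the cost over the full sample, which the proposed stochastic successive upper-bound scheme avoids.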
Conference Paper
Consider a multiple input-multiple output (MIMO) interference channel with partial channel state information (CSI) whereby the CSI is known only for some (or none) of the links, while the statistical knowledge is known for the remaining links. In this work, we consider the linear transceiver design problem for such an interference channel with part...
Conference Paper
Consider a MIMO heterogeneous network with multiple transmitters (including macro, pico and femto base stations) and many receivers (mobile users). The users are to be assigned to the base stations which then optimize their linear transmit beamformers accordingly. In this work, we consider the problem of joint base station assignment and linear bea...
Conference Paper
Full-text available
We consider the multiuser beamforming problem for a multi-input single-output downlink channel that takes into account the errors in the channel state information at the transmitter side (CSIT). By modeling the CSIT errors as elliptically bounded uncertainty regions, this problem can be formulated as minimizing the transmission power subject to the...
Article
Consider a MIMO interference channel whereby each transmitter and receiver are equipped with multiple antennas. The basic problem is to design optimal linear transceivers (or beamformers) that can maximize system throughput. The recent work [1] suggests that optimal beamformers should maximize the total degrees of freedom and achieve interference a...
Conference Paper
Consider a MIMO interference channel whereby each transmitter and receiver are equipped with multiple antennas. The basic problem is to design optimal linear transceivers (or beamformers) that can maximize system throughput. The recent work [13] suggests that optimal beamformers should maximize the total degrees of freedom and achieve interference...
Conference Paper
Full-text available
The problem of surveillance for intrusion detection in a camera sensor network is addressed in this paper. In order to save limited resources, a sensing task should involve just the right number of sensors. For a wide enough coverage area random and uniform distribution can be applied. We propose a novel method which allows reduction of number of s...
