Laurent El Ghaoui’s research while affiliated with University of California, Berkeley and other places


Publications (166)


Figure 1: Time series of a 21-day rolling average of AMC stock volatility, plotted on a log scale, highlighting a drastic volatility increase at the beginning of our validation cutoff.
Figure 2: Left: Geometric visualization of one set of training features (xi, yi, zi, pi, θi) and its corresponding labels (X, Y , Z, T ). The triangles correspond to stations and the star corresponds to a source. Right: The map shows the training set region colored in blue, roughly corresponding to the Pacific Ring of Fire. The two red areas are the testing set regions for k = 3.
Figure 3: Test MSE for the identity function task. MSE for the MLP and Transformer models increases as the distribution-shift hyperparameter κ increases.
Figure 4: Test Log(MSE) for the arithmetic operations. The implicit model strongly outperforms all other models on OOD data.
Figure 5: For rolling tasks, implicit models maintain close to constant loss (↓) and accuracy (↑) across shifts.


The Extrapolation Power of Implicit Models
  • Preprint
  • File available

July 2024 · 13 Reads

Juliette Decugis · Alicia Y. Tsai · Max Emerling · [...] · Laurent El Ghaoui

In this paper, we investigate the extrapolation capabilities of implicit deep learning models on unobserved data, where traditional deep neural networks may falter. Implicit models, distinguished by their adaptability in layer depth and the incorporation of feedback within their computational graph, are put to the test across various extrapolation scenarios: out-of-distribution, geographical, and temporal shifts. Our experiments consistently demonstrate a significant performance advantage for implicit models. Unlike their non-implicit counterparts, which often rely on meticulous architectural design for each task, implicit models learn complex structures without task-specific design, highlighting their robustness on unseen data.
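
To make the "equilibrium equation" idea concrete, here is a minimal sketch of an implicit layer in the spirit of the implicit deep learning framework, not the paper's implementation: the state x solves x = relu(Ax + Bu) rather than being produced by a fixed stack of layers. All dimensions and the rescaling rule are illustrative choices; the rescaling enforces the standard well-posedness condition that the infinity-norm of |A| stays below 1, so the fixed-point iteration is a contraction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def implicit_forward(A, B, u, n_iter=100, tol=1e-6):
    """Solve the equilibrium equation x = relu(A @ x + B @ u) by
    fixed-point iteration; convergence is guaranteed when the
    infinity-norm of |A| is below 1 (a well-posedness condition)."""
    x = np.zeros(A.shape[0])
    for _ in range(n_iter):
        x_next = relu(A @ x + B @ u)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
A *= 0.9 / np.abs(A).sum(axis=1).max()  # rescale so ||A||_inf < 1
B = rng.normal(size=(8, 4))
u = rng.normal(size=4)
x_star = implicit_forward(A, B, u)      # the "infinite-depth" state
```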




Data-Driven Reachability and Support Estimation With Christoffel Functions

September 2023 · 34 Reads · 4 Citations

IEEE Transactions on Automatic Control

We present algorithms for estimating the forward reachable set of a dynamical system using only a finite collection of independent and identically distributed samples. The produced estimate is the sublevel set of a function called an empirical inverse Christoffel function: empirical inverse Christoffel functions are known to provide good approximations to the support of probability distributions. In addition to reachability analysis, the same approach can be applied to general problems of estimating the support of a random variable, which has applications in data science towards the detection of novelties and outliers in data sets. In applications where safety is a concern, having a guarantee of accuracy that holds on finite data sets is critical. In this paper, we prove such bounds for our algorithms under the Probably Approximately Correct (PAC) framework. In addition to applying classical Vapnik–Chervonenkis (VC) dimension bound arguments, we apply the PAC-Bayes theorem by leveraging a formal connection between kernelized empirical inverse Christoffel functions and Gaussian process regression models.
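
For readers who want the construction rather than the statement, the following is a minimal sketch of an empirical inverse Christoffel function with a monomial feature map; the degree, the regularization, and the 95% quantile level are illustrative choices, not the paper's tuned values. Given samples x_1, ..., x_n, one forms the empirical moment matrix M = (1/n) Σ φ(x_i)φ(x_i)^T and scores points by C(x) = φ(x)^T M^{-1} φ(x); the support (or reachable set) estimate is a sublevel set of C.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=2):
    """Monomial feature map up to `degree`, including a constant term."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def inverse_christoffel(X_samples, X_query, degree=2, reg=1e-8):
    """Evaluate C(x) = phi(x)^T M^{-1} phi(x), with M the empirical
    moment matrix of the samples; the support estimate is the
    sublevel set {x : C(x) <= level}."""
    Phi = poly_features(X_samples, degree)
    M = Phi.T @ Phi / len(X_samples) + reg * np.eye(Phi.shape[1])
    Phi_q = poly_features(X_query, degree)
    return np.einsum('ij,ij->i', Phi_q @ np.linalg.inv(M), Phi_q)

rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 2))        # i.i.d. samples, e.g. simulated states
scores = inverse_christoffel(samples, samples)
level = np.quantile(scores, 0.95)          # estimate covering ~95% of samples
```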


Naive Feature Selection: A Nearly Tight Convex Relaxation for Sparse Naive Bayes

May 2023 · 39 Reads · 4 Citations

Mathematics of Operations Research

Because of its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our convex relaxation bounds become tight as the marginal contribution of additional features decreases, using a priori duality gap bounds derived from the Shapley–Folkman theorem. We show how to produce primal solutions satisfying these bounds. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared with classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, ℓ1-penalized logistic regression, and LASSO, while being orders of magnitude faster (a Python implementation can be found at https://github.com/aspremon/NaiveFeatureSelection). Funding: A. d'Aspremont acknowledges support from the Fonds AXA Pour la Recherche and Kamet Ventures [Machine Learning and Optimisation Joint Research Initiative] and a Google-focused award, as well as funding from Agence Nationale de la Recherche [Grant ANR-19-P3IA-0001]. L. El Ghaoui acknowledges support from Berkeley Artificial Intelligence Research and the Tsinghua–Berkeley–Shenzhen Institute.
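
The paper's exact convex relaxation and its duality-gap machinery live in the linked repository; as a rough, hypothetical illustration of the underlying idea, one can rank features by their contribution to a naive Bayes likelihood-ratio statistic between the two classes and keep the top k. The score below is a simplified stand-in, not the paper's criterion, and the function names are invented for this sketch.

```python
import numpy as np

def nb_feature_scores(X, y, alpha=1.0):
    """Score each feature by its contribution to a multinomial naive
    Bayes log-likelihood gain over a pooled (class-blind) model,
    for binary labels y in {0, 1}. X holds nonnegative counts,
    e.g. a bag-of-words matrix."""
    f_pos = X[y == 1].sum(axis=0) + alpha         # smoothed per-class counts
    f_neg = X[y == 0].sum(axis=0) + alpha
    p = f_pos / f_pos.sum()                       # class-conditional distributions
    q = f_neg / f_neg.sum()
    r = (f_pos + f_neg) / (f_pos + f_neg).sum()   # pooled distribution
    # per-feature log-likelihood gain of the two-class model over the pooled one
    return f_pos * np.log(p) + f_neg * np.log(q) - (f_pos + f_neg) * np.log(r)

def select_top_k(X, y, k):
    """Indices of the k highest-scoring (most discriminative) features."""
    return np.argsort(nb_feature_scores(X, y))[::-1][:k]
```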



State-driven Implicit Modeling for Sparsity and Robustness in Neural Networks

September 2022 · 70 Reads

Implicit models are a general class of learning models that forgo the hierarchical layer structure typical in neural networks and instead define the internal states based on an "equilibrium" equation, offering competitive performance and reduced memory consumption. However, training such models usually relies on expensive implicit differentiation for backward propagation. In this work, we present a new approach to training implicit models, called State-driven Implicit Modeling (SIM), in which we constrain the internal states and outputs to match those of a baseline model, circumventing costly backward computations. The training problem becomes convex by construction and can be solved in a parallel fashion thanks to its decomposable structure. We demonstrate how the SIM approach can be applied to significantly improve the sparsity (parameter reduction) and robustness of baseline models trained on the FashionMNIST and CIFAR-100 datasets.
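
As a simplified sketch of the state-matching idea, assume we have recorded post-activation states X and inputs U from a trained baseline network; we then fit sparse implicit weights row by row with an ℓ1-penalized convex problem. The paper's formulation also encodes the activation's sign pattern as linear constraints; this lasso relaxation keeps only the state-matching and sparsity ingredients, and all names and shapes below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sim_fit(X, U, lam=1e-2):
    """X: (m, n) recorded post-activation states, U: (p, n) inputs.
    Returns sparse (A, B) with relu(A @ X + B @ U) ~ X, fitted as m
    independent lasso problems (hence trivially parallelizable)."""
    m, n = X.shape
    p = U.shape[0]
    design = np.vstack([X, U]).T            # (n, m + p) regression design
    A, B = np.zeros((m, m)), np.zeros((m, p))
    for i in range(m):                      # decomposable across state rows
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        lasso.fit(design, X[i])             # match the i-th state coordinate
        A[i], B[i] = lasso.coef_[:m], lasso.coef_[m:]
    return A, B
```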


Figure 1: Algorithm run-time for FWSum-BM25, TextRank and SSC on the FINANCIAL OUTLOOK data. The x-axis shows the length of the generated summary (i.e. k) as a percentage of the source document length (number of sentences in the source document).
Table: Lexical and semantic ROUGE performance for the FINANCIAL OUTLOOK and CLASSICAL LITERATURE data. Results that are statistically better are boldfaced, and results that are statistically indistinguishable are colored gray. Additional experimental results can be found in Appendix A.
Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm

August 2022 · 32 Reads

We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as sparse auto-regression and approximate the resulting combinatorial problem via a convex, norm-constrained problem. We solve it using a dedicated Frank-Wolfe algorithm. To generate a summary with k sentences, the algorithm only needs to execute approximately k iterations, making it very efficient. We explain how to avoid explicit calculation of the full gradient and how to include sentence embedding information. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores and semantic (embedding-based) ones. Our method achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
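
The "roughly one sentence per iteration" property comes from the Frank-Wolfe linear minimization oracle, whose solution over a scaled simplex is a single vertex. The sketch below is a hedged illustration of this mechanism; the norm constraint, the document representation, and the step rule are chosen for simplicity rather than taken from the paper.

```python
import numpy as np

def fw_summarize(S, k, tau=1.0):
    """S: (n_sentences, d) sentence embeddings. Returns indices of
    the selected sentences after k Frank-Wolfe iterations on
    min_w ||S.T @ w - doc||^2  s.t.  w >= 0, sum(w) <= tau."""
    doc = S.mean(axis=0)                    # document representation
    n = S.shape[0]
    w = np.zeros(n)
    for t in range(k):
        grad = 2 * S @ (S.T @ w - doc)      # gradient w.r.t. w
        j = int(np.argmin(grad))            # best vertex of the scaled simplex
        vertex = np.zeros(n)
        if grad[j] < 0:                     # otherwise the zero vector is optimal
            vertex[j] = tau
        gamma = 2.0 / (t + 2)               # classic Frank-Wolfe step size
        w = (1 - gamma) * w + gamma * vertex
    return np.flatnonzero(w > 1e-8)         # indices of selected sentences
```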


Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems

June 2022 · 40 Reads · 18 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, while stabilization of partially observed systems, in many cases, requires controllers to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNN) as dynamic controllers for nonlinear uncertain partially observed systems, and derive convex stability conditions based on integral quadratic constraints, the S-lemma, and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space, taking advantage of mild additional information on system dynamics. Numerical experiments show that our method learns stabilizing controllers with fewer samples and achieves higher final performance than standard policy gradient.
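
As a minimal sketch of the projected-policy-gradient pattern: take a gradient step on the surrogate objective, then project the controller parameters back into a convex stability set. The paper's actual set is defined by LMIs derived from integral quadratic constraints and the S-lemma; the spectral-norm ball used below is a crude stand-in contraction condition, kept only to show the projection step.

```python
import numpy as np

def project_spectral(W, rho=0.99):
    """Project W onto {W : ||W||_2 <= rho}, a crude contraction
    condition standing in for the paper's LMI stability set."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, rho)) @ Vt

def projected_pg_step(W, grad_W, lr=1e-3):
    """One iteration: gradient step on the surrogate loss for the
    recurrent weights, then projection back into the stability set."""
    return project_spectral(W - lr * grad_W)
```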



Citations (73)


... However, when the number of layers of the RNN increases, the value of w is not selected correctly, or the loss function is not properly chosen, the RNN is prone to vanishing and exploding gradients [10]. Its network structure is shown in Figure 2. The final output of this network is y = mh (Eq. 6). At this point, we define the error function as in [11]. ...

Reference:

Research on Thickness Error Optimization Method of Rolling System Based on Improved Sparrow Search Algorithm–Bidirectional Long Short-Term Memory Network–Attention
Learnable features for predicting properties of metal-organic frameworks with deep neural networks
  • Citing Article
  • July 2024

Cell Reports Physical Science

... Similarly, in traditional control applications, learned models are incorporated into modern control synthesis techniques, and standard methods of proving safety are adjusted to incorporate learned models. A learning-based approach that has proved to mesh particularly well with this approach uses Gaussian process models (Akametalu et al., 2014; Umlauft et al., 2017; Wang et al., 2018; Devonport et al., 2021b). Another point of contact is the use of statistical guarantees for controllers based on learned models. ...

Data-Driven Reachability and Support Estimation With Christoffel Functions
  • Citing Article
  • September 2023

IEEE Transactions on Automatic Control

... Supervised learning (SL) is an algorithm that learns from massive, labeled datasets and generates prediction models that can work to generate labels for new datasets. SL includes support vector machine (SVM) [35], multilayer perceptron (MLP) [36], linear regression [37,38], linear discriminant analysis [39,40], K-nearest neighbor [41,42], decision tree [43,44], and naïve Bayes [45,46]. In this work, we demonstrate that SVM and MLP can be used for processing of the photonic biosensor signal and dataset. ...

Naive Feature Selection: A Nearly Tight Convex Relaxation for Sparse Naive Bayes
  • Citing Article
  • May 2023

Mathematics of Operations Research

... The field of deep learning, including domains such as computer vision [1], [2], natural language processing [3], deep reinforcement learning [4], and robotics [5], has yielded revolutionary results when trained with variants of gradient descent such as stochastic gradient descent (SGD) [6] and Adam [7]. Algorithms like projected and conditional gradient descent extend the class of first-order methods to accommodate problems with constraints such as matrix completion, or training well-posed implicit deep models [8], [9]. ...

A Sequential Greedy Approach for Training Implicit Deep Models
  • Citing Conference Paper
  • December 2022

... Given a prior P and a dataset S, (10) and (13) completely specify a posterior distribution, requiring no additional free parameters. The resulting distribution achieves a balance between fitting the data and incorporating prior knowledge. ...

Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems
  • Citing Article
  • June 2022

Proceedings of the AAAI Conference on Artificial Intelligence

... Barvinok (1995), Deza et al. (1997), Pataki (1998). To the best of our knowledge, this is the first work to study the rank bounds for the generic partial convexification LSOP-R. It is worth mentioning that our rank bounds recover all the ones reviewed here for QCQP and Fair PCA from a different perspective, and successfully reduce the sparsity bound of Askari et al. (2022) when applied to Sparse Ridge Regression. ...

Approximation Bounds for Sparse Programs
  • Citing Article
  • June 2022

SIAM Journal on Mathematics of Data Science

... Thus, data-driven approaches are rapidly becoming an important research focus in the control community. Recent research [31], [32] emphasizes characterizing reachable sets in a probabilistic sense to account for noise in data. Other studies can be viewed as extensions of model-based set-propagation approaches. ...

Data-Driven Reachability Analysis with Christoffel Functions
  • Citing Conference Paper
  • December 2021

... These learning models substitute a rule of composition, such as a fixed-point scheme or the solution of a differential equation, for the concept of layers. Known deep learning frameworks that use implicit infinite-depth layers are neural ODEs [25], implicit deep learning [26], and deep equilibrium networks [27]. In [28], the convergence of specific classes of implicit networks to global minima is examined. ...

Implicit Deep Learning
  • Citing Article
  • September 2021

SIAM Journal on Mathematics of Data Science

... Recently, Romano et al. (2020) presented a deep machine to extend the Model-X knockoff method to a vast range of problems, and the idea of sampling knockoff copies by matching higher-order moments is a natural extension of the existing second-order approximation. Askari et al. (2021) described a series of algorithms that efficiently implement Gaussian Model-X knockoffs to control the false discovery rate on large-scale feature selection problems. Nevertheless, all of these methods are based on full data. ...

FANOK: Knockoffs in Linear Time
  • Citing Article
  • July 2021

SIAM Journal on Mathematics of Data Science

... However, although HJ reachability offers safety guarantees for general dynamical systems, its computation scales exponentially with the number of states in the system, limiting its direct applicability to systems with only two vehicles [18], [17]. While attempts have been made to use reachability-based methods to guarantee safety for a larger number of vehicles [19], [20], [21], these works either make strong assumptions about the formation of the vehicles or require that the vehicles know other vehicles' trajectories a priori. In contrast, in this paper, we tackle unstructured collision avoidance, where unstructuredness refers to the scenario in which vehicles do not have to follow specific structures or formations or require knowledge of the future trajectories of other agents. ...

Reachability-based Safe Planning for Multi-Vehicle Systems with Multiple Targets
  • Citing Conference Paper
  • May 2021