Cédric Archambeau’s research while affiliated with Humboldt-Universität zu Berlin and other places


Publications (105)


Hyperparameter Optimization in Machine Learning
  • Preprint

October 2024 · 92 Reads

Valerio Perrone · [...] · Paolo Frasconi

Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence, and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often unsatisfactory and becomes infeasible when the number of hyperparameters is large. Automating the search is an important step towards automating machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples and insights into the state-of-the-art. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, bandit-, model- and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields such as meta-learning and neural architecture search, and conclude with open questions and future research directions.
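As a concrete point of reference for the simplest family covered by the survey, the sketch below shows plain random search over a small search space. The hyperparameter ranges, objective function, and budget are hypothetical illustrations, not taken from the paper.

```python
# Minimal random-search sketch (illustrative only; the search space, objective,
# and budget below are hypothetical, not taken from the survey).
import random

def sample_config():
    # Draw one hyperparameter configuration from a simple search space.
    return {
        "learning_rate": 10 ** random.uniform(-5, -1),   # log-uniform
        "num_layers": random.randint(1, 4),
        "dropout": random.uniform(0.0, 0.5),
    }

def random_search(objective, budget=50):
    """Evaluate `budget` random configurations and keep the best one."""
    best_config, best_score = None, float("inf")
    for _ in range(budget):
        config = sample_config()
        score = objective(config)          # e.g. validation loss
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```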


Explaining Multiclass Classifiers with Categorical Values: A Case Study in Radiography

July 2023 · 8 Reads · 1 Citation

Lecture Notes in Computer Science

Explainability of machine learning methods is of fundamental importance in healthcare to calibrate trust. A large branch of explainable machine learning uses tools linked to the Shapley value, which have nonetheless been found difficult to interpret and potentially misleading. Taking multiclass classification as a reference task, we argue that a critical issue in these methods is that they disregard the structure of the model outputs. We develop the Categorical Shapley value as a theoretically grounded method to explain the output of multiclass classifiers in terms of transition (or flipping) probabilities across classes. We demonstrate the method on a case study composed of three example scenarios for pneumonia detection and subtyping using X-ray images.
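To give a feel for what "flipping probabilities across classes" means, the toy sketch below estimates, by Monte Carlo, how often the predicted class changes when a single feature is resampled from background data. It is only an intuition-building approximation under assumed inputs (`predict`, `x`, `X_background`); it is not the Categorical Shapley value defined in the paper.

```python
# Toy Monte Carlo estimate of class-transition frequencies when one feature is
# replaced by a background value. This only conveys the idea of "flipping
# probabilities across classes"; it is NOT the Categorical Shapley value.
# `predict`, `x`, and `X_background` are hypothetical inputs.
import numpy as np

def flip_matrix(predict, x, X_background, feature, n_classes, n_samples=500):
    """Estimate the frequency of transitions from the base class to every
    other class when `feature` of x is resampled from the background data."""
    rng = np.random.default_rng(0)
    counts = np.zeros((n_classes, n_classes))
    base_class = int(np.argmax(predict(x[None, :])[0]))
    for _ in range(n_samples):
        x_perturbed = x.copy()
        x_perturbed[feature] = rng.choice(X_background[:, feature])
        new_class = int(np.argmax(predict(x_perturbed[None, :])[0]))
        counts[base_class, new_class] += 1
    return counts / n_samples   # row `base_class` holds transition frequencies
```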


[Figures: RMSE, calibration error, and runtime for different surrogates as the number of samples increases; tabulated benchmark statistics]
Optimizing Hyperparameters with Conformal Quantile Regression
  • Preprint
  • File available

May 2023 · 55 Reads

Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their ability to capture uncertainty, but they make strong assumptions about the observation noise, which might not be warranted in practice. In this work, we propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion that translates to quicker HPO convergence on empirical benchmarks. To apply our method in a multi-fidelity setting, we propose a simple, yet effective, technique that aggregates observed results across different resource levels and outperforms conventional methods across many empirical tasks.
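For readers unfamiliar with the surrogate, the sketch below shows generic conformalized quantile regression (in the style of Romano et al., 2019) producing a calibrated prediction interval. The data splits, gradient-boosted quantile regressors, and miscoverage level alpha are assumptions for illustration; this is not the paper's multi-fidelity HPO procedure.

```python
# Sketch of conformalized quantile regression (CQR) used to produce calibrated
# intervals; a generic illustration, not the paper's HPO method.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cqr_interval(X_train, y_train, X_calib, y_calib, X_test, alpha=0.1):
    # Fit lower and upper quantile regressors on the training split.
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)
    # Conformity scores on the calibration split: how far the observed value
    # falls outside the predicted quantile band.
    scores = np.maximum(lo.predict(X_calib) - y_calib, y_calib - hi.predict(X_calib))
    # Finite-sample corrected quantile of the scores.
    n = len(y_calib)
    q_level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
    q = np.quantile(scores, q_level)
    # Calibrated prediction interval on new points.
    return lo.predict(X_test) - q, hi.predict(X_test) + q
```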


Renate: A Library for Real-World Continual Learning

April 2023 · 14 Reads

Continual learning enables the incremental training of machine learning models on non-stationary data streams. While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTorch models. We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate. We give a high-level description of the library components and interfaces. Finally, we showcase the strengths of the library by presenting experimental results. Renate may be found at https://github.com/awslabs/renate.


Fortuna: A Library for Uncertainty Quantification in Deep Learning

February 2023 · 88 Reads

We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction, which can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods, which can be applied to Flax-based deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems.
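As a hedged illustration of the kind of post-hoc calibration Fortuna offers, the sketch below implements plain split conformal prediction for classification with NumPy. It deliberately does not use Fortuna's API, and the held-out softmax outputs and labels are hypothetical inputs.

```python
# Minimal split conformal prediction for classification; a generic sketch of
# the technique, not Fortuna's API. `probs_calib`, `labels_calib`, and
# `probs_test` are hypothetical held-out softmax outputs and labels.
import numpy as np

def conformal_sets(probs_calib, labels_calib, probs_test, alpha=0.1):
    """Return prediction sets with roughly (1 - alpha) marginal coverage."""
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = 1.0 - probs_calib[np.arange(len(labels_calib)), labels_calib]
    n = len(scores)
    q_level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
    threshold = np.quantile(scores, q_level)
    # A class enters the set if its score is below the calibrated threshold.
    return [np.where(1.0 - p <= threshold)[0] for p in probs_test]
```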



[Figure 2: Epistemic uncertainty estimates on the two-moon dataset, via ADVI before (left) and after (right) DAP calibration; DAP calibration helps counteract ADVI's overconfidence out of distribution]
[Figure 5: Distances from the training set for correctly and wrongly classified inputs; wrongly classified inputs tend to lie further from the training domain]
[Tables: results on Fashion-MNIST and CIFAR-10]
Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

July 2022 · 80 Reads

As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method, and it can be performed as a post-processing step. We demonstrate its effectiveness against several baselines in a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.
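The toy sketch below only conveys the core intuition that uncertainty should grow with the distance from the training data: it inflates a base predictive standard deviation as a function of nearest-neighbour distance. The scaling rule and the `gamma` parameter are hypothetical and are not the DAP prior construction described in the paper.

```python
# Toy illustration of distance-aware uncertainty inflation; NOT the DAP
# formulation from the paper. The linear scaling rule and `gamma` are
# hypothetical choices made for this sketch.
import numpy as np

def distance_to_training_set(x, X_train):
    """Euclidean distance from a test point to its nearest training point."""
    return np.min(np.linalg.norm(X_train - x, axis=1))

def distance_aware_std(x, X_train, base_std, gamma=1.0):
    """Inflate a base predictive standard deviation with the distance."""
    d = distance_to_training_set(x, X_train)
    return base_std * (1.0 + gamma * d)
```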


PASHA: Efficient HPO with Progressive Resource Allocation

July 2022 · 10 Reads

Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, dynamically allocates the maximum resources for the tuning procedure depending on need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than solutions like ASHA.
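For context, the sketch below shows synchronous successive halving, the resource-allocation mechanism that ASHA runs asynchronously and that PASHA extends with a progressively growing maximum resource; the progressive extension itself is not shown. `evaluate(config, resource)` and the budgets are assumed for illustration.

```python
# Synchronous successive halving sketch: repeatedly evaluate surviving
# configurations at growing resource levels and keep the best fraction.
# Not PASHA itself; budgets and the evaluate callback are hypothetical.
def successive_halving(configs, evaluate, min_resource=1, max_resource=81, eta=3):
    resource = min_resource
    while resource <= max_resource and len(configs) > 1:
        # Evaluate every surviving configuration at the current resource level.
        scored = [(evaluate(c, resource), c) for c in configs]
        # Keep the top 1/eta fraction (lower score = better), then grow the budget.
        scored.sort(key=lambda pair: pair[0])
        k = max(1, len(configs) // eta)
        configs = [c for _, c in scored[:k]]
        resource *= eta
    return configs[0]
```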


Continual Learning with Transformers for Image Classification

June 2022 · 16 Reads

In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limited computational resources that can be used. Moreover, training large neural networks, such as Transformers, from scratch is very costly and requires a vast amount of training data, which might not be available in the application domain of interest. A recent trend indicates that dynamic architectures based on parameter expansion can efficiently reduce catastrophic forgetting in continual learning, but such methods need complex tuning to balance the growing number of parameters and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we validate in the computer vision domain a recent solution called Adaptive Distillation of Adapters (ADA), which was developed to perform continual learning using pre-trained Transformers and Adapters on text classification tasks. We empirically demonstrate on different classification tasks that this method maintains good predictive performance without retraining the model or increasing the number of model parameters over time. Besides, it is significantly faster at inference time compared to state-of-the-art methods.
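As background, the sketch below shows a standard bottleneck adapter block (in the style of Houlsby et al.) in PyTorch, the kind of lightweight module inserted into a frozen pre-trained Transformer. ADA's adapter distillation and selection logic are not shown, and the dimensions are illustrative assumptions.

```python
# Standard bottleneck adapter block with a residual connection; a generic
# illustration of the adapter modules ADA builds on, not ADA itself.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # down-projection
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # up-projection
        self.activation = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact.
        return hidden_states + self.up(self.activation(self.down(hidden_states)))
```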



Citations (65)


... According to the authors, this approach scales linearly concerning both the corpus size and the number of features. They also reference other researchers (e.g., [9][10][11]) who suggest that this optimization method provides a robust alternative to sequential updates, which can sometimes cause model drift and lead to suboptimal confidence estimates. ...

Reference:

Evaluating the Societal Impact of AI: A Comparative Analysis of Human and AI Platforms Using the Analytic Hierarchy Process
Structured Penalties for Log-Linear Language Models
  • Citing Conference Paper
  • January 2013

... On one side, we have our "recipe for success" learned over tens of thousands of years, where our collective cultural brains lead to innovation and evolution. On the other side, we have these very large AI models which, despite encompassing enormous bodies of information, act as a single "super-human" that homogenizes and erases entire bodies of cultural knowledge (Schwöbel et al. 2023;McVeety 2024;Byrd 2023;Perez et al. 2024;Naous et al. 2024). ...

Geographical Erasure in Language Generation
  • Citing Conference Paper
  • January 2023

... A recent work presented by Franceschi et al. [14] (later extended [15]) introduces stochastic characteristic functions to deal with models that output a random variable. With a categorical random variable, their approach can be used for explaining a multiclass classifier by allowing probabilistic statements about the likelihood of a feature to flip the decision from one class to another. ...

Explaining Multiclass Classifiers with Categorical Values: A Case Study in Radiography
  • Citing Chapter
  • July 2023

Lecture Notes in Computer Science

... 3.2). Fortunately, the use of pre-trained large models can effectively mitigate catastrophic forgetting at the parameter level, as they have sufficient capacity to extract features without changing internal parameters [10][11][12]. However, frozen pre-trained models often perform poorly in downstream tasks, making them unsuitable for direct application [13]. ...

Continual Learning with Transformers for Image Classification
  • Citing Conference Paper
  • June 2022

... In their work, D. Sulem et al. [5] proposed a new method that explains the anomalies found in time series by generating counterfactual explanations. Counterfactual explanations are alternative scenarios that show how the data must change to remove an anomalous observation. ...

Diverse Counterfactual Explanations for Anomaly Detection in Time Series

... Here, we focus on global reconstructions of BERT's predictions for token-level classifications in this work, since this constitutes popular application scenarios of BERT (e.g., AS1, AS3) and since BERT also establishes text representations based on tokens. Moreover, as Zafar et al. (2021) and Yan et al. (2022) indicate, a reconstruction approach for token-level classifications can also serve as a basis for reconstructions of coarser classification tasks, for instance, for sentence-level classifications (e.g., AS2, AS4). ...

More Than Words: Towards Better Quality Interpretations of Text Classifiers

... Baselines: We utilize baselines derived from the architectures of popular FL frameworks (Qi et al., 2024;He et al., 2020;IBM, 2020;Beutel et al., 2020), as depicted in Figure 3. Specifically, we deploy the cloud aggregator server on the ml.m5.4xlarge instance of AWS SageMaker (Amazon Web Services, Inc., 2024b), a widely-used AWS service for managing non-training workloads such as inference and debugging (Liberty et al., 2020;Perrone et al., 2021;Das et al., 2020). AWS SageMaker connects with data storage options such as AWS S3 (Amazon Web Services, 2024b) for cloud object storage or AWS ElastiCache (Amazon Web Services, 2024a) for in-memory caching. ...

Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization
  • Citing Conference Paper
  • August 2021

... Recent multi-objective Bayesian optimization (MOBO) methods [19,57] have demonstrated competitive performance in incorporating fairness into the optimization process compared to other multi-objective hyperparameter optimization (HPO) techniques. Another advantage of BO is the availability of robust and efficient software frameworks [1,6,46], which provide standardized APIs to interact with different optimizers. ...

Fair Bayesian Optimization
  • Citing Conference Paper
  • July 2021

... The stopping criterion for Bayesian optimization leading to the effective finding of the optimal solution is controversial [53, 54]. Concerning these studies, introducing the stopping criterion for the FMQA algorithm is interesting, and we expect to be able to terminate the optimization tasks when an optimal solution is found. ...

Overfitting in Bayesian Optimization: an empirical study and early-stopping solution