Jean-Michel A. Sarr’s research while affiliated with Cheikh Anta Diop University and other places


Publications (3)


Learning Approximate Invariance Requires Far Fewer Data
  • Chapter

February 2023 · 6 Reads · Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

Jean-Michel A. Sarr · [...] · Christophe Cambier

Efficient learning, that is, learning from small datasets, is difficult for current deep learning models. Invariance has been conjectured to be the key to their generalization potential. One of the most widely used procedures for learning invariant models is data augmentation (DA). Data augmentation can be performed offline, by augmenting the data before any training, or online, during training. However, applying these techniques does not yield better generalization gains every time. We frame this problem as the stability of the generalization gains made by invariance-inducing techniques. In this study we introduce a new algorithm that trains an approximately invariant prior before posterior training of a Bayesian Neural Network (BNN). Furthermore, we compare the generalization stability of our invariance-inducing algorithm with online DA and offline DA on MNIST and Fashion-MNIST under three perturbation processes: rotation, noise, and rotation+noise. Results show that learning approximately invariant priors requires less exposure to the perturbation process, yet leads the BNN to more stable generalization gains during posterior training. Finally, we also show that invariance-inducing techniques enhance uncertainty estimation in Bayesian Neural Networks.
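The offline/online distinction above can be made concrete with a minimal sketch. This is not the chapter's code: the rotation range, noise level, and stand-in data below are illustrative assumptions chosen to mirror an MNIST-style setup with rotation and noise perturbations.

# Minimal sketch of offline vs. online data augmentation (not the chapter's code).
# The rotation range, noise level, and stand-in data are illustrative assumptions.
import numpy as np
from scipy.ndimage import rotate

def perturb(image, rng, max_angle=30.0, noise_std=0.1):
    """Apply a random rotation followed by additive Gaussian noise."""
    angle = rng.uniform(-max_angle, max_angle)
    rotated = rotate(image, angle, reshape=False, mode="nearest")
    return rotated + rng.normal(0.0, noise_std, size=image.shape)

rng = np.random.default_rng(0)
train_x = rng.random((1000, 28, 28))   # stand-in for MNIST-sized images
train_y = rng.integers(0, 10, 1000)    # stand-in labels

# Offline DA: augment once, before any training, and train on the enlarged set.
offline_x = np.concatenate([train_x, np.stack([perturb(x, rng) for x in train_x])])
offline_y = np.concatenate([train_y, train_y])

# Online DA: augment each mini-batch on the fly, so every epoch sees fresh perturbations.
def online_batches(x, y, rng, batch_size=64):
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield np.stack([perturb(img, rng) for img in x[idx]]), y[idx]

Offline DA enlarges the training set once before training, whereas online DA re-samples fresh perturbations at every epoch; the chapter compares the stability of the generalization gains obtained by each, and by the approximately invariant priors, under the same perturbation processes.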


Figure 2: Echogram extracted from the acoustic sea survey used in the study. (a) An example in which the automatic procedure for bottom detection (green line) has failed around ping 2000. (b) Random samples from the 2011 and (c) 2015 echograms, with the same ping size and cell number, showing the differences in the settings for NaNs ("not a number", strong blue color) below the bottom. Unit: see the color bar in panel (a).
Figure 3: Sampling methodology. Subfigure (a) summarizes the data sampling strategy. Subfigure (b) shows the training and test set sizes as described in experiment 2.2.1. Subfigure (c) shows the setting proposed for experiment 2.2.3. Blue corresponds to data from 2011 and orange to data from 2015. In (b) and (c), the longer boxes correspond to the training dataset. In (b) the small boxes are used as a test set and in (c) as a validation set.
Figure 4: Illustration of the hyperparameter selection with the Bayesian optimization procedure for Random Forests (RF), Support Vector Machines (SVM), Feed-Forward Neural Networks (FFNN), and Convolutional Neural Networks (CNN).
Figure 5: (a) Mean accuracy obtained for Random Forests (RF), Support Vector Machines (SVM), Feed-Forward Neural Networks (FFNN), and Convolutional Neural Networks (CNN) while varying the training dataset size from 200,000 to 1,000,000 pings from the 2011 sea survey dataset. (b) Summary statistics of the test accuracy of each learning algorithm after 5 repetitions of the training process.
Figure 6: Performance at each iteration (epoch) of the model during training and validation for simple training (ST) and cross-domain training (CDT) with 100,000, 300,000, and 550,000 pings: (a) training accuracy, (b) validation accuracy, (c) training losses, and (d) validation losses. Validation accuracy and losses were obtained during training on a single unseen validation set from the 2015 sea survey dataset.
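Figure 4 refers to hyperparameter selection by Bayesian optimization. A minimal sketch of that kind of search is given below for the Random Forest case only; the Optuna library, the search ranges, and the synthetic stand-in data are assumptions for illustration, not the study's actual configuration.

# Hedged sketch of hyperparameter selection by Bayesian optimization (cf. Figure 4),
# shown for a Random Forest only. The Optuna library, search ranges, and synthetic
# data are illustrative assumptions, not the study's actual setup.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in data

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 20),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(random_state=0, **params)
    # Mean cross-validated accuracy is the quantity the optimizer maximizes.
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=25)
print(study.best_params, study.best_value)

Optuna's default sampler is a tree-structured Parzen estimator, one common flavor of Bayesian optimization; a Gaussian-process-based optimizer would fit the same loop.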


Complex data labeling with deep learning methods: Lessons from fisheries acoustics
  • Preprint
  • File available

October 2020 · 298 Reads

J. M. A. Sarr · [...] · S. El Ayoub

Quantitative and qualitative analysis of acoustic backscattered signals, from the seabed to the sea surface, is used worldwide for fish stock assessment and marine ecosystem monitoring. Huge amounts of raw data are collected, yet they require tedious expert labeling. This paper focuses on a case study where the ground-truth labels are non-obvious: echogram labeling, which is time-consuming and critical for the quality of fisheries and ecological analyses. We investigate how these tasks can benefit from supervised learning algorithms and demonstrate that convolutional neural networks trained with non-stationary datasets can be used to flag the parts of a new dataset that need human expert correction. Further development of this approach paves the way toward a standardization of the labeling process in fisheries acoustics and is a good case study for non-obvious data labeling processes.
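The idea of using a trained network to flag the parts of a new, non-stationary dataset that need expert correction can be sketched as a simple confidence filter over per-ping predictions. The model outputs, class layout, and the 0.9 threshold below are assumptions; the paper's actual criterion may differ.

# Minimal sketch of flagging echogram pings for expert review from per-ping CNN outputs.
# The trained model, class layout, and the 0.9 confidence threshold are assumptions.
import numpy as np

def flag_for_review(class_probs, threshold=0.9):
    """Return indices of pings whose top-class probability falls below `threshold`.

    class_probs: array of shape (n_pings, n_classes) holding softmax outputs of a
    CNN applied ping by ping to a new (possibly non-stationary) survey.
    """
    confidence = class_probs.max(axis=1)
    return np.flatnonzero(confidence < threshold)

# Example with 5 pings and 3 placeholder classes.
probs = np.array([
    [0.97, 0.02, 0.01],    # confident: keep the automatic label
    [0.55, 0.30, 0.15],    # uncertain: send to the expert
    [0.10, 0.85, 0.05],    # below threshold: send to the expert
    [0.48, 0.47, 0.05],    # near tie: send to the expert
    [0.99, 0.005, 0.005],  # confident: keep the automatic label
])
print(flag_for_review(probs))   # -> [1 2 3]

Pings whose top-class probability falls below the threshold are routed to the expert, while confident predictions keep their automatic label; this is the kind of human-in-the-loop triage the abstract describes.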


Complex data labeling with deep learning methods: Lessons from fisheries acoustics

October 2020 · 82 Reads · 16 Citations · ISA Transactions


Citations (1)


... More recently, ML methods were applied to acoustic data gathered by a commercial echosounder buoy to identify tropical tuna aggregations (Baidai et al. 2020). Meanwhile, hydroacoustic data analysis has been streamlined through the development of CNN to aid in the task of labeling data (Sarr et al. 2020). Underwater in situ species identification can be carried out (in a labor-intensive and expensive manner) by divers, with minimal impact on sensitive communities. ...

Reference:

Machine Learning Applications for Fisheries—At Scales from Genomics to Ecosystems
Complex data labeling with deep learning methods: Lessons from fisheries acoustics
  • Citing Article
  • October 2020

ISA Transactions