Titouan Parcollet

Titouan Parcollet
Université d´Avignon et des Pays du Vaucluse | UAPV · Laboratoire Informatique d'Avignon (EA 4128)

Doctor of Computer Science

About

65
Publications
21,427
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
670
Citations
Citations since 2016
65 Research Items
669 Citations
2016201720182019202020212022050100150200
2016201720182019202020212022050100150200
2016201720182019202020212022050100150200
2016201720182019202020212022050100150200
Introduction
I'm a senior research associate from the Oxford Machine Learning Systems (OxMLSys) lab interested in representation learning, new ways of considering artificial neurons and natural language understanding. I am also particularly attached to the concept of science for the social good. I have also been involved in various teaching project for more than seven years now.

Publications

Publications (65)
Preprint
Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train as it requires manipulating long input sequences that can only be handled by powerful centralised servers. Su...
Preprint
Full-text available
When the available hardware cannot meet the memory and compute requirements to efficiently train high performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often batte...
Conference Paper
Full-text available
L'apprentissage auto-supervisé a ouvert des perspectives prometteuses dans de nombreux domaines comme la vision par ordinateur, le traitement automatique de la langue ou celui de la parole. Les modèles pré-appris sur de grandes quantités de données non étiquetées peuvent être ajustés sur de petits ensembles de données transcrites manuellement. Ceux...
Conference Paper
Full-text available
L'apprentissage autosupervisé a apporté des améliorations remarquables dans de nombreux domaines tels que la vision par ordinateur ou le traitement de la langue et de la parole, en exploitant de grandes quantités de données non étiquetées. Dans le contexte spécifique de la parole, cependant, et malgré des résultats prometteurs, il existe un manque...
Preprint
Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improvi...
Preprint
The ubiquity of microphone-enabled devices has lead to large amounts of unlabelled audio data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of speech representations. In t...
Preprint
Full-text available
Recent works showed that end-to-end neural approaches tend to become very popular for spoken language understanding (SLU). Through the term end-to-end, one considers the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with sem...
Article
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio and speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features...
Conference Paper
Full-text available
In speech technologies, speaker's voice representation is used in many applications such as speech recognition, voice conversion , speech synthesis and, obviously, user authentication. Modern vocal representations of the speaker are based on neural embeddings. In addition to the targeted information, these representations usually contain sensitive...
Preprint
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In various application domains, including computer vision, natural language processing and audio/speech signal processing, a wide range of features where engineered th...
Preprint
Full-text available
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally co...
Preprint
Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has recently attracted considerable attention. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental...
Preprint
Full-text available
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful to improve performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce depe...
Preprint
Full-text available
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. A common pretext task consists in pretraining a SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech d...
Preprint
Full-text available
Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. Despite the algorithmic advancements in FL, the support for on-device training of FL algorithms on edge device...
Preprint
Full-text available
This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers. We describe the gap in existing spoken language understanding datasets that Timers and Such fills, the design and creation of the dataset, and experiments with a number of ASR-based and end-to-end baselin...
Preprint
Full-text available
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in datacenters. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL, in particular, is starting to be deployed at a g...
Conference Paper
Full-text available
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL, in particular, is starting to be deployed at a...
Conference Paper
Full-text available
Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings conta...
Preprint
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL in particular is starting to be deployed at a gl...
Preprint
Full-text available
With the increasing interest over speech technologies, numerous Automatic Speaker Verification (ASV) systems are employed to perform person identification. In the latter context, the systems rely on neural embeddings as a speaker representation. Nonetheless, such representations may contain privacy sensitive information about the speakers (e.g. age...
Preprint
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement and deploy in practice, consideri...
Preprint
Knowledge distillation has been widely used to compress existing deep learning models while preserving the performance on a wide range of applications. In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance. In this pape...
Preprint
Full-text available
Distant speech recognition remains a challenging application for modern deep learning based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus enhance the sp...
Preprint
Full-text available
Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings conta...
Article
Full-text available
Quaternion neural networks have recently received an increasing interest due to noticeable improvements over real-valued neural networks on real world tasks such as image, speech and signal processing. The extension of quaternion numbers to neural architectures reached state-of-the-art performances with a reduction of the number of neural parameter...
Conference Paper
Full-text available
Convolutional Neural Networks (CNN) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer and lossless input signal. Recent researches propose to inject prior acoustic knowledge to the first convolutional layer by integrating the shape...
Thesis
In the recent years, deep learning has become the leading approach to modern artificial intelligence (AI). The important improvement in terms of processing time required for learning AI based models alongside with the growing amount of available data made of deep neural networks (DNN) the strongest solution to solve complex real-world problems. How...
Preprint
Full-text available
Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel-filter-banks or Mel-frequency cepstral coefficients. Nonetheless , and despite worse performances, E2E ASR models processing raw waveforms are an active research...
Article
Machine learning (ML) and deep learning with deep neural networks (DNN), have drastically improved the performances of modern systems on numerous spoken language understanding (SLU) related tasks. Since most of current researches focus on new neural architectures to enhance the performances in realistic conditions, few recent works investigated the...
Conference Paper
Full-text available
Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neu-ral networks (DNNs) drastically increased the performances of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations , it is common to hold both a human,...
Conference Paper
Full-text available
Deep neural networks (DNNs) and more precisely recurrent neural networks (RNNs) are at the core of modern automatic speech recognition systems, due to their efficiency to process input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to...
Preprint
Full-text available
Conversely to the legislation that struggles to develop, regulate and supervise the use of artificial intelligence (AI), the civil society, that gradually realizes the fundamental issues and perspectives induced by this new technology, slowly starts to take responsibility and to mobilize. Social responsibility expresses itself through the emergence...
Preprint
Full-text available
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or images recognition, involve multi-dimensional input features that are characterized by strong internal dependen...
Preprint
Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neu-ral networks (DNNs) drastically increased the performances of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations , it is common to hold both a human,...
Conference Paper
Full-text available
Neural network architectures are at the core of powerful automatic speech recognition systems (ASR). However, while recent researches focus on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies alongside with the firs...
Preprint
Full-text available
Neural network architectures are at the core of powerful automatic speech recognition systems (ASR). However, while recent researches focus on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies alongside with the firs...
Preprint
Full-text available
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous intere...
Preprint
Full-text available
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous intere...
Preprint
Full-text available
Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn independently of the training data, both local dependencies between the three components (R,G,B) of a pixel, and the global relations describing edges or shapes, making it efficie...
Preprint
Full-text available
Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems. In particular, long-short term memory (LSTM) recurrent neu-ral networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long and short term dependencies in sequences of interdependen...
Preprint
Full-text available
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, toge...
Preprint
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or images recognition, involve multi-dimensional input features that are characterized by strong internal dependen...
Conference Paper
Full-text available
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However in real-valued models , time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, tog...
Conference Paper
Full-text available
Deep Neural Networks (DNN) received a great interest from researchers due to their capability to construct robust abstract representations of heterogeneous documents in a latent subspace. Nonetheless, mere real-valued deep neural networks require an appropriate adaptation, such as the con-volution process, to capture latent relations between input...
Conference Paper
Full-text available
In the last decades, encoder-decoders or autoen-coders (AE) have received a great interest from researchers due to their capability to construct robust representations of documents in a low dimensional sub-space. Nonetheless, autoencoders reveal little in way of spoken document internal structure by only considering words or topics contained in the...

Network

Cited By

Projects

Projects (4)
Project
SpeechBrain is an open-source toolkit designed to make research and development of speech technologies faster. It is flexible, modular, easy-to-use, well documented. https://speechbrain.github.io/
Archived project
SpeechBrain is an open-source and all-in-one speech toolkit relying on PyTorch. The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition (both end-to-end and HMM-DNN), speaker recognition, speech separation, multi-microphone signal processing (e.g, beamforming), self-supervised and unsupervised learning, speech contamination / augmentation, and many others. The toolkit will be designed to be a stand-alone framework, but simple interfaces with well-known toolkits, such as Kaldi will also be implemented. SpeechBrain is currently under development and has been announced in September 2019. A first alpha version will be available in the next months. https://speechbrain.github.io/