About
224
Publications
44,188
Reads
1,887
Citations
Additional affiliations
December 2015 - December 2015
Université d'Avignon
Position
- Vice-President for Research, Avignon University
April 2010 - December 2015
Université d'Avignon
Position
- Head of the LIA (Computer Science Lab, Avignon University)
January 2008 - December 2010
Publications (224)
Abusive behaviors are common on online social networks. The increasing frequency of anti-social behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged con...
Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged cont...
For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim at filling this gap by providing the mul...
Quaternion neural networks have recently received increasing interest due to noticeable improvements over real-valued neural networks on real-world tasks such as image, speech and signal processing. The extension of quaternion numbers to neural architectures reached state-of-the-art performance with a reduction of the number of neural parameter...
With the spread of online social networks, it is more and more difficult to monitor all the user-generated content. Automating the moderation of inappropriate content exchanged on the Internet has thus become a priority task. Methods have been proposed for this purpose, but it can be challenging to find a suitable dataset to train and develo...
For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim at filling this gap by providing the mul...
Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged cont...
Modern end-to-end (E2E) Automatic Speech Recognition (ASR) systems rely on Deep Neural Networks (DNN) that are mostly trained on handcrafted and pre-computed acoustic features such as Mel-filter-banks or Mel-frequency cepstral coefficients. Nonetheless, and despite worse performance, E2E ASR models processing raw waveforms are an active research...
Machine learning (ML) and deep learning with deep neural networks (DNN) have drastically improved the performance of modern systems on numerous spoken language understanding (SLU) related tasks. Since most current research focuses on new neural architectures to enhance performance in realistic conditions, few recent works have investigated the...
Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neural networks (DNNs) drastically increased the performance of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human,...
Deep neural networks (DNNs) and more precisely recurrent neural networks (RNNs) are at the core of modern automatic speech recognition systems, due to their efficiency to process input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to...
Today's popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive...
Today’s popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive...
Deep neural networks (DNNs) and more precisely recurrent neural networks (RNNs) are at the core of modern automatic speech recognition systems, due to their efficiency to process input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to...
In recent years, online social networks have allowed world-wide users to meet and discuss. As guarantors of these communities, the administrators of these platforms must prevent users from adopting inappropriate behaviors. This verification task, mainly done by humans, is more and more difficult due to the ever growing amount of messages to check....
In recent years, online social networks have allowed worldwide users to meet and discuss. As guarantors of these communities, the administrators of these platforms must prevent users from adopting inappropriate behaviors. This verification task, mainly done by humans, is more and more difficult due to the ever growing amount of messages to check. M...
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependen...
Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neural networks (DNNs) drastically increased the performance of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human,...
Moderation of user-generated content in an online community is a challenge that has great socio-economic ramifications. However, the costs incurred by delegating this work to human agents are high. For this reason, an automatic system able to detect abuse in user-generated content is of great interest. There are a number of ways to tackle this pro...
Speaker diarization of audio streams turns out to be particularly challenging when applied to fictional films, where many characters talk in various acoustic conditions (background music, sound effects, variations in intonation...). Despite this acoustic variability, such movies exhibit specific visual patterns, particularly within dialogue scenes....
Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies alongside the firs...
Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies alongside the firs...
Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn independently of the training data, both local dependencies between the three components (R,G,B) of a pixel, and the global relations describing edges or shapes, making it efficie...
Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems. In particular, long short-term memory (LSTM) recurrent neural networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long and short term dependencies in sequences of interdependen...
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, toge...
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependen...
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, tog...
Identifying and characterizing the dynamics of modern TV series subplots is an open problem. One way is to study the underlying social network of interactions between the characters. Standard dynamic network extraction methods rely on temporal integration, either over the whole considered period, or as a sequence of several time-slices. However, th...
Identifying and characterizing the dynamics of modern TV series subplots is an open problem. One way is to study the underlying social network of interactions between the characters. Standard dynamic network extraction methods rely on temporal integration, either over the whole considered period, or as a sequence of several time-slices. However, th...
Deep Neural Networks (DNN) have received great interest from researchers due to their capability to construct robust abstract representations of heterogeneous documents in a latent subspace. Nonetheless, mere real-valued deep neural networks require an appropriate adaptation, such as the convolution process, to capture latent relations between input...
While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually. Automating this task is an important step in reducing the financial cost associated with moderation, but the majority of automated approaches strictly based on message content are highly vulnerable...
In the last decades, encoder-decoders or autoencoders (AE) have received great interest from researchers due to their capability to construct robust representations of documents in a low-dimensional subspace. Nonetheless, autoencoders reveal little in the way of spoken document internal structure by only considering words or topics contained in the...
Automatic transcription of spoken documents is affected by automatic transcription errors that are especially frequent when speech is acquired in severely noisy conditions. Automatic Speech Recognition (ASR) errors induce errors in the linguistic features used for a variety of Natural Language Processing (NLP) tasks. Recently, denoising autoencoders (...
Online communities have gained considerable importance in recent years due to the increasing number of people connected to the Internet. Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great financial interest for community maintainers. Often, the industry uses ba...
The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wiki...
The diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However, PNs are important for automatic indexing of audio-video content and for obtaining reliable automat...
Most of the Natural and Spoken Language Processing tasks now employ Neural Networks (NN), allowing them to reach impressive performances. Embedding features allow the NLP systems to represent input vectors in a latent space and to improve the observed performances. In this context, Recurrent Neural Network (RNN) based architectures such as Long Sho...
Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow these DNN-based models to manage long-term dependencies such as Recurrent Neural Networks (RNN) and Long Sho...
This paper proposes a new prediction process to explain and predict the popularity evolution of YouTube videos. We exploit a prior study on the classification of YouTube videos in order to predict the evolution of videos' view-count. This classification allows us to identify important factors of the observed popularity dynamics. In particular, we use this...
Modern popular TV series often develop complex storylines spanning several seasons, but are usually watched in quite a discontinuous way. As a result, the viewer generally needs a comprehensive summary of the previous season plot before the new one starts. The automation of such a task requires identifying and characterizing the dynamics of the serie...
Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names fro...
A review is proposed of the impact of word representations and classification methods in the task of theme identification of telephone conversation services having highly imperfect automatic transcriptions. We first compare two word-based representations using the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-I...
Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The tweet content, meaning and location show how an event, such as the bursty “JeSuisCharlie” one that happened in France in January 2015, is comprehended in different countries. This research...
Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To help recovery of the PNs missed by the system, relevant OOV PNs can be retrieved out of the many OOVs by exploiting semantic context of the spoken content. In this paper, we propose two neural network models targeted to...
Social networks have become a major actor in information propagation. Using the popular Twitter platform, mobile users post or relay messages from different locations. The tweet content, meaning and location show how an event, such as the bursty “JeSuisCharlie” one that happened in France in January 2015, is comprehended in different countries. This researc...
Mapping text documents in an LDA-based topic-space is a classical way to extract high-level representation of text documents. Unfortunately, LDA is highly sensitive to hyper-parameters related to the number of classes, or word and topic distribution, and there is no systematic way to pre-estimate optimal configurations. Moreover, various hyper-para...
This paper presents investigations about the automatic identification of video genre by audio channel analysis. Genre refers to editorial styles such as commercials, movies, sports… We propose and evaluate some methods based on both low and high level descriptors, in cepstral or time domains, but also by analyzing the global structure of the document...
Multimedia traffic over the Internet has grown sharply in recent years, and more media is watched and shared online. While being impactful from many perspectives, predicting the interest generated by online content is heterogeneous and constitutes a challenging task. Based on popularity patterns classification we suggest methods to improve any predi...
Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, wide variations in intonation may hide the inter-speaker variability and make audio-based speaker diarization approaches error prone. On the other hand, such fictional movies e...
Speech analytics suffer from poor automatic transcription quality. To tackle this difficulty, a solution consists in mapping transcriptions into a space of hidden topics. This abstract representation makes it possible to work around drawbacks of the ASR process. The well-known and commonly used one is the topic-based representation from a Latent Dirichlet All...
Performance of Automatic Speech Recognition (ASR) systems drops dramatically when transcribing conversations recorded in noisy conditions. Speech analytics suffer from this poor automatic transcription quality. To tackle this difficulty, a solution consists in mapping transcriptions into a space of hidden topics. This abstract representation allows...
This article presents a system for identifying people in multimedia streams. The system was entered in the REPERE challenge, co-organized by the ANR and the DGA, which ended in 2014. The main task of the challenge was to identify individuals appearing in at least one of the modalities carried by the video, whether these are spea...
Speaker segmentation of audio streams proves particularly difficult when applied to fiction films, where many characters speak in variable acoustic conditions (background music, sound effects, fluctuations in intonation...). Beyond such acoustic variability, however, this type of film exhibits...
Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary size of a speech transcription system by automatically retrieving proper names from a contemporary diachronic text corpus. We assume that some proper names appear in documents relating to the same time period and in simil...