Thomas Burger, PhD
French National Centre for Scientific Research | CNRS · ProFI (FR2048)
All the PDFs of my papers/preprints can be freely accessed through my personal webpage: download them without request!
About
102 Publications · 21,502 Reads
1,865 Citations
Publications (102)
Phosphorylation is a major post-translational modification (PTM) of proteins which is finely tuned by the activity of several hundred kinases and phosphatases. It controls most, if not all, cellular pathways, including anti-viral responses. Accordingly, viruses often induce important changes in the phosphorylation of host factors that can either promote...
Background
Metabolic dysfunction-associated steatotic liver disease (MASLD) is estimated to affect 30% of the world’s population, and its prevalence is increasing in line with obesity. Liver fibrosis is closely related to mortality, making it the most important clinical parameter for MASLD. It is currently assessed by liver biopsy – an invasive pro...
Phosphorylation is a major post-translational modification (PTM) of proteins and small molecules, which is finely tuned by the activity of several hundred kinases and phosphatases. It controls most, if not all, cellular pathways, including anti-viral responses. Accordingly, viruses often induce important changes in the phosphorylation of host factors t...
Selecting omic biomarkers using both their effect size and their differential status significance (i.e., selecting the “volcano-plot outer spray”) has long been equally biologically relevant and statistically troublesome. However, recent proposals are paving the way to resolving this dilemma.
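The volcano-plot idea above lends itself to a tiny illustration. The sketch below is not the paper's calibrated procedure; it is a naive double-threshold selection (function name and cut-offs are mine) showing what selecting the "outer spray" means in code.

```python
import numpy as np

def volcano_select(log2_fc, pvals, fc_min=1.0, p_max=0.05):
    """Naive 'outer spray' selection: keep the indices of features that
    pass BOTH an effect-size cut-off (|log2 fold change| >= fc_min)
    and a significance cut-off (p-value <= p_max).
    Thresholds are illustrative, not a calibrated procedure."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    keep = (np.abs(log2_fc) >= fc_min) & (pvals <= p_max)
    return np.where(keep)[0]

# Four toy features: only the last two pass both criteria.
print(volcano_select([0.2, 2.5, -1.8, 3.0], [0.3, 0.2, 0.04, 0.001]).tolist())  # → [2, 3]
```

A fixed double threshold like this is precisely what makes rigorous FDR control delicate on volcano plots, which is the dilemma the abstract refers to.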
Label-free bottom-up proteomics using mass spectrometry and liquid chromatography has long been established as one of the most popular high-throughput analysis workflows for proteome characterization. However, it produces data hindered by complex and heterogeneous missing values, whose imputation has long remained problematic. To cope with this, we intro...
Cullin-RING finger ligases (CRLs) represent the largest family of ubiquitin ligases. They are responsible for the ubiquitination of ∼20% of cellular proteins degraded through the proteasome, by catalyzing the transfer of E2-loaded ubiquitin to a substrate. Seven Cullins are described in vertebrates. Among them, CUL4 associates with DDB1 to form the...
In discovery proteomics, as well as many other "omic" approaches, the possibility to test for the differential abundance of hundreds (or thousands) of features simultaneously is appealing, despite requiring specific statistical safeguards, among which controlling for the false discovery rate (FDR) has become standard. Moreover, when more than tw...
In their recent article, Madej et al. (Madej, D.; Wu, L.; Lam, H. Common Decoy Distributions Simplify False Discovery Rate Estimation in Shotgun Proteomics. J. Proteome Res. 2022, 21 (2), 339-348) proposed an original way to solve the recurrent issue of controlling for the false discovery rate (FDR) in peptide-spectrum-match (PSM) validation. Briefly...
In their recent article, Madej et al. proposed an original way to solve the recurrent issue of controlling for the false discovery rate (FDR) in peptide-spectrum-match (PSM) validation. Briefly, they proposed to derive a single precise distribution of decoy matches termed the Common Decoy Distribution (CDD) and to use it to control for FDR during...
In discovery proteomics, as well as many other "omic" approaches, more than two biological conditions or group treatments can be compared in the one-way Analysis of Variance (OW-ANOVA) framework. The subsequent possibility to test for the differential abundance of hundreds (or thousands) of features simultaneously is appealing, despite requiring sp...
Background
Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this...
In their recent review ( J. Proteome Res. 2022, 21 (4), 849-864), Crook et al. diligently discuss the basics (and not-so-basics) of Bayesian modeling, survey its various applications to proteomics, and highlight its potential for the improvement of computational proteomic tools. Despite its interest and comprehensiveness on these aspects, the pitfall...
Genes are pleiotropic, and gaining better knowledge of their function requires a comprehensive characterization of their mutants. Here, we generated multi-level data combining phenomic, proteomic and metabolomic acquisitions from plasma and liver tissues of two C57BL/6N mouse models lacking the Lat (linker for activation of T cells) and the Mx2 (...
In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent and alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target-decoy competition procedure, classically...
Factorization of large data corpora has emerged as an essential technique to extract dictionaries (sets of patterns that are meaningful for sparse encoding). Following this line, we present a novel algorithm based on compressive learning theory. In this framework, the (arbitrarily large) dataset of interest is replaced by a fixed‐size sketch result...
Prostar is a software tool dedicated to the processing of quantitative data resulting from mass spectrometry-based label-free proteomics. Practically, once biological samples have been analyzed by bottom-up proteomics, the raw mass spectrometer outputs are processed by bioinformatics tools, so as to identify peptides and quantify them, notably by m...
Background
The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data which size, dimensionality and expected number of clusters are too large to be processed by cla...
Summary: Many factors can influence results in clinical research, in particular bias in the distribution of samples prior to biochemical preparation. Well Plate Maker is a user-friendly application to design single- or multiple-well plate assays. It allows multiple group experiments to be randomized and therefore helps to reduce possible batch effe...
In bottom-up discovery proteomics, target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this i...
Motivation
Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely-at-random (MCAR) and missing not-at-random (MNAR). Despite the numerous imputation methods available in the literature, none accounts for this duality, as it would require diagnosing the missi...
Wilson’s disease (WD), a rare genetic disease caused by mutations in the ATP7B gene, is associated with altered expression and/or function of the copper-transporting ATP7B protein, leading to massive toxic accumulation of copper in the liver and brain. The Atp7b-/- mouse, a genetic and phenotypic model of WD, was developed to provide new insights i...
Target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control in bottom-up discovery proteomics. Despite unquestionable statistical foundations, we unveil a so far unknown weakness of TDC: its intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability...
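Target-decoy competition FDR estimation, discussed above, boils down to a counting ratio. The sketch below (function and variable names are mine) uses the classical conservative "+1" estimator; it illustrates the principle, not the paper's stability analysis.

```python
import numpy as np

def tdc_fdr_estimate(target_scores, decoy_scores, threshold):
    """Classical target-decoy FDR estimate at a score threshold t:
    (1 + #decoys with score >= t) / (#targets with score >= t).
    The '+1' makes the estimate conservative."""
    n_target = int((np.asarray(target_scores) >= threshold).sum())
    n_decoy = int((np.asarray(decoy_scores) >= threshold).sum())
    return (1 + n_decoy) / max(n_target, 1)

targets = [9.1, 8.4, 7.7, 6.2, 5.0, 3.3]
decoys = [4.8, 3.1, 2.2, 1.0]
print(tdc_fdr_estimate(targets, decoys, 5.0))  # → 0.2 (no decoy reaches 5.0, five targets do)
```

The instability the abstract refers to comes from this ratio being a random quantity: re-drawing the decoy database changes the counts, hence the estimate.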
Results from mass spectrometry based quantitative proteomics analysis correspond to a subset of proteins which are considered differentially abundant relative to a control. Their selection is delicate and often requires some statistical expertise in addition to a refined knowledge of the experimental data. To facilitate the selection process, we ha...
ProStaR is a software tool dedicated to differential analysis in label-free quantitative proteomics. Practically, once biological samples have been analyzed by bottom-up mass spectrometry-based proteomics, the raw mass spectrometer outputs are processed by bioinformatics tools, so as to identify peptides and quantify them, by means of precursor ion...
The term “spectral clustering” is sometimes used to refer to the clustering of mass spectrometry data. However, it also classically refers to a family of popular clustering algorithms. To avoid confusion, a more specific term could advantageously be coined.
We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analyses is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous homology sequences,...
The vocabulary of theoretical statistics can be difficult to embrace from the viewpoint of computational proteomics research, even though the notions it conveys are essential to publication guidelines. For example, “adjusted p-values”, “q-values” and “false discovery rates” are essentially similar concepts, whereas “false discovery rate” and “false...
DAPAR and ProStaR are software tools to perform the statistical analysis of label-free XIC-based quantitative discovery proteomics experiments. DAPAR contains procedures to filter, normalize, impute missing values, aggregate peptide intensities, perform null hypothesis significance tests and select the most likely differentially abundant proteins wi...
Selecting proteins with significant differential abundance is the cornerstone of many relative quantitative proteomics experiments. To do so, a trade-off between p-value thresholding and fold-change thresholding can be performed thanks to a specific parameter, named the fudge factor and classically denoted s0. We have observed that this fudge factor...
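The role of the fudge factor can be seen in two lines. The sketch below (a SAM-like moderated statistic, with names and values of my choosing) shows how adding s0 to the denominator tames features whose variance is estimated as nearly zero.

```python
import numpy as np

def moderated_stat(mean_diff, std_err, s0=0.1):
    """SAM-like statistic: d_i / (s_i + s0). With s0 = 0, a tiny standard
    error s_i would make the statistic explode; s0 damps this."""
    return np.asarray(mean_diff, dtype=float) / (np.asarray(std_err, dtype=float) + s0)

# Same effect size, very different standard errors:
print(moderated_stat([1.0, 1.0], [0.01, 0.5], s0=0.0))  # first statistic is huge (100 vs 2)
print(moderated_stat([1.0, 1.0], [0.01, 0.5], s0=0.1))  # moderated (about 9.1 vs 1.7)
```

Tuning s0 thus shifts the balance between significance (p-value) and effect size (fold change) in the final selection.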
Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated datasets, and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not accoun...
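As a toy illustration of the MCAR/MNAR duality, the sketch below imputes the two mechanisms differently: a low-value fill for columns flagged MNAR (censored low-abundance signal) and a column mean for the others. The diagnosis itself is assumed given (the mnar_mask argument); everything here, names included, is illustrative and is not one of the surveyed methods.

```python
import numpy as np

def hybrid_impute(X, mnar_mask):
    """Two-mechanism toy imputation. Columns flagged in mnar_mask get a
    global low-value fill (2.5% quantile of all observed values); the
    others get their column mean, an MCAR-style imputation."""
    X = np.array(X, dtype=float)
    low = np.nanquantile(X, 0.025)  # a small observed value, stands in for 'censored'
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        X[missing, j] = low if mnar_mask[j] else np.nanmean(X[:, j])
    return X

X = hybrid_impute([[1.0, 10.0], [np.nan, 12.0], [3.0, np.nan]], mnar_mask=[True, False])
print(X[1, 0], X[2, 1])  # low-value fill for the MNAR column, column mean for the MCAR one
```

Using a single mechanism for all features, as most tools do, would fill both columns the same way; this is exactly the bias the comparisons in the abstract overlook.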
Recently, several works have focused on the study of conflict among belief functions with a geometric approach, trying to elaborate on the intuition that distant belief functions are more conflicting than neighboring ones. In this article, I discuss the extent to which the mathematical properties of a metric are compliant with what can be expected...
In mass-spectrometry based quantitative proteomics, the false discovery rate control (i.e. the limitation of the number of proteins which are wrongly claimed as differentially abundant between several conditions) is a major post-analysis step. It is classically achieved thanks to a specific statistical procedure which computes the adjusted p-values...
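The multiple-test correction mentioned here is most often the Benjamini-Hochberg step-up procedure. A minimal sketch (function name mine):

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: scale the i-th smallest
    p-value by m/i, then enforce monotonicity from the largest rank
    down. Selecting features with adjusted p <= alpha controls the FDR
    at level alpha (under the usual independence assumptions)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj = np.empty(m)
    adj[order] = np.minimum.accumulate(scaled[::-1])[::-1]
    return np.minimum(adj, 1.0)

print(bh_adjust([0.01, 0.04, 0.03, 0.5]).round(4))
```

Here the two middle p-values end up sharing the same adjusted value (about 0.0533), a typical effect of the monotonicity step.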
Machine learning is a quickly evolving field which now looks really different from what it was 15 years ago, when classification and clustering were major issues. This document proposes several trends to explore the new questions of modern machine learning, with the strong afterthought that the belief function framework has a major role to play.
Combining pieces of information provided by several sources without or with little prior knowledge about the behavior of the sources is an old yet still important and rather open problem in the belief function theory. In this paper, we propose an approach to select the behavior of sources based on a very general and expressive fusion scheme, that h...
Dempster-Shafer Theory (DST) is particularly efficient in combining multiple information sources providing incomplete, imprecise, biased, and conflicting knowledge. In this work, we focused on improving the accuracy rate and the reliability of an HMM-based handwriting recognition system by the use of DST. The syste...
This paper presents a new non-negative matrix factorization technique which (1) allows the decomposition of the original data on multiple latent factors accounting for the geometrical structure of the manifold embedding the data; (2) provides an optimal representation with a controllable level of sparsity; (3) has an overall linear complexity allow...
Recently, several works have focused on the study of conflict among belief functions with a geometrical approach. In such a framework, a cornerstone is to endow the set of belief functions with an appropriate metric, and to consider that distant belief functions are more conflicting than neighboring ones. This article discusses such approaches, cav...
Photosynthesis has shaped atmospheric and ocean chemistries and probably changed the climate as well, as oxygen is released from water as part of the photosynthetic process. In photosynthetic eukaryotes, this process occurs in the chloroplast, an organelle containing the most abundant biological membrane, the thylakoids. The thylakoids of plants an...
Quantitative mass spectrometry based spatial proteomics involves elaborate, expensive and time consuming experimental procedures and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches to establish high quality proteome-wide data sets. However, data analysis is as critical...
Hyperspectral data analysis has received growing attention due to the scientific challenges it raises and the wide set of applications that can benefit from it. Classification of hyperspectral images has been identified as one of the hottest topics in this context, and has been mainly addressed by discriminative methods such as SVM. In this pap...
Experimental spatial proteomics, i.e. the high-throughput assignment of proteins to sub-cellular compartments based on quantitative proteomics data, promises to shed new light on many biological processes given adequate computational tools. Here we present pRoloc, a complete infrastructure to support and guide the sound analysis of quantitative mass...
As Dempster-Shafer theory spreads in different application fields, and as mass functions are involved in more and more complex systems, the need for algorithms randomly generating mass functions arises. Such algorithms can be used, for instance, to evaluate some statistical properties or to simulate the uncertainty in some systems (e.g., data base...
Automation of smart homes for ambient assisted living is currently based on a widespread use of sensors. As efficient as it seems to be, this solution can sometimes be problematic when one focuses on user acceptability, intimately related to cost and intrusiveness. In this paper, we propose a context-aware system based on the semantic analysis of each us...
Combining pieces of information provided by several sources without prior knowledge about the behavior of the sources is an old yet still important and rather open problem in belief function theory. In this paper, we propose a general approach to select the behavior of sources, based on two cornerstones of information fusion that are the notions of...
In the Hilbert space reproducing the Gaussian kernel, projected data points are located on a hypersphere. Following some recent works on geodesic analysis on that particular manifold, we propose a method whose purpose is to select a subset of input data by sampling the corresponding hypersphere. The selected data should correctly represent the inp...
Population ageing is widespread across the world. Unprecedented in the history of mankind, this demographic trend leads to a number of social and economic issues related to ageing/disabled people, whose number increases considerably over the years. As the number of caregivers cannot evolve accordingly, we must now think of alternatives allowing the...
Using kernels to embed nonlinear data into high-dimensional spaces where linear analysis is possible has become utterly classical. In the case of the Gaussian kernel, however, data are distributed on a hypersphere in the corresponding Reproducing Kernel Hilbert Space (RKHS). Inspired by previous works in nonlinear statistics, this article investig...
Recently, the problem of measuring the conflict between two bodies of evidence represented by belief functions has seen renewed interest. In most works related to this issue, Dempster's rule plays a central role. In this paper, we propose to study the notion of conflict from a different perspective. We start by examining consistency and confli...
A new classification technique, PerTurbo, has been investigated in the context of hyperspectral remote sensing images. In this framework, each class is characterised by its Laplace-Beltrami operator, then approximated by the spectrum of K(S), whose terms are derived from the Gaussian kernel. The method is very simple, easy to implement and...
Automation of smart homes for ambient assisted living is currently based on a widespread use of sensors. In this paper, we propose a monitoring system based on the semantic analysis of home automation logs (user requests). Our goal is to replace as many sensors as possible by using advanced tools to infer information usually provided by sensors. To take up th...
The automation and supervision of pervasive systems is currently based mainly on the massive use of sensors distributed in the environment. In this article, we propose a model for supervising interactions based on the semantic analysis of home-automation logs (commands issued by the user), aiming to limit...
In the proteomics field, the production and publication of reliable mass spectrometry (MS)-based label-free quantitative results is a major concern. Due to the intrinsic complexity of bottom-up proteomics experiments (requiring aggregation of data relating to both precursor and fragment peptide ions into protein information, and matching this data...
As Dempster-Shafer theory spreads in different application fields involving complex systems, the need for algorithms randomly generating mass functions arises. As such random generation is often perceived as secondary, most proposed algorithms use procedures whose sample statistical properties are difficult to characterize. Thus, although they pro...
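A mass function over a frame Ω is just a point of the simplex over its 2^|Ω| − 1 non-empty subsets, so one well-characterized way to sample it uniformly takes only a few lines. The sketch below (names mine) uses normalized exponential draws, which amounts to a flat Dirichlet sample; it illustrates the kind of statistically characterizable generator the abstract argues for, not the paper's own algorithms.

```python
import itertools
import random

def random_mass_function(frame):
    """Sample a mass function uniformly on the simplex over the
    non-empty subsets of the frame: normalized i.i.d. Exp(1) draws
    are a Dirichlet(1, ..., 1) sample."""
    subsets = [frozenset(c)
               for r in range(1, len(frame) + 1)
               for c in itertools.combinations(frame, r)]
    draws = [random.expovariate(1.0) for _ in subsets]
    total = sum(draws)
    return {s: d / total for s, d in zip(subsets, draws)}

m = random_mass_function(['a', 'b', 'c'])
print(len(m), round(sum(m.values()), 10))  # → 7 1.0  (7 focal sets, masses sum to 1)
```

Because the distribution is explicitly a flat Dirichlet, sample statistics (means, variances of each mass) are known in closed form, unlike for ad hoc generators.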
The problem of conflict measurement between information sources has seen renewed interest. In most works related to this issue, Dempster's rule plays a central role. In this paper, we propose to revisit conflict from a different perspective. We do not make a priori assumptions about dependencies and start from the definition of conflicting sets, stu...
Population ageing is set to affect European countries over the coming decades, increasing the number of dependent people. In this context, Ambient Assisted Living (AAL) is one solution to enable these people to stay in their preferred environment longer, thus delaying hospitalization. To take up these challenges, this paper proposes an original lin...
To study chloroplast metabolism and functions, subplastidial localization is a prerequisite to achieve protein functional characterization. As the accurate localization of many chloroplast proteins often remains hypothetical, we set up a proteomics strategy in order to assign accurate subplastidial localizations. A comprehensive study of Arabido...
PerTurbo, an original, non-parametric and efficient classification method, is presented here. In our framework, the manifold of each class is characterized by its Laplace-Beltrami operator, which is evaluated with classical methods involving the graph Laplacian. The classification criterion is established thanks to a measure of the magnitude of the...
In this paper, a novel rejection strategy is proposed to optimize the reliability of a handwritten word recognition system. The proposed approach is based on several steps. First, we combine the outputs of several HMM classifiers using the Dempster-Shafer theory (DST). Then, we take advantage of the expressivity of mass functions (the counterpart...
The search for new simplified interaction techniques is mainly motivated by the improvements of the communication with interactive devices. In this paper, we present an interactive TV module capable of recognizing human gestures through the PS3Eye low-cost camera. We recognize gestures by tracking human skin blobs and analyzing the corresponding mo...
The Dempster-Shafer theory (DST) is particularly interesting to deal with imprecise information. However, it is known for its high computational cost, as dealing with a frame of discernment Ω involves the manipulation of up to 2^|Ω| elements. Hence, classification problems where the number of classes is too large cannot be considered. In this paper,...
People with disabilities sometimes have considerable difficulties, or even physical incapacities, performing daily tasks independently. Many research works have introduced home automation as a useful way to overcome these activity limitations. However, very few of these accomplishments have focused on the design of intelligent systems which would a...
The classification process in handwriting recognition is designed to provide lists of results rather than single results, so that context models can be used as post-processing. Most of the time, the length of the list is fixed once for all the items to classify. Here, we present a method based on Dempster-Shafer theory that allows a differ...
People with reduced mobility experience considerable difficulties, if not total physical incapacity, in performing daily tasks autonomously. One way of compensating for these motor incapacities is to rely on home-automation technologies. Many initiatives...
In this work, we focus on improving a multi-script handwriting recognition system using an HMM-based classifier combination. The improvement relies on the use of Dempster-Shafer theory to combine in a finer way the probabilistic outputs of the HMM classifiers. The experiments are conducted on two public databases written in two different scripts...
In this paper, we consider the dominance properties of the set of the pignistic k-additive belief functions. Then, given k, we conjecture the shape of the polytope of all the k-additive belief functions dominating a given belief function, starting from an analogy with the case of dominating probability measures. Under such conjecture, we compute th...
Approximating a belief function (with a probability distribution or with another belief function with a restricted number of focal elements) is an important issue in Dempster-Shafer Theory. The reason is that such approximations are really useful in two different situations: (1) decision making and (2) computational saving. In this paper, we propo...
Considering handwriting recognition, we compare the accuracy of probabilistic and evidential methods for ensemble HMM classifier combination. The recognition performances show that, in the case of a simple database, the probabilistic methods are more efficient. On the other hand, for more difficult recognition tasks (large vocabulary, weak classifiers, et...
People with disabilities sometimes experience real difficulties, or even total physical incapacity, in performing activities of daily living autonomously. Many initiatives have introduced home automation and smart homes as a possible solution to compensate for this disability. However, very few...
The Transferable Belief Model is a powerful interpretation of belief function theory where decision making is based on the pignistic transform. Smets has proposed a generalization of the pignistic transform which appears to be equivalent to the Shapley value in the transferable utility model. It corresponds to the situation where the decision m...
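The pignistic transform discussed above has a compact definition: BetP(x) = Σ_{A ∋ x} m(A)/|A|, i.e. each focal set shares its mass equally among its elements. A minimal sketch (assuming m(∅) = 0; this is the standard transform, not Smets' generalization that the abstract studies):

```python
def pignistic(mass):
    """Pignistic transform: BetP(x) = sum over focal sets A containing x
    of m(A) / |A|. `mass` maps frozensets to masses, with m(emptyset) = 0."""
    betp = {}
    for focal_set, m in mass.items():
        for x in focal_set:
            betp[x] = betp.get(x, 0.0) + m / len(focal_set)
    return betp

m = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.5}
print(pignistic(m))  # → {'a': 0.75, 'b': 0.25}
```

The result is an ordinary probability distribution over the frame, which is what makes expected-utility decision making possible in the Transferable Belief Model.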
Most of the research on sign language recognition concentrates on recognizing only manual signs (hand gestures and shapes), discarding a very important component: the non-manual signals (facial expressions and head/shoulder motion). We address the recognition of signs with both manual and non-manual components using a sequential belief-based fusion...
As part of our work on hand gesture interpretation, we present our results on hand shape recognition. Our method is based on attribute extraction and multiple partial classifications. The novelty lies in the fashion in which the fusion of all the partial classification results is performed. This fusion is (1) more efficient in terms of information theory a...
Gestural interfaces, besides providing natural means of human computer interaction for everyone, enable the hearing impaired to use sign language or better understand speech through vision. This chapter overviews (1) the various modalities involved in gestured languages, (2) the means to automatically apprehend them individually, and (3) to fuse them...