David Rossell
Universitat Pompeu Fabra (UPF) · Department of Economics and Business

Ph.D.

About

72 Publications
9,781 Reads
5,357 Citations
Since 2017: 31 research items, 3,779 citations
[Chart: citations per year, 2017–2023]
Introduction
David Rossell currently works at the Department of Economics and Business, Universitat Pompeu Fabra. He does research in Statistics and Bioinformatics. His most recent publication is 'A framework for posterior consistency in model selection'.
Additional affiliations
January 2008 - present
IRB Barcelona Institute for Research in Biomedicine
Description
  • Unit Manager
January 2007 - December 2007
University of Texas MD Anderson Cancer Center
Description
  • Post-doctoral fellow
Education
August 2003 - December 2006
Rice University
Field of study
  • Biostatistics

Publications (72)
Article
Standard likelihood penalties to learn Gaussian graphical models are based on regularising the off‐diagonal entries of the precision matrix. Such methods, and their Bayesian counterparts, are not invariant to scalar multiplication of the variables, unless one standardises the observed data to unit sample variances. We show that such standardisation...
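A minimal sketch of the scale-invariance issue raised in this abstract, using scikit-learn's GraphicalLasso as a stand-in penalised estimator (an assumption for illustration, not the methods studied in the paper): rescaling one variable can change the estimated off-diagonal support of the precision matrix unless the data are standardised first.
```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] += 0.8 * X[:, 1]  # induce one dependence between variables 0 and 1

def support(X, alpha=0.2):
    """Off-diagonal support (nonzero pattern) of the estimated precision matrix."""
    prec = GraphicalLasso(alpha=alpha).fit(X).precision_
    return np.abs(prec - np.diag(np.diag(prec))) > 1e-8

print(support(X))                                  # support on the raw data
print(support(X * np.array([1, 100, 1, 1, 1.])))   # may change after rescaling a column
print(support(X / X.std(axis=0)))                  # standardised data: unit-free answer
```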
Preprint
Full-text available
I briefly discuss the 'Martingale Posterior Distributions' paper by Edwin Fong, Chris Holmes and Stephen G. Walker.
Article
Full-text available
We discuss the role of misspecification and censoring on Bayesian model selection in the contexts of right-censored survival and concave log-likelihood regression. Misspecification includes wrongly assuming the censoring mechanism to be noninformative. Emphasis is placed on additive accelerated failure time, Cox proportional hazards and probit mode...
Article
Full-text available
Statisticians often face the choice between using probability models or a paradigm defined by minimising a loss function. Both approaches are useful and, if the loss can be re‐cast into a proper probability model, there are many tools to decide which model or loss is more appropriate for the observed data, in the sense of explaining the data's natu...
Preprint
Full-text available
A frequent challenge when using graphical models in applications is that the sample size is limited relative to the number of parameters to be learned. Our motivation stems from applications where one has external data, in the form of networks between variables, that provides valuable information to help improve inference. Specifically, we depict t...
Article
Full-text available
A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask im...
Article
Full-text available
Bio-oils are precursors for biofuels but are highly corrosive, necessitating further upgrading. Furthermore, bio-oil samples are highly complex and represent a broad range of chemistries. They are complex mixtures not simply because of the large number of poly-oxygenated compounds but because each composition can comprise many isomers with multiple...
Preprint
Full-text available
A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask im...
Preprint
Full-text available
We address applied and computational issues for the problem of multiple treatment effect inference under many potential confounders. While there is abundant literature on the harmful effects of omitting relevant controls (under-selection), we show that over-selection can be comparably problematic, introducing substantial variance and a bias related...
Article
Full-text available
We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and equips strong model selection properties. However, when the sample size is large or one considers many models the cost of the requir...
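For context, a minimal sketch of the standard Laplace approximation (LA) to an integrated likelihood, the quantity this abstract refers to; the ALA refinement proposed in the paper is not reproduced here, and the logistic-regression model with a N(0, gI) coefficient prior is an illustrative assumption.
```python
import numpy as np
from scipy.optimize import minimize

def log_joint(beta, X, y, g=1.0):
    """Log-likelihood of a logistic regression plus a N(0, g I) log-prior."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -0.5 * beta @ beta / g - 0.5 * len(beta) * np.log(2 * np.pi * g)
    return loglik + logprior

def laplace_log_marginal(X, y, g=1.0):
    """LA to log p(y): log p(y, beta_hat) + (p/2) log(2*pi) + 0.5 log|H^{-1}|."""
    p = X.shape[1]
    opt = minimize(lambda b: -log_joint(b, X, y, g), np.zeros(p), method="BFGS")
    hess_inv = opt.hess_inv                      # BFGS estimate of the inverse Hessian
    return -opt.fun + 0.5 * p * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(hess_inv)[1]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
print(laplace_log_marginal(X, y))
```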
Preprint
Full-text available
Statisticians often face the choice between using probability models or a paradigm defined by minimising a loss function. Both approaches are useful and, if the loss can be re-cast into a proper probability model, there are many tools to decide which model or loss is more appropriate for the observed data, in the sense of explaining the data's natu...
Preprint
Full-text available
Standard likelihood penalties to learn Gaussian graphical models are based on regularising the off-diagonal entries of the precision matrix. Such methods, and their Bayesian counterparts, are not invariant to scalar multiplication of the variables, unless one standardises the observed data to unit sample variances. We show that such standardisation...
Preprint
Full-text available
We propose the approximate Laplace approximation (ALA) to evaluate integrated likelihoods, a bottleneck in Bayesian model selection. The Laplace approximation (LA) is a popular tool that speeds up such computation and equips strong model selection properties. However, when the sample size is large or one considers many models the cost of the requir...
Preprint
Science suffers from a reproducibility crisis. Specification Curve Analysis (SCA) helps address this crisis by preventing the selective reporting of results and arbitrary data analysis choices. SCA plots the variability (or heterogeneity) of treatment effects against all ‘reasonable specifications’ (ways to conduct analysis). However, SCA has also...
Preprint
The Gaussian model is equipped with strong properties that facilitate studying and interpreting graphical models. Specifically, it reduces conditional independence and the study of positive association to determining partial correlations and their signs. When Gaussianity does not hold, partial correlation graphs are a useful relaxation of graphical m...
Chapter
MHC class I proteins present intracellular peptides on the cell’s surface, enabling the immune system to recognize tumor-specific neoantigens of early neoplastic cells and eliminate them before the tumor develops further. However, variability in peptide-MHC-I affinity results in variable presentation of oncogenic peptides, leading to variable likel...
Article
Full-text available
The use of hyphenated Fourier transform mass spectrometry (FTMS) methods affords additional information about complex chemical mixtures. Co-eluted components can be resolved thanks to the ultra-high resolving power, which also allows extracted ion chromatograms (EICs) to be used for the observation of isomers. As such datasets can be large and data...
Article
Full-text available
Crude oil is among the most complex organic mixtures found in nature. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) provides the resolution and mass accuracy needed to analyze such complex mixtures. When mixtures contain many different components, a competitive effect within the ICR cell takes place that hampers the detect...
Article
Choosing the number of mixture components remains an elusive challenge. Model selection criteria can be either overly liberal or conservative and return poorly separated components of limited practical use. We formalize non‐local priors (NLPs) for mixtures and show how they lead to well‐separated components with non‐negligible weight, interpretable...
Preprint
Full-text available
We study the effect and interplay of two important issues on Bayesian model selection (BMS): the presence of censoring and model misspecification. Misspecification refers to assuming the wrong model or functional effect on the response, or not recording truly relevant covariates. We focus on additive accelerated failure time (AAFT) models, as these...
Article
Full-text available
A new strategy has been developed for characterization of the most challenging complex mixtures to date, using a combination of custom-designed experiments and a new data pre-processing algorithm. In contrast to traditional methods, the approach enables operation of Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) with consta...
Technical Report
Full-text available
Choosing the number of components remains a central but elusive challenge in mixture models. Traditional model selection criteria can fail to enforce parsimony or result in poorly separated components of limited practical use. Non-local priors (NLPs) are a family of distributions that encourage parsimony by enforcing a separation between the models...
Preprint
Two key challenges in modern statistical applications are the large amount of information recorded per individual, and that such data are often not collected all at once but in batches. These batch effects can be complex, causing distortions in both mean and variance. We propose a novel sparse latent factor regression model to integrate such hetero...
Preprint
Full-text available
We develop a theoretical framework for the frequentist assessment of Bayesian model selection, specifically its ability to select the (Kullback-Leibler) optimal model and to portray the corresponding uncertainty. The contribution is not proving consistency for a specific prior, but giving a general strategy for such proofs. Its basis applies to any...
Article
Fourier transform ion cyclotron resonance mass spectrometry affords the resolving power to determine an unprecedented number of components in complex mixtures, such as petroleum. The software tools required to also analyze these data struggle to keep pace with advancing instrument capabilities and increasing quantities of data, particularly in term...
Article
MHC-I molecules expose the intracellular protein content on the cell surface, allowing T cells to detect foreign or mutated peptides. The combination of six MHC-I alleles each individual carries defines the sub-peptidome that can be effectively presented. We applied this concept to human cancer, hypothesizing that oncogenic mutations could arise in...
Article
We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without res...
Article
Full-text available
Bayesian variable selection for continuous outcomes often assumes normality, and so do its theoretical studies. There are sound reasons behind this assumption, particularly for large $p$: ease of interpretation, analytical and computational convenience. More flexible frameworks exist, including semi- or non-parametric models, often at the cost of l...
Technical Report
Full-text available
We show how to carry out fully Bayesian variable selection and model averaging in linear models when both the number of observations and covariates are large. We work under the assumption that the Gram matrix is block-diagonal. Apart from orthogonal regression and various contexts where this is satisfied by design, this framework may serve in futur...
Article
We show how to carry out fully Bayesian variable selection and model averaging in linear models when both the number of observations and covariates are large. We work under the assumption that the Gram matrix is block-diagonal. Apart from orthogonal regression and various contexts where this is satisfied by design, this framework may serve in futur...
Article
Compensatory proliferation triggered by hepatocyte loss is required for liver regeneration and maintenance but also promotes development of hepatocellular carcinoma (HCC). Despite extensive investigation, the cells responsible for hepatocyte restoration or HCC development remain poorly characterized. We used genetic lineage tracing to identify cell...
Article
Designing an RNA-seq study depends critically on its specific goals, technology and underlying biology, which renders general guidelines inadequate. We propose a Bayesian framework to customize experiments so that goals can be attained and resources are not wasted, with a focus on alternative splicing. We studied how read length, sequencing depth,...
Article
Full-text available
Recent molecular classifications of colorectal cancer (CRC) based on global gene expression profiles have defined subtypes displaying resistance to therapy and poor prognosis. Upon evaluation of these classification systems, we discovered that their predictive power arises from genes expressed by stromal cells rather than epithelial tumor cells. Bi...
Article
Full-text available
Efforts to compile the phenotypic effects of drugs and environmental chemicals offer the opportunity to adopt a chemo-centric view of human health that does not require detailed mechanistic information. Here we consider thousands of chemicals and analyse the relationship of their structures with adverse and therapeutic responses. Our study includes...
Article
Big data represent an unprecedented resource for tackling scientific, economic and societal challenges, but they also increase the possibility of drawing misleading conclusions. For example, the use of purely data-driven approaches that disregard understanding the phenomenon under study, that aim at an elusi...
Article
Full-text available
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data-collection issues, summarizing or pr...
Article
Full-text available
RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of...
Article
Full-text available
Non-local priors (NLPs) possess appealing properties for high-dimensional model choice, e.g. parsimony or consistency of posterior model probabilities. Their use for estimation has not yet been studied in detail, partially due to difficulties in characterizing the posterior on the parameter space. Here we give a general representation of NLPs as mi...
Article
Full-text available
Development of tools to jointly visualize the genome and the epigenome remains a challenge. chroGPS is a computational approach that addresses this question. chroGPS uses multidimensional scaling techniques to represent similarity between epigenetic factors, or between genetic elements on the basis of their epigenetic state, in 2D/3D reference maps...
Article
Full-text available
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative,...
Article
Common goals in classification problems are (i) obtaining predictions and (ii) identifying subsets of highly predictive variables. Bayesian classifiers quantify the uncertainty in all steps of the prediction. However, common Bayesian procedures can be slow in excluding features with no predictive power (Johnson & Rossell, 2010). In certain high-di...
Article
A large proportion of colorectal cancers (CRCs) display mutational inactivation of the TGF-β pathway, yet, paradoxically, they are characterized by elevated TGF-β production. Here, we unveil a prometastatic program induced by TGF-β in the microenvironment that associates with a high risk of CRC relapse upon treatment. The activity of TGF-β on strom...
Article
Full-text available
In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge when new data are available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential...
Article
Full-text available
H3K4me3 is a histone modification that accumulates at the transcription-start site (TSS) of active genes and is known to be important for transcription activation. The way in which H3K4me3 is regulated at TSS and the actual molecular basis of its contribution to transcription remain largely unanswered. To address these questions, we have analyzed t...
Article
Standard assumptions incorporated into Bayesian model selection procedures result in procedures that are not competitive with commonly used penalized likelihood methods. We propose modifications of these methods by imposing non-local prior densities on model parameters. We show that the resulting model selection procedures are consistent in linear m...
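As a point of reference, a small sketch of the first-order moment (MOM) non-local prior of Johnson & Rossell, pi(theta) = (theta^2 / tau) N(theta | 0, tau), which equals zero at theta = 0 and thereby separates the null from the alternative model; the value of tau below is an arbitrary illustration, not a recommendation from the paper.
```python
import numpy as np
from scipy.stats import norm

def dmom(theta, tau=1.0):
    """First-order MOM non-local prior density: (theta^2 / tau) * N(theta | 0, tau)."""
    return (theta**2 / tau) * norm.pdf(theta, loc=0.0, scale=np.sqrt(tau))

theta = np.linspace(-3, 3, 7)   # includes theta = 0
print(dmom(theta))              # the density vanishes exactly at theta = 0
```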
Article
Proceedings: AACR 103rd Annual Meeting 2012, Mar 31–Apr 4, 2012; Chicago, IL. Motivation: Overexpression of HER2 (the product of the ERBB2 gene) occurs in about 15% of all breast tumors. We have undertaken to use next generation transcriptome sequencing technology to identify genomic features that are unique to HER2+ tumors. Interactome mapping wa...
Article
Full-text available
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung a...
Article
Full-text available
We provide a Bioconductor package with quality assessment, processing and visualization tools for high-throughput sequencing data, with emphasis on ChIP-seq and RNA-seq studies. It includes detection of outliers and biases, inefficient immuno-precipitation and overamplification artifacts, de novo identification of read-rich genomic regions and visu...
Article
Full-text available
Here we describe the isolation of stem cells of the human colonic epithelium. Differential cell surface abundance of ephrin type-B receptor 2 (EPHB2) allows the purification of different cell types from human colon mucosa biopsies. The highest EPHB2 surface levels correspond to epithelial colonic cells with the longest telomeres and elevated expres...
Article
Motivation: KRAS is commonly mutated in a variety of cancers including lung cancer. The KRAS gene is frequently mutated at codons 12 and 13 in lung adenocarcinomas in patients with a history of smoking. Tumors harboring an activating KRAS mutation are aggressive and are often resistant to available therapies. In the present study, we set out to ide...
Article
A frequent complication in colorectal cancer (CRC) is regeneration of the tumor after therapy. Here, we report that a gene signature specific for adult intestinal stem cells (ISCs) predicts disease relapse in CRC patients. ISCs are marked by high expression of the EphB2 receptor, which becomes gradually silenced as cells differentiate. Using EphB2...
Article
Model organisms such as the fruit fly Drosophila melanogaster can help to elucidate the molecular basis of complex diseases such as cancer. Mutations in the Drosophila gene lethal (3) malignant brain tumor cause malignant growth in the larval brain. Here we show that l(3)mbt tumors exhibited a soma-to-germline transformation through the ectopic exp...
Article
We examine philosophical problems and sampling deficiencies that are associated with current Bayesian hypothesis testing methodology, paying particular attention to objective Bayes methodology. Because the prior densities that are used to define alternative hypotheses in many Bayesian tests assign non-negligible probability to regions of the parame...
Article
Full-text available
Misfolded proteins are caused by genomic mutations, aberrant splicing events, translation errors or environmental factors. The accumulation of misfolded proteins is a phenomenon connected to several human disorders, and is managed by stress responses specific to the cellular compartments being affected. In wild-type cells these mechanisms of stress...
Article
Full-text available
Hierarchical models are a powerful tool for high-throughput data with a small to moderate number of replicates, as they allow sharing information across units of information, for example, genes. We propose two such models and show their increased sensitivity in microarray differential expression applications. We build on the gamma-gamma hierarchical...
Article
Full-text available
Heterochromatin protein 1 (HP1) proteins are conserved in eukaryotes, with most species containing several isoforms. Based on the properties of Drosophila HP1a, it was proposed that HP1s bind H3K9me2,3 and recruit factors involved in heterochromatin assembly and silencing. Yet, it is unclear whether this general picture applies to all HP1 isoforms...
Article
Full-text available
We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametr...
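As background for the comparison drawn in this abstract, a minimal sketch of a permutation-based test for a single gene; it illustrates the standard null-sampling approach the abstract contrasts against, not the method proposed in the paper.
```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, seed=None):
    """Two-sample permutation p-value for a difference in group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)                       # relabel samples at random
        count += abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= observed
    return (count + 1) / (n_perm + 1)                        # add-one correction

rng = np.random.default_rng(3)
print(permutation_pvalue(rng.normal(1.0, 1, 20), rng.normal(0.0, 1, 20), seed=3))
```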
Chapter
The subject of statistical inference under order restrictions has been studied extensively since Bartholomew’s likelihood-ratio test for means under restricted alternatives [1]. Order restrictions explicitly introduce scientific knowledge into the mathematical formulation of the problem, which can improve inference.
Article
Full-text available
We propose drug screening designs based on a Bayesian decision-theoretic approach. The discussion is motivated by screening designs for phase II studies. The proposed screening designs allow consideration of multiple treatments simultaneously. In each period, new treatments can arise and currently considered treatments can be dropped. Once a treatm...