October 2017
·
3 Reads
October 2017
·
323 Reads
·
76 Citations
BMC Medical Genomics
Background: Differential gene expression is important for understanding the biological differences between healthy and diseased states. Two common sources of differential gene expression data are microarray studies and the biomedical literature.
Methods: Using text mining and gene expression analysis, we examined the comparative properties of these two sources of differential gene expression data.
Results: The literature shows a preference for reporting genes associated with higher fold changes in microarray data, rather than genes that are simply significantly differentially expressed. Thus, the resemblance between the literature and microarray data increases when the fold-change threshold for microarray data is increased. Moreover, the literature has a reporting preference for differentially expressed genes that (1) are overexpressed rather than underexpressed; (2) are overexpressed in multiple diseases; and (3) are popular in the biomedical literature at large. Additionally, the degree to which diseases are similar depends on whether microarray data or the literature is used to compare them. Finally, vaguely qualified reports of differential expression magnitudes in the literature correlate only weakly with microarray fold-change data.
Conclusions: Reporting biases of differential gene expression in the literature may be affecting our appreciation of disease biology and of the degree of similarity that actually exists between different diseases.
Electronic supplementary material: The online version of this article (10.1186/s12920-017-0293-y) contains supplementary material, which is available to authorized users.
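The reported link between the fold-change threshold and the literature/microarray resemblance can be illustrated with a small sketch. Everything here is an assumption made for illustration: the gene names and log2 fold changes are toy values, and the Jaccard index is one plausible way to measure resemblance, not necessarily the measure used in the article.

```python
def genes_above_fold_change(log2_fold_changes, threshold):
    """Return the genes whose absolute log2 fold change meets the threshold."""
    return {gene for gene, lfc in log2_fold_changes.items() if abs(lfc) >= threshold}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical microarray result: gene -> log2 fold change (toy values).
microarray = {
    "GENE_A": 3.1,
    "GENE_B": 2.4,
    "GENE_C": 0.6,
    "GENE_D": -0.4,
    "GENE_E": -1.8,
}
# Hypothetical set of genes reported as differentially expressed in the literature.
literature = {"GENE_A", "GENE_B"}

# Resemblance between the two sources at increasing fold-change thresholds.
for threshold in (0.0, 1.0, 2.0):
    selected = genes_above_fold_change(microarray, threshold)
    print(threshold, round(jaccard(selected, literature), 2))
```

In this toy setup the overlap grows as the threshold rises, because the literature set contains only the genes with the largest changes; real data would need the text-mined gene lists and microarray statistics from the study.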
October 2017
·
2 Reads
October 2017
·
5 Reads
May 2017
·
282 Reads
·
99 Citations
Computational and Mathematical Methods in Medicine
As modern biotechnologies advance, different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are increasingly collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
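The idea of modality-specific penalty factors can be sketched in a few lines. This is a minimal pure-Python illustration, not the ipflasso implementation: the function names, the coordinate-descent solver, the simulated data, and the chosen factors are all assumptions. It minimizes (1/2n)·||y − Xβ||² + λ·Σ_j w_j·|β_j|, where w_j is the penalty factor of the modality that feature j belongs to.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def expand_penalty_factors(block_sizes, block_factors):
    """Repeat each modality's penalty factor once per feature in that modality."""
    factors = []
    for size, factor in zip(block_sizes, block_factors):
        factors.extend([factor] * size)
    return factors


def weighted_lasso(X, y, lam, penalty_factors, n_iter=200):
    """Cyclic coordinate descent for a lasso with per-feature penalty factors."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_norm = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue  # constant-zero column: coefficient stays at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_norm[j] * beta[j]
            new_beta = soft_threshold(rho, lam * penalty_factors[j]) / col_norm[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta


# Simulated data (hypothetical): two modalities with two features each.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
y = [2.0 * row[0] + 0.5 * row[2] + rng.gauss(0, 0.1) for row in X]

# Penalize the second modality four times as strongly as the first.
factors = expand_penalty_factors([2, 2], [1.0, 4.0])
beta = weighted_lasso(X, y, lam=0.2, penalty_factors=factors)
print(beta)
```

A larger factor makes it harder for features of that modality to enter the model, so the factors encode how much each modality is trusted. Choosing them by cross-validation, as the abstract describes, would mean repeating this fit over a grid of candidate factor vectors and keeping the one with the lowest held-out prediction error.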
January 2017
·
15 Reads
The table displays the parameters of the additional simulation settings. See the results in Subsection 3.2.2.
... The identification of differentially expressed genes (DEGs) through RNA-Seq analysis is an essential part of the study of biological pathways implicated in various neurological disorders. The purpose of conducting differential gene expression analysis is to identify genes that exhibit potential overexpression or underexpression in the context of a disease state, relative to a control group that remains unaffected [18]. Dysregulation of gene expression, whether it be overexpression or underexpression, can lead to disruptions in various biological pathways such as metabolic and immune pathways, which eventually result in the development of diseases [19]. ...
October 2017
BMC Medical Genomics
... For linear models, a simple solution is to employ a regularization framework in which the clinical covariates are penalized differently (or not penalized at all) compared to the omics covariates. Examples implementing this idea are IPF-Lasso [Boulesteix et al., 2017] employing lasso penalization [Tibshirani, 1996], and multistep elastic net [Chase and Boonstra, 2019] employing elastic net penalization [Zou and Hastie, 2005]. Another linear approach is boosting ridge regression [Binder and Schumacher, 2008], in which, at each boosting step, a single covariate is updated according to a penalized likelihood criterion with a large penalty for the omics covariates and no penalty for the clinical covariates. ...
May 2017
Computational and Mathematical Methods in Medicine