Xiaoyu Jiang’s research while affiliated with Biogen and other places


Publications (8)


Supplementary Material 3
  • Data
  • File available

October 2017

·

3 Reads

·

Xiaoyu Jiang

Fig. 1 Ratio of overexpressed vs. underexpressed unique DEGs in microarray datasets vs. the literature. |FC| > n indicates microarray DEGs with absolute fold change above n 
Table 1 Percentage of overexpressed vs. underexpressed unique DEGs in microarray data and the literature. |FC| > n indicates microarray DEGs with absolute fold change above n
Fig. 2 Relation between overexpression mentions in the literature and the subset of those which are high increase. The figure shows the relation between gene overexpression mentions and mean number of high increase mentions for genes with up to nine overexpression mentions. Slope of the zero-y-intercept trend line is 0.21 and its associated r² is 0.89
Fig. 6 Statistically overrepresented Gene Ontology functional classes. Top-20 statistically overrepresented Gene Ontology functional classes based on overexpressed genes in the UC literature (left) and in the UC microarray dataset (right) 
Cumulative probability of a gene being reported as overexpressed in AD given its microarray FC. The abscissa corresponds to microarray FC and the ordinate to the cumulative probability of a gene being reported as overexpressed when its associated microarray FC is above a certain value, p(overexpression in AD | FC in AD > x)


Differential gene expression in disease: A comparison between high-throughput studies and the literature

October 2017

·

323 Reads

·

76 Citations

BMC Medical Genomics

Background: Differential gene expression is important to understand the biological differences between healthy and diseased states. Two common sources of differential gene expression data are microarray studies and the biomedical literature.
Methods: With the aid of text mining and gene expression analysis we have examined the comparative properties of these two sources of differential gene expression data.
Results: The literature shows a preference for reporting genes associated with higher fold changes in microarray data, rather than genes that are simply significantly differentially expressed. Thus, the resemblance between the literature and microarray data increases when the fold-change threshold for microarray data is increased. Moreover, the literature has a reporting preference for differentially expressed genes that (1) are overexpressed rather than underexpressed; (2) are overexpressed in multiple diseases; and (3) are popular in the biomedical literature at large. Additionally, the degree to which diseases are similar depends on whether microarray data or the literature is used to compare them. Finally, vaguely qualified reports of differential expression magnitudes in the literature have only a small correlation with microarray fold-change data.
Conclusions: Reporting biases of differential gene expression in the literature may be affecting our appreciation of disease biology and of the degree of similarity that actually exists between different diseases.
Electronic supplementary material: The online version of this article (10.1186/s12920-017-0293-y) contains supplementary material, which is available to authorized users.
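The quantity p(overexpression | FC > x) named in the figure captions above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `prob_reported_above`, the gene labels, and the fold-change values are hypothetical and serve only to show how such a conditional fraction could be computed from a fold-change table and a set of literature-reported genes.

```python
def prob_reported_above(fold_changes, reported, threshold):
    """Estimate p(reported as overexpressed | microarray FC > threshold).

    fold_changes: dict mapping gene name -> microarray fold change
    reported:     set of genes reported as overexpressed in the literature
    Returns None when no gene exceeds the threshold (fraction undefined).
    """
    above = [gene for gene, fc in fold_changes.items() if fc > threshold]
    if not above:
        return None
    hits = sum(1 for gene in above if gene in reported)
    return hits / len(above)


# Hypothetical toy data, not taken from the paper.
fold_changes = {"G1": 1.2, "G2": 1.6, "G3": 2.5, "G4": 3.1, "G5": 4.0}
reported = {"G3", "G5"}

for x in (1.0, 2.0, 3.0):
    print(x, prob_reported_above(fold_changes, reported, x))
```

Sweeping the threshold over a grid of values would trace out a curve of the kind the caption describes, with fold change on the abscissa and the estimated probability on the ordinate.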






IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data

May 2017

·

282 Reads

·

99 Citations

Computational and Mathematical Methods in Medicine

As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility.
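The idea of assigning a different penalty factor to each data modality can be sketched as a lasso whose L1 penalty is weighted per feature. The Python below is a minimal illustration under stated assumptions, not the ipflasso package or the authors' implementation: it assumes a squared-error loss, minimizes (1/2n)·||y − Xβ||² + λ·Σ pf_j·|β_j| by plain coordinate descent, and the names `weighted_lasso` and `expand_penalty_factors` are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def expand_penalty_factors(block_sizes, factors):
    """Turn one penalty factor per modality into one factor per feature."""
    per_feature = []
    for size, factor in zip(block_sizes, factors):
        per_feature.extend([factor] * size)
    return per_feature


def weighted_lasso(X, y, lam, penalty_factors, n_iter=200):
    """Coordinate descent for a lasso with per-feature penalty factors.

    X is a list of rows, y a list of responses; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of feature j with the partial residual (feature j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical toy data: two modalities with one feature each.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.5, 1.5, -1.5, -2.5]

equal = weighted_lasso(X, y, 0.25, expand_penalty_factors([1, 1], [1.0, 1.0]))
unequal = weighted_lasso(X, y, 0.25, expand_penalty_factors([1, 1], [1.0, 4.0]))
print(equal)
print(unequal)
```

In this toy case, raising the second modality's factor from 1 to 4 shrinks its coefficient to exactly zero while leaving the first modality's coefficient unchanged, which is the selection behavior the penalty factors are meant to control. Choosing the factors by cross-validation, as the abstract describes, would wrap this fit in a search over candidate factor combinations.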


Citations (2)


... The identification of differentially expressed genes (DEGs) through RNA-Seq analysis is an essential part of the study of biological pathways implicated in various neurological disorders. The purpose of conducting Differential Expression Gene (DEG) analysis is to identify genes that exhibit potential overexpression or underexpression in the context of a disease state, relative to a control group that remains unaffected [18]. Dysregulation of gene expression, whether it be overexpression or underexpression, can lead to disruptions in various biological pathways such as metabolic and immune pathways, which eventually result in the development of diseases [19]. ...

Reference:

A meta-analysis of bulk RNA-seq datasets identifies potential biomarkers and repurposable therapeutics against Alzheimer’s disease
Differential gene expression in disease: A comparison between high-throughput studies and the literature

BMC Medical Genomics

... For linear models, a simple solution is to employ a regularization framework in which the clinical covariates are penalized differently (or not penalized at all) compared to the omics covariates. Examples implementing this idea are IPF-Lasso [Boulesteix et al., 2017] employing lasso penalization [Tibshirani, 1996], and multistep elastic net [Chase and Boonstra, 2019] employing elastic net penalization [Zou and Hastie, 2005]. Another linear approach is boosting ridge regression [Binder and Schumacher, 2008], in which, at each boosting step, a single covariate is updated according to a penalized likelihood criterion with a large penalty for the omics covariates and no penalty for the clinical covariates. ...

IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data

Computational and Mathematical Methods in Medicine