About
445
Publications
76,683
Reads
33,972
Citations
Introduction
Additional affiliations
July 2015 - present
January 2013 - July 2015
October 2007 - December 2012
Education
May 2002 - March 2006
Publications (445)
Long non-coding ribonucleic acids (lncRNAs) account for the largest group of non-coding RNAs. However, knowledge about their function and regulation is limited. lncHUB2 is a web server database that provides known and inferred knowledge about the function of 18 705 human and 11 274 mouse lncRNAs. lncHUB2 produces reports that contain the secondary...
Background
Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein...
The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biome...
Lyme disease (LD) is a tick-borne disease whose post-treatment sequelae are not well understood. For this study, we enrolled 152 individuals with symptoms of post-treatment LD (PTLD) to profile their peripheral blood mononuclear cells (PBMCs) with RNA sequencing (RNA-seq). Combined with RNA-seq data from 72 individuals with acute LD and 44 uninfected...
The phenotype of a cell and its underlying molecular state is strongly influenced by extracellular signals, including growth factors, hormones, and extracellular matrix proteins. While these signals are normally tightly controlled, their dysregulation leads to phenotypic and molecular states associated with diverse diseases. To develop a detailed u...
Birth defects are functional and structural abnormalities that impact 1 in 33 births in the United States. Birth defects have been attributed to genetic as well as other factors, but for most birth defects there are no known causes. Small molecule drugs, cosmetics, foods, and environmental pollutants may cause birth defects when the mother is expos...
The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to > 30,000 chemical and genetic perturbations. In total, there are currently over 3 million available L1000 profiles. Such a dataset is invaluable for the discovery of dr...
A major limitation in the use of mouse models in breast cancer research is that most mice develop estrogen receptor‐alpha (ERα)‐negative mammary tumors, while in humans, the majority of breast cancers are ERα‐positive. Therefore, developing mouse models that best mimic the disease in humans is of fundamental need. Here, using an inducible MMTV‐rtTA...
There are only a few platforms that integrate multiple omics data types, bioinformatics tools, and interfaces for integrative analyses and visualization that do not require programming skills. Here we present iLINCS (http://ilincs.org), an integrative web-based platform for analysis of omics data and signatures of cellular perturbations. The plat...
The Library of Integrated Network‐based Cellular Signatures (LINCS) was an NIH Common Fund program that aimed to expand our knowledge about human cellular responses to chemical, genetic, and microenvironment perturbations. Responses to perturbations were measured by transcriptomics, proteomics, cellular imaging, and other high content assays. The s...
Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases ca...
Chronic wounds present a major disease burden in people with recessive dystrophic epidermolysis bullosa (RDEB), an inherited blistering skin disorder caused by mutations in COL7A1 encoding type VII collagen, the major component of anchoring fibrils at the dermal‐epidermal junction. Treatment of RDEB wounds is mostly symptomatic and there is conside...
Pluripotent stem-cell-derived cardiomyocytes (PSC-CMs) provide an unprecedented opportunity to study human heart development and disease, but they are functionally and structurally immature. Here, we induce efficient human PSC-CM (hPSC-CM) maturation through metabolic-pathway modulations. Specifically, we find that peroxisome-proliferator-associate...
Motivation
Many biological and biomedical researchers commonly search for information about genes and drugs to gather knowledge from these resources. For the most part, such information is served as landing pages in disparate data repositories and web portals.
Results
The Gene and Drug Landing Page Aggregator (GDLPA) provides users with access to...
Background
PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other drug-drug similarity resources such as the Library of Integrated Network-based Cellular Signatures (LINCS) L1000...
Motivation
The identification of pathways and biological processes from differential gene expression is central for interpretation of data collected by transcriptomics assays. Gene-Set Enrichment Analysis (GSEA) is the most commonly used algorithm to calculate the significance of the relevancy of an annotated gene set with a differential expression s...
The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins unannotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since...
Introduction: Inflammation is a key driver of atherosclerosis. The identification of new treatments and strategies targeting the specific inflammatory signaling that persists in optimally treated atherosclerotic patients remains an unmet clinical need.
Hypothesis: The integration of time-of-flight mass cytometry (CyTOF), gene expression analysis, an...
The Common Fund Data Ecosystem has created a flexible system of data federation that enables users to discover datasets from across the Common Fund without requiring the data owners to move, reformat, or rehost those data. The CFDE's federation system is centered on a metadata catalog that ingests metadata from individual Common Fund Program Data Co...
The National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) initiative has generated extensive phosphoproteomics and proteomics data for tumor and tumor-adjacent normal tissue across multiple cancer types. This dataset provides an unprecedented opportunity to systematically characterize pan-cancer kinase activities, whic...
Loss of fatty acid β-oxidation (FAO) in the proximal tubule is a critical mediator of acute kidney injury and eventual fibrosis. However, transcriptional mediators of FAO in proximal tubule injury remain understudied. Krüppel-like factor 15 (KLF15), a highly enriched zinc-finger transcription factor in the proximal tubule, was significantly reduce...
The phenotype of a cell and its underlying molecular state is strongly influenced by extracellular signals, including growth factors, hormones, and extracellular matrix. While these signals are normally tightly controlled, their dysregulation leads to phenotypic and molecular states associated with diverse diseases. To develop a detailed understand...
Lyme disease (also known as Lyme borreliosis) is the most common vector-borne disease in the United States with an estimated 476,000 cases per year. While historically, the long-term impact of Lyme disease on patients has been controversial, mounting evidence supports the idea that a substantial number of patients experience persistent symptoms fol...
Pluripotent stem cell-derived cardiomyocytes (PSC-CMs) provide an unprecedented opportunity to study human heart development and disease. A major caveat, however, is that they remain functionally and structurally immature in culture, limiting their potential for disease modeling and regenerative approaches. Here, we address the question of how differ...
Significance
The kidney proximal tubule is particularly susceptible to acute injury, which results in loss of fatty acid oxidation (FAO), their primary energy source. Here, we show that loss of the transcription factor KLF6 specifically in the proximal tubule in mice protects against acute injury and fibrosis, with preservation of transcripts that...
Phosphoproteomics and proteomics experiments capture a global snapshot of the cellular signaling network, but these methods do not directly measure kinase state. Kinase Enrichment Analysis 3 (KEA3) is a webserver application that infers overrepresentation of upstream kinases whose putative substrates are in a user-inputted list of proteins. KEA3 ca...
Fibrosis occurs when collagen deposition and fibroblast proliferation replace healthy tissue. Red light (RL) may improve skin fibrosis via photobiomodulation, the process by which photosensitive chromophores in cells absorb visible or near-infrared light and undergo photophysical reactions. Our previous research demonstrated that high fluence RL re...
Understanding the underlying molecular and structural similarities between seemingly heterogeneous sets of drugs can aid in identifying drug repurposing opportunities and assist in the discovery of novel properties of preclinical small molecules. A wealth of information about drug and small molecule structure, targets, indications and side effects;...
Cell fate decisions during development are governed by multi-factorial regulatory mechanisms including chromatin remodeling, DNA methylation, binding of transcription factors to specific loci, RNA transcription and protein synthesis. However, the mechanisms by which such regulatory “dimensions” coordinate cell fate decisions are currently poorly un...
Although widely prevalent, Lyme disease is still under-diagnosed and misunderstood. Here we followed 73 acute Lyme disease patients and uninfected controls over a period of a year. At each visit, RNA-sequencing was applied to profile patients' peripheral blood mononuclear cells in addition to extensive clinical phenotyping. Based on the projection...
Jupyter Notebooks have transformed the communication of data analysis pipelines by facilitating a modular structure that brings together code, markdown text, and interactive visualizations. Here, we extended Jupyter Notebooks to broaden their accessibility with Appyters. Appyters turn Jupyter Notebooks into fully functional standalone web-based bio...
Profiling samples from patients, tissues, and cells with genomics, transcriptomics, epigenomics, proteomics, and metabolomics ultimately produces lists of genes and proteins that need to be further analyzed and integrated in the context of known biology. Enrichr (Chen et al., 2013; Kuleshov et al., 2016) is a gene set search engine that enables the...
While hundreds of genes have been associated with pain, much of the molecular mechanisms of pain remain unknown. As a result, current analgesics are limited to few clinically validated targets. Here, we trained a machine learning (ML) ensemble model to predict new targets for 17 categories of pain. The model utilizes features from transcriptomics,...
Gene co-expression correlations from mRNA-sequencing (RNA-seq) can be used to predict gene function based on the covariance structure that exists within such data. In the past, we showed that RNA-seq co-expression data is highly predictive of gene function and protein-protein interactions. We demonstrated that the performance of such predictions is...
Diabetic kidney disease (DKD) remains the most common cause of kidney failure, and the treatment options are insufficient. Here, we used a connectivity mapping approach to first collect 15 gene expression signatures from 11 DKD-related published independent studies. Then, by querying the Library of Integrated Network-based Cellular Signatures (LINC...
The choreography of complex immune responses, including the priming, differentiation, and modulation of specific effector T cell populations generated in the immediate wake of an acute pathogen challenge, is in part controlled by chemokines, a large family of mostly secreted molecules involved in chemotaxis and other patho/physiological processes....
Pathogen-specific memory T cells (TM) contribute to enhanced immune protection under conditions of reinfection, and their effective recruitment into a recall response relies, in part, on cues imparted by chemokines that coordinate their spatiotemporal positioning. An integrated perspective, however, needs to consider TM as a potentially relevant ch...
The tumor microenvironment and genomic landscape of intermediate and high-risk primary localized prostate cancers are clinically heterogeneous and result in variable treatment response in individuals. Cancer-specific alterations at the DNA and RNA levels are a critical driver of intra-tumoral heterogeneity that significantly impacts the molecular process...
In an effort to interfere with the biology of SARS-CoV-2, the virus responsible for the COVID-19 pandemic, we focused on restoring the transcriptional response induced by infection. Utilizing expression patterns of SARS-CoV-2-infected cells, we identified a region in gene expression space that was unique to virus infection and inversely proportiona...
Glioblastoma (GBM) is the most aggressive primary brain tumor. In addition to being genetically heterogeneous, GBMs are also immunologically heterogeneous. However, whether the differences in immune microenvironment are driven by genetic driver mutations is unexplored. By leveraging the versatile RCAS/tv‐a somatic gene transfer system, we establish...
In a short period, many research publications that report sets of experimentally validated drugs as potential COVID-19 therapies have emerged. To organize this accumulating knowledge, we developed the COVID-19 Drug and Gene Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of drug and gene sets related to COVID-19 research from multip...
Advancements in regenerative medicine have brought to the fore the need for increased standardization and sharing of stem cell product characterization to help drive these innovative interventions toward public availability. Although numerous attempts have been made to store this data, there is still a lack of a platform that incorporates heterogen...
Rapid progress in proteomics and large-scale profiling of biological systems at the protein level necessitates the continued development of efficient computational tools for the analysis and interpretation of proteomics data. Here, we present the piNET server that facilitates integrated annotation, analysis and visualization of quantitative proteom...
The coronavirus (CoV) severe acute respiratory syndrome (SARS)-CoV-2 (COVID-19) pandemic has prompted a rapid response from the research community to offer suggestions for repurposing of approved drugs as well as to improve our understanding of the molecular mechanisms of the COVID-19 viral life cycle. In a short period, tens of thousands of research preprint...
Motivation:
Micro-blogging with Twitter to communicate new results, discuss ideas, and share techniques is becoming central. While most Twitter users are real people, the Twitter API provides the opportunity to develop Twitter bots and to analyze global trends in tweets.
Results:
EnrichrBot is a bot that tracks and tweets information about human...
Hematopoietic stem cells (HSCs) exist in a dormant state and progressively lose regenerative potency as they undergo successive divisions. Why this functional decline occurs and how this information is encoded is unclear. To better understand how this information is stored, we performed RNA sequencing on HSC populations differing only in their divi...
Genetic variants are the primary driver of congenital heart disease (CHD) pathogenesis. However, our ability to identify causative variants is limited. To identify causal CHD genes that are associated with specific molecular functions, the study used prior knowledge to filter de novo variants from 2,881 probands with sporadic severe CHD. This appro...
RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deplo...
Niche relocated by muscle contraction
Regulation of adult stem cells by their microenvironment, or niche, is essential for tissue homeostasis and for regeneration after injury and during aging. Normal regression of hair follicles during the hair cycle poses a particular challenge for maintaining a functional proximity of stem cells to their niche,...
Hematopoietic stem cells (HSCs) are maintained by bone marrow (BM) niches in vivo, but the ability of niche cells to maintain HSCs ex vivo is markedly diminished. Expression of niche factors (Scf, Cxcl12, Vcam1 and Angpt1) by Nestin-GFP+ mesenchymal-derived stem cells (MSCs) is downregulated upon culture and lose its effect of maintaining HSC in vi...
The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program with the goal of generating a large-scale and comprehensive catalogue of perturbation-response signatures by utilizing a diverse collection of perturbations across many model systems and assay types. The LINCS Data Portal (LDP) has been the primary acc...
iLINCS (http://ilincs.org) is an integrative web-based platform for analysis of omics data and signatures of cellular perturbations. The portal facilitates analysis of user-submitted omics signatures of diseases and cellular perturbations in the context of a large compendium of pre-computed signatures (>200,000), as well as mining and re-analysis o...
Atherosclerosis is driven by multifaceted contributions of the immune system within the circulation and at vascular focal sites. However, specific characteristics of dysregulated immune cells within atherosclerotic lesions that lead to clinical events such as ischemic stroke or myocardial infarction are poorly understood. Here, using single-cell pr...
Diabetes is far more prevalent in smokers than non-smokers, but the underlying mechanisms of vulnerability are unknown. Here we show that the diabetes-associated gene Tcf7l2 is densely expressed in the medial habenula (mHb) region of the rodent brain, where it regulates the function of nicotinic acetylcholine receptors. Inhibition of TCF7L2 signall...
As more digital resources are produced by the research community, it is becoming increasingly important to harmonize and organize them for synergistic utilization. The findable, accessible, interoperable, and reusable (FAIR) guiding principles have prompted many stakeholders to consider strategies for tackling this challenge. The FAIRshake toolkit...
Atherosclerosis is driven by multifaceted contributions of the immune system within the circulation and at vascular focal sites. Yet the specific immune dysregulations within the atherosclerotic lesions that lead to clinical cerebro- and cardiovascular complications (i.e. ischemic stroke and myocardial infarction) are poorly understood. Here, using...
Hematopoietic stem cells (HSCs) are maintained by bone marrow (BM) niches in vivo, but the ability of niche cells to maintain HSCs ex vivo is markedly diminished. Expression of niche factors by Nestin-GFP+ mesenchymal-derived stem cells (MSCs) is downregulated upon culture, suggesting that transcriptional rewiring may contribute to this reduced HSC...
Connectivity mapping resources consist of signatures representing changes in cellular state following systematic small-molecule, disease, gene, or other form of perturbations. Such resources enable the characterization of signatures from novel perturbations based on similarity; provide a global view of the space of many themed perturbations; and al...
Evidence that some high-impact biomedical results cannot be repeated has stimulated interest in practices that generate findable, accessible, interoperable, and reusable (FAIR) data. Multiple papers have identified specific examples of irreproducibility, but practical ways to make data more reproducible have not been widely studied. Here, five rese...