Thomas Girke

University of California, Riverside | UCR · Institute for Integrative Genome Biology (IIGB)

PhD

About

Publications: 152
Reads: 21,769
Citations: 10,644
Additional affiliations
January 2003 - December 2015
University of California, Riverside
Position
  • Professor of Bioinformatics

Publications (152)
Article
Full-text available
The health benefits of switching from tobacco to electronic cigarettes (ECs) are neither confirmed nor well characterized. To address this problem, we used RNA-seq analysis to compare the nasal epithelium transcriptome from the following groups (n = 3 for each group): (1) former smokers who completely switched to second-generation ECs for at least...
Article
Understanding how roots modulate development under varied irrigation or rainfall is crucial for development of climate-resilient crops. We established a toolbox of tagged rice lines to profile translating mRNAs and chromatin accessibility within specific cell populations. We used these to study roots in a range of environments: plates in the lab, c...
Article
Full-text available
signatureSearch is an R/Bioconductor package that integrates a suite of existing and novel algorithms into an analysis environment for gene expression signature (GES) searching combined with functional enrichment analysis (FEA) and visualization methods to facilitate the interpretation of the search results. In a typical GES search (GESS), a query...
Article
Full-text available
Aging is the dominant risk factor for most chronic diseases. Development of antiaging interventions offers the promise of preventing many such illnesses simultaneously. Cellular stress resistance is an evolutionarily conserved feature of longevity. Here, we identify compounds that induced resistance to the superoxide generator paraquat (PQ), the he...
Article
HOXA Transcript Antisense RNA, Myeloid-Specific 1 (HOTAIRM1) is a conserved long non-coding RNA (lncRNA) involved in myeloid and neural differentiation that is deregulated in acute myeloid leukemia and other cancers. Previous studies focused on the nuclear unspliced HOTAIRM1 transcript; however, cytoplasmic splice variants exist whose roles have rem...
Article
Full-text available
Importance: No previous studies have shown that acute inhalation of thirdhand smoke (THS) activates stress and survival pathways in the human nasal epithelium. Objective: To evaluate gene expression in the nasal epithelium of nonsmoking women following acute inhalation of clean air and THS. Design, Setting, and Participants: Nasal epithelium samples...
Article
Full-text available
Extensive genome-wide analyses of deregulated gene expression have now been performed for many types of cancer. However, most studies have focused on deregulation at the gene level, which may overlook the alterations of specific transcripts for a given gene. Clear cell renal cell carcinoma (ccRCC) is one of the best-characterized and most pervasive...
Article
Full-text available
The potato aphid, Macrosiphum euphorbiae, is an important agricultural pest that causes economic losses to potato and tomato production. To establish the transcriptome for this aphid, RNA-Seq libraries constructed from aphids maintained on tomato plants were used in Illumina sequencing generating 52.6 million 75–105 bp paired-end reads. The reads w...
Data
Annotation of the Macrosiphum euphorbiae transcriptome. Annotation was performed using BLASTx analysis against NCBI’s non-redundant protein database and UniProt database. (XLSX)
Data
Buchnera aphidicola sequences identified among the Macrosiphum euphorbiae transcriptome. Annotation was performed by BLASTx analysis against the UniProt database. (XLS)
Article
Full-text available
Lean body mass, consisting mostly of skeletal muscle, is important for healthy aging. We performed a genome-wide association study for whole body (20 cohorts of European ancestry with n = 38,292) and appendicular (arms and legs) lean body mass (n = 28,330) measured using dual energy X-ray absorptiometry or bioelectrical impedance analysis, adjusted...
Article
Full-text available
A correction to this article has been published and is linked from the HTML version of this article.
Article
The Longevity Genomics Research Project is designed to create a publicly available research resource, accessible through its website (www.longevitygenomics.org), to enable scientists to develop translational strategies to promote human longevity and healthy aging. Our resource will consist of software tools as R packages, curated datasets, and project...
Article
The mosquito Aedes aegypti is a major vector of numerous viral diseases, because it requires a blood meal to facilitate egg development. The fat body, a counterpart of mammalian liver and adipose tissues, is the metabolic center, playing a key role in reproduction. Therefore, understanding of regulatory networks controlling its functions is critica...
Article
Full-text available
This study presents an analysis of the small molecule bioactivity profiles across large quantities of diverse protein families represented in PubChem BioAssay. We compared the bioactivity profiles of FDA approved drugs to non-FDA approved compounds, and report several distinct patterns characteristic of the approved drugs. We found that a large fra...
Data
Fully screened sub-matrix, error rate, selectivity by molecular size, and stretched exponentials. This text contains more analysis details on the fully screened sub-matrix we provide as a downloadable reference, an algebraic estimate of error rates, our target selectivity by molecular size analysis, and a more in-depth discussion and methods for th...
Data
Target selectivity by molecular size. Violin plot with horizontal lines drawn at the 0.25, 0.5, 0.75 quantiles with tails trimmed to the range of data, as described in the “Target Selectivity by Molecular Size” section of S1 Text. Molecule size is quantified here by the number of non-hydrogen (heavy) atoms. (A) Target selectivity vs. molecular size...
Data
Selectivity distribution. The distribution of cluster selectivity counts for non-FDA approved compounds as shown in Fig 4, along with best fit lines using two-parameter versions of the exponential, power law, and stretched exponential functions, as described in the “Stretched Exponential Selectivity Distribution” section of S1 Text. The stretched e...
Data
Target selectivity distribution among targets sharing a common protein domain. The distribution of active and tested targets for FDA approved and non-FDA approved compounds within targets sharing a common Pfam domain, as described in the “Target Selectivity Distribution Among Targets Sharing a Common Protein Domain” section of S1 Text. See Table V...
Data
Protein target biclusters. This is a zipped Excel readable tab separated text file with a representative UniProt protein identifier for each sequence-similar target cluster in the first column, and a bicluster for each in the second column corresponding to the drug-target biclusters described in the text. Several targets had no UniProt translation...
Data
Drug-Target (DT) bipartite network Gene Ontology (GO). FDA approved drugs are shown in black, with protein targets shown in color based on the most specific Molecular Function GO Slim term for each target. Unannotated targets are shown in white. No color key is provided, as some colors were reused in order to visualize a large number of GO terms. No...
Data
Target selectivity, cluster selectivity, domain selectivity, and promiscuity probability P(θ ≥ 0.25) for all highly screened active compounds. This is a zipped Excel readable tab separated text file with PubChem compound ids (cid) for each compound in the first column. Compounds are sorted in order from most promiscuous, to most selective. This als...
Data
Potentially novel targets for FDA-approved drugs. This is a zipped Excel readable tab separated text file with PubChem compound ids (cids) for each compound in the first column, and a representative UniProt protein target identifier for each sequence-similar target cluster in the second column. These represent compound-target pairs reported as acti...
Data
Fully screened compound vs target cluster binary matrix. This is a zipped Excel readable tab separated text file with PubChem compound ids (cid) for each compound in the first column. The first (header) line contains a unique representative UniProt identifier for each sequence-similar protein target cluster. Six targets had no UniProt translation a...
Data
Drug-Target (DT) bipartite network biclusters. Protein targets are shown in black, with FDA approved drugs shown in color, based on their bioactivity bicluster. Unclustered compounds are shown in grey. No color key is provided, as some colors were reused in order to visualize a large number of biclusters. Node position is based on connectivity, wit...
Data
List of Pfam domains including median target, cluster, and domain selectivities for FDA approved and non-FDA compounds. This is a zipped Excel readable tab separated text file with Pfam identifiers for each domain in the first column. This is the full data shown in Tables 4 and 5, including non-H. sapiens domains. All domains with at least one acti...
Data
Target-protein network. This is a Gephi readable zipped GML (Graph Modeling Language) formatted file, which contains the target-protein network described in the manuscript. Each node (protein) is labeled with a GenBank GI number and a Molecular Function GO slim term. (ZIP)
Data
Pfam domain co-occurrence on protein targets. This file reports all protein domain combinations that occur together on the same protein targets among the PubChem BioAssay target set, as identified with the HMMER analysis described in the main text. This is a zipped Excel readable tab separated text file with Pfam domain ids in the first column, and...
Data
Distribution of distinct protein target assay participation. Data is included from all assay experiments in PubChem BioAssay annotated with one or more clearly defined protein targets, and reporting an active score for at least one small molecule. The dashed vertical line is drawn at 10 targets, which is the minimum value we categorize in this stud...
Data
Sensitivity of PAINS and aggregators vs promiscuity probability cutoff. The top panel shows the sensitivity (true positive rate) of PAINS and aggregators to categorize promiscuous compounds throughout a range of promiscuity probability cutoffs P(θ ≥ 0.25) > x over the range x = [0.01, 0.9999]. The bottom panel shows the number of promiscuous compou...
Data
FDA approved drug biclusters. This is a zipped Excel readable tab separated text file with PubChem compound ids (cids) for each compound in the first column, and a bicluster for each compound in the second column corresponding to the drug-target biclusters described in the text. (ZIP)
Article
Full-text available
Importance: Many aspects of VZV infection of sensory ganglia remain poorly understood, due to limited access to human specimens and the fact that VZV is strictly a human virus. Infection of rhesus macaques with simian varicella virus (SVV), a homolog of VZV, provides a robust model of the human disease. Using this model, we show that SVV reaches t...
Article
Full-text available
Varicella Zoster Virus (VZV) is the causative agent of varicella and herpes zoster. Although it is well established that VZV is transmitted via the respiratory route, the host-pathogen interactions during acute VZV infection in the lungs remain poorly understood due to limited access to clinical samples. To address these gaps in our knowledge, we l...
Article
Full-text available
Background: Next-generation sequencing (NGS) has revolutionized how research is carried out in many areas of biology and medicine. However, the analysis of NGS data remains a major obstacle to the efficient utilization of the technology, as it requires complex multi-step processing of big data demanding considerable computational expertise from us...
Article
Full-text available
Despite a large and rapidly growing body of small molecule bioactivity screens available in the public domain, systematic leverage of the data to assess target druggability and compound selectivity has been confounded by a lack of suitable cross-target analysis software. We have developed bioassayR, a computational tool which enables simultaneous a...
Article
Full-text available
Climate change has increased the frequency and severity of flooding events with significant negative impact on agricultural productivity. These events often submerge plant aerial organs and roots, limiting growth and survival due to a severe reduction in light reactions and gas exchange necessary for photosynthesis and respiration, respectively. To...
Article
Full-text available
A virus with a large genome was identified in the transcriptome of the potato aphid (Macrosiphum euphorbiae) and was named Macrosiphum euphorbiae virus (MeV-1). The MeV-1 genome is 22,780 nt, including 3' and 5' non-coding regions, with a single large open reading frame encoding a putative polyprotein of 7,333 amino acids. The C-terminal region of t...
Article
Full-text available
The arthropod-specific juvenile hormone (JH) controls numerous essential functions. Its involvement in gene activation is known to be mediated by the transcription factor Methoprene-tolerant (Met), which turns on JH-controlled genes by directly binding to E-box-like motifs in their regulatory regions. However, it remains unclear how JH represses ge...
Article
Full-text available
Several lines of evidence indicate that chronic alcohol use disorder leads to increased susceptibility to several viral and bacterial infections, whereas moderate alcohol consumption decreases the incidence of colds and improves immune responses to some pathogens. In line with these observations, we recently showed that heavy ethanol intake (averag...
Article
Full-text available
In multicellular organisms, development, growth and reproduction require coordinated expression of numerous functional and regulatory genes. Insects, in addition to being the most speciose animal group with enormous biological and economic significance, represent outstanding model organisms for studying regulation of synchronized gene expression...
Article
Full-text available
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934...
Article
Full-text available
Infection with yellow fever virus (YFV), an explosively replicating flavivirus, results in viral hemorrhagic disease characterized by cardiovascular shock and multi-organ failure. Unvaccinated populations experience 20 to 50% fatality. Few studies have examined the pathophysiological changes that occur in humans during YFV infection due to the spor...
Patent
Full-text available
The present invention relates to an improved process for the preparation of unsaturated fatty acids and to a process for the preparation of triglycerides with an increased content of unsaturated fatty acids. The invention relates to the generation of a transgenic organism, preferably of a transgenic plant or of a transgenic microorganism, with an inc...
Article
Full-text available
The shoot apical meristem (SAM) acts as a reservoir for stem cells. The central zone (CZ) harbors stem cells. The stem cell progenitors differentiate in the adjacent peripheral zone and in the rib meristem located just beneath the CZ. The SAM is further divided into distinct clonal layers: the L1 epidermal, L2 sub-epidermal and L3 layers. Collectiv...
Article
Full-text available
Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now...
Article
Full-text available
This article gives an overview of basic computational methods that are commonly used for analyzing small molecule screening data in the chemical genomics field. First, we introduce cheminformatic concepts for analyzing drug-like small molecule structures and their properties. Second, we introduce compound selection approaches for assembling screeni...
Article
Full-text available
Chemical genomics is a novel approach that allows for the rapid functional analysis of plant proteins, complexes, pathways, and networks. Systematic screens for bioactive small molecules causing specific subcellular phenotypes have been successfully performed in mammalian cells but have thus far been limited in plants. This protocol describes a systema...
Article
Full-text available
Translational regulation contributes to plasticity in metabolism and growth that enables plants to survive in a dynamic environment. Here, we used the precise mapping of ribosome footprints (RFs) on mRNAs to investigate translational regulation under control and sublethal hypoxia stress conditions in seedlings of Arabidopsis thaliana. Ribosomes wer...
Patent
Full-text available
This disclosure relates to methods and compositions for modulating disease resistance in plants and transgenic plants.
Article
Full-text available
Wounding due to mechanical injury or insect feeding causes a wide array of damage to plant cells including cell disruption, desiccation, metabolite oxidation, and disruption of primary metabolism. In response, plants regulate a variety of genes and metabolic pathways to cope with injury. Tomato (Solanum lycopersicum) is a model for wound signaling...
Article
Full-text available
Protein regulation by ubiquitin has been extensively described in model organisms. However, the ubiquitin machinery in disease vectors remains largely uncharacterized. This fundamental gap in knowledge presents a concern because new therapeutics are needed to control vector-borne diseases, and targeting the ubiquitin machinery as a means...
Article
Full-text available
The ability to accurately measure structural similarities among small molecules is important for many analysis routines in drug discovery and chemical genomics. Algorithms used for this purpose include fragment-based fingerprint and graph-based maximum common substructure (MCS) methods. MCS approaches provide one of the most accurate similarity mea...
Article
Full-text available
RNA-Seq is increasingly being used for differential gene expression analysis, which was dominated by microarray technology in the past decade. However, inferring differential gene expression based on the observed difference of RNA-Seq read counts has unique challenges that were not present in microarray-based analysis. The differential expressio...