James B Brown

Lawrence Berkeley National Laboratory | LBL · Molecular Ecosystems Biology

Ph.D.

About

91
Publications
23,607
Reads
25,344
Citations
Additional affiliations
June 2016 - present
University of Birmingham
Position
  • Professor and Chair of Environmental Bioinformatics
October 2014 - present
University of California, Berkeley
Position
  • Professor (Associate)
September 2013 - September 2015
Lawrence Berkeley National Laboratory
Position
  • Researcher

Publications

Article
Full-text available
Ecosystems at coastal terrestrial–aquatic interfaces play a significant role in global biogeochemical cycles. In this study, we aimed to characterize coastal wetlands with particular focus on the co-variability between plant dynamics, topography, soil, and other environmental factors. We proposed a functional zonation approach based on machine lear...
Preprint
Short polypeptides encoded by small open reading frames (smORFs) are ubiquitously found in eukaryotic genomes and are important regulators of physiology, development, and mitochondrial processes. Here, we focus on a subset of 194 smORFs that are evolutionarily conserved between Drosophila melanogaster and humans. Many of these smORFs are conserved...
Preprint
High resolution gridded datasets of meteorological variables are needed in order to resolve fine-scale hydrological gradients in complex mountainous terrain. Across the United States, the highest available spatial resolution of gridded datasets of daily meteorological records is approximately 800 m. This work presents gridded datasets of daily prec...
Preprint
Full-text available
Mortality rates during the COVID-19 pandemic have varied by orders of magnitude across communities in the United States. Individual, socioeconomic, and environmental factors have been linked to health outcomes of COVID-19. It is now widely appreciated that the environmental microbiome, composed of microbial communities associated with soil, water,...
Preprint
Full-text available
Background: During a pandemic, estimates of geographic variability in disease burden are important but limited by the availability and quality of data. Methods: We propose a framework for estimating geographic variability in testing effort, total number of infections, and infection fatality ratio (IFR). Because symptomatic people are more likely to...
Preprint
Full-text available
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging th...
Article
Full-text available
The gut microbiome produces vitamins, nutrients, and neurotransmitters, and helps to modulate the host immune system—and also plays a major role in the metabolism of many exogenous compounds, including drugs and chemical toxicants. However, the extent to which specific microbial species or communities modulate hazard upon exposure to chemicals rema...
Article
Transdisciplinary solutions are needed to achieve the sustainability of ecosystem services for future generations. We propose a framework to identify the causes of ecosystem function loss and to forecast the future of ecosystem services under different climate and pollution scenarios. The framework (i) applies an artificial intelligence (AI) time-s...
Article
Full-text available
Increasing occurrence of harmful algal blooms across the land–water interface poses significant risks to coastal ecosystem structure and human health. Defining significant drivers and their interactive impacts on blooms allows for more effective analysis and identification of specific conditions supporting phytoplankton growth. A novel iterative Ra...
Article
Full-text available
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassig...
Article
Full-text available
As a means to understand human neuropsychiatric disorders from human brain samples, we compared the transcription patterns and histological features of postmortem brain to fresh human neocortex isolated immediately following surgical removal. Compared to a number of neuropsychiatric disease-associated postmortem transcriptomes, the fresh human brai...
Article
Full-text available
Despite their ubiquity in personal care products, the health implications of titanium dioxide (TiO2) nanomaterials (NMs) are under strenuous investigation for their potential as a carcinogen, while other evidence has shown links with premature ageing. Both potential hazards are manifested after chronic exposure. To explore the chronic effects of Ti...
Article
Full-text available
Background: Research around the weedkiller Roundup is among the most contentious of the twenty-first century. Scientists have provided inconclusive evidence that the weedkiller causes cancer and other life-threatening diseases, while industry-paid research reports that the weedkiller has no adverse effect on humans or animals. Much of the controvers...
Article
Full-text available
The rhizosphere microbiome (rhizobiome) plays a critical role in plant health and development. However, the processes by which the constituent microbes interact to form and maintain a community are not well understood. To investigate these molecular processes, we examined pairwise interactions between 11 different microbial isolates under select nu...
Article
Full-text available
Engineered nanoparticles (NPs) undergo physical, chemical, and biological transformation after environmental release, resulting in different properties of the “aged” versus “pristine” forms. While many studies have investigated the ecotoxicological effects of silver (Ag) NPs, the majority focus on “pristine” Ag NPs in simple exposure media, rather...
Article
Full-text available
Background: Over the past few years, the variety of experimental designs and protocols for sequencing experiments has increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial. Findings: We first developed an annotation structure that captures...
Article
Full-text available
Background: Daphnia species reproduce by cyclic parthenogenesis involving both sexual and asexual reproduction. The sex of the offspring is environmentally determined and mediated via endocrine signalling by the mother. Interestingly, male and female Daphnia can be genetically identical, yet display large differences in behaviour, morphology, life...
Article
There is a growing recognition that application of mechanistic approaches to understand cross-species shared molecular targets and pathway conservation in the context of hazard characterization provides significant opportunities in risk assessment (RA) for both human health and environmental safety. Specifically, it has been recognized that a more...
Preprint
We introduce Block Sparse Canonical Correlation Analysis which estimates multiple pairs of canonical directions (together a "block") at once, resulting in significantly improved orthogonality of the sparse directions which, we demonstrate, translates to more interpretable solutions. Our approach builds on the sparse CCA method of (Solari, Brown, an...
Preprint
A new approach to sparse Canonical Correlation Analysis (sCCA) is proposed with the aim of discovering interpretable associations in very high-dimensional multi-view problems, i.e., observations of multiple sets of variables on the same subjects. Inspired by the sparse PCA approach of Journee et al. (2010), we also show that the sparse CCA formul...
Article
Full-text available
Agrobacterium sp. strain 33MFTa1.1 was isolated for functional host-microbe interaction studies from the Thlaspi arvense root-associated microbiome. The complete genome is comprised of a circular chromosome of 2,771,937 bp, a linear chromosome of 2,068,443 bp, and a plasmid of 496,948 bp, with G+C contents of 59%, 59%, and 58%, respectively.
Article
Full-text available
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional...
Article
Full-text available
DNA methylation is an evolutionarily ancient epigenetic modification that is phylogenetically widespread. Comparative studies of the methylome across a diverse range of non-conventional and conventional model organisms are expected to help reveal how the landscape of DNA methylation and its functions have evolved. Here we explore the DNA methylation p...
Article
Full-text available
During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RN...
Preprint
Full-text available
During terminal erythropoiesis, the splicing machinery in differentiating erythroblasts executes a robust intron retention (IR) program that impacts expression of hundreds of genes. We studied IR mechanisms in the SF3B1 splicing factor gene, which expresses ~50% of its transcripts in late erythroblasts as a nuclear isoform that retains intron 4. RN...
Preprint
Full-text available
Identifying functional enhancer elements in metazoan systems is a major challenge. For example, large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. Here we use the pregastrula patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogene...
Article
Proper expression of the MDS-disease gene, SF3B1, ensures appropriate pre-mRNA splicing in erythroid progenitors and during terminal erythropoiesis. We previously showed that the SF3B1 gene is post-transcriptionally regulated in a differentiation stage-specific manner by intron retention (IR), such that ~50% of its transcripts in mature erythroblas...
Preprint
Full-text available
Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expressi...
Article
Full-text available
Luminal epithelial cells in the breast gradually alter gene and protein expression with age, appearing to lose lineage-specificity by acquiring myoepithelial-like characteristics. We hypothesize that the luminal lineage is particularly sensitive to microenvironment changes, and age-related microenvironment changes cause altered luminal cell phenoty...
Article
Natural habitats are exposed to an increasing number of environmental stressors that cause important ecological consequences. However, the multifarious nature of environmental change, the strength and the relative timing of each stressor largely limit our understanding of biological responses to environmental change. In particular early response to...
Article
Full-text available
Although the gut microbiome plays important roles in host physiology, health and disease, we lack understanding of the complex interplay between host genetics and early life environment on the microbial and metabolic composition of the gut. We used the genetically diverse Collaborative Cross mouse system to discover that early life history impact...
Article
Full-text available
Chromosomal instability (CIN) is a hallmark of cancer that contributes to tumour heterogeneity and other malignant properties. Aberrant centromere and kinetochore function causes CIN through chromosome missegregation, leading to aneuploidy, rearrangements and micronucleus formation. Here we develop a Centromere and kinetochore gene Expression Score...
Data
Supplementary Figures 1-34, Supplementary Tables 1-30, Supplementary Notes 1-5 and Supplementary References.
Data
R code and the individual breast cancer and lung cancer datasets for the Kaplan-Meier plots and forest plots in Figures 3 and 4.
Data
Differential expression of 15 CEN/KT genes is significant in cancer progression across cancer types.
Preprint
Full-text available
Summary: Gene co-expression network differential analysis is designed to help biologists understand gene expression patterns under different conditions. We have implemented an R package called MODA (Module Differential Analysis) for gene co-expression network differential analysis. Based on transcriptomic data, MODA can be used to estimate and co...
Article
Full-text available
Evidence has emerged that suggests a link between motor deficits, obesity and many neurological disorders. However, the contributing genetic risk factors are poorly understood. Here we used the Collaborative Cross (CC), a large panel of newly inbred mice that captures 90% of the known variation among laboratory mice, to identify the genetic loci co...
Article
Full-text available
In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity purified 20 distinct RNA binding proteins (RBPs) from culture...
Article
The modENCODE (Model Organism Encyclopedia of DNA Elements) Consortium aimed to map functional elements (including transcripts, chromatin marks, regulatory factor binding sites, and origins of DNA replication) in the model organisms Drosophila melanogaster and Caenorhabditis elegans. During its five-year span, the consortium conducted more than 2,000...
Article
Full-text available
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human ln...
Article
Full-text available
The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow compariso...
Article
Full-text available
Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-sca...
Article
The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila...
Article
Full-text available
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)(+) RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under envi...
Article
Full-text available
In animals, each sequence-specific transcription factor typically binds to thousands of genomic regions in vivo. Our previous studies of 20 transcription factors show that most genomic regions bound at high levels in Drosophila blastoderm embryos are known or probable functional targets, but genomic regions occupied only at low levels have characte...
Article
Full-text available
Background: Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is non...
Article
Full-text available
Background: Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by d...
Article
Full-text available
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign bioc...
Data
Supplementary tables. Table S1: bestbin and pseudocount results for each mark. Table S2: results of all predictions, including the correlation coefficient, P-value for the correlation, the individual correlation, and relative importance of each chromatin feature. Table S3: list of experiments used in the analysis.
Data
Full-text available
Supplementary figures. Figure S1: model diagnosis. (A) ROC curve for random forests classifier in predicting the 'on' and 'off' expression status for the CAGE PolyA+ cytosolic RNA from K562 cells. The AUC (area under the curve) is 0.95 and error rate is 9.56%. (B) Residual plot for the fitted values. The red line is the mean of residuals, which sho...
Data
Full-text available
Supplementary figures. This file contains supplementary figures.
Article
Full-text available
Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ∼100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tan...
Article
Full-text available
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most...
Article
Full-text available
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, h...
Article
Full-text available
The landscape of genomics has changed drastically in the last two decades. Increasingly inexpensive sequencing has shifted the primary focus from the acquisition of biological sequences to the study of biological function. Assays have been developed to study many intricacies of biological systems, and publicly available databases have given rise...