ABSTRACT: We have developed an association-based approach using classical inbred strains of mice in which we correct for population structure, which is very extensive in mice, using an efficient mixed-model algorithm. Our approach includes inbred parental strains as well as recombinant inbred strains in order to capture loci with effect sizes typical of complex traits in mice (in the range of 5 % of total trait variance). Over the last few years, we have typed the hybrid mouse diversity panel (HMDP) strains for a variety of clinical traits as well as intermediate phenotypes and have shown that the HMDP has sufficient power to map genes for highly complex traits with resolution that is in most cases less than a megabase. In this essay, we review our experience with the HMDP, describe various ongoing projects, and discuss how the HMDP may fit into the larger picture of common diseases and different approaches.
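A simplified way to see how a mixed model corrects for population structure is a generalized least squares (GLS) test in which the phenotypic covariance among strains is taken to be proportional to K + delta*I, where K is a kinship matrix. The sketch below is not the authors' algorithm: it assumes 0/1-coded SNPs for homozygous inbred strains, an identity-by-state kinship matrix, and a known variance ratio `delta`, whereas a full mixed-model method estimates the variance components from the data. The function names are hypothetical.

```python
import numpy as np

def ibs_kinship(genotypes):
    """Identity-by-state kinship for a strains x SNPs matrix coded 0/1."""
    g = np.asarray(genotypes, dtype=float)
    m = g.shape[1]
    # Fraction of SNPs at which each pair of strains carries the same allele.
    return (g @ g.T + (1.0 - g) @ (1.0 - g).T) / m

def gls_snp_test(y, snp, kinship, delta):
    """Test one SNP under y = b0 + b1*snp + noise, Cov(noise) proportional to K + delta*I.

    delta (residual-to-genetic variance ratio) is assumed known here.
    Returns the SNP effect b1 and its t statistic.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(snp, dtype=float)])
    V = kinship + delta * np.eye(n)
    # Whiten with the Cholesky factor L (V = L L^T) so ordinary least squares applies.
    L = np.linalg.cholesky(V)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])
```

For example, `gls_snp_test(y, geno[:, 5], ibs_kinship(geno), delta=1.0)` tests SNP 5 against a phenotype vector `y`; because related strains share covariance through K, they no longer count as independent observations.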
ABSTRACT: We sought exonic transcriptional regulatory elements by shotgun cloning human cDNA fragments into luciferase reporter vectors and measuring the resulting expression levels in liver cells. We uncovered seven regulatory elements within coding regions and three within 3' untranslated regions (UTRs). Two of the putative regulatory elements were enhancers and eight were silencers. The regulatory elements were generally, but not consistently, evolutionarily conserved and also showed a trend toward decreased population diversity. Furthermore, the exonic regulatory elements were enriched in known transcription factor binding sites (TFBSs) and were associated with several histone modifications and transcriptionally relevant chromatin. Evidence was obtained for bidirectional cis-regulation of a coding region element within a tubulin gene, TUBA1B, by the transcription factors PPARA and RORA. We estimate that hundreds of exonic transcriptional regulatory elements exist, an unexpected finding that highlights the multifunctionality of sequences in the human genome.
PLoS ONE 01/2012; 7(9):e46098. · 3.73 Impact Factor
ABSTRACT: The relationships between gene functional similarity and gene expression profiles, and between gene function annotation and gene sequence, have been studied extensively. However, little work has considered the connection between gene function and the location of a gene's expression in mammalian tissues. Moreover, although unsupervised learning methods are commonly used in functional genomics, supervised learning cannot be applied directly to a set of genes that lacks a target (class) attribute.
Here, we propose a supervised learning methodology to predict pairwise gene functional similarity from multiplex gene expression maps that provide information about the location of gene expression. The features are extracted from expression maps, and the labels denote the functional similarities of pairs of genes. We use wavelet features, original expression values, the difference and average values of neighboring voxels, and other features to perform boosting analysis. The experimental results show that functional similarity increases as the similarity of gene expression maps increases. The model predicts the functional similarities between genes to some degree. The feature weights in the model indicate which features are most significant for this prediction.
By considering pairs of genes, we propose a supervised learning methodology to predict pairwise gene functional similarity from multiplex gene expression maps. We also explore the relationship between similarities of gene maps and gene functions. Using AdaBoost coupled with our proposed weak classifier, we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work can help predict the functions of uncharacterized genes. It also has broader applicability, since the methodology can be applied to any large-scale dataset without a target attribute and is not restricted to gene expression data.
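The boosting step described above can be sketched with discrete AdaBoost over decision stumps. The authors use their own weak classifier; the stump here is a stand-in, and the inputs are assumptions: a feature matrix with one row per gene pair (for example wavelet coefficients or neighboring-voxel differences) and labels coded +1 (functionally similar) or -1. Summing each round's `alpha` per feature gives a rough analogue of the feature weights mentioned in the abstract. All names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, rounds=20):
    """Discrete AdaBoost; y must be coded +1/-1. Returns (alpha, feature, thr, pol) tuples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified gene pairs
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    X = np.asarray(X, dtype=float)
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

def feature_weights(model, n_features):
    """Total alpha assigned to each feature across boosting rounds."""
    imp = np.zeros(n_features)
    for a, j, _, _ in model:
        imp[j] += a
    return imp
```

The exhaustive stump search is quadratic in the number of distinct feature values, which is fine for illustration but would need presorting for a genome-scale set of gene pairs.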
ABSTRACT: There is only a limited understanding of the relation between copy number and expression for mammalian genes. Using a human-hamster radiation hybrid (RH) panel, we fine-mapped, for essentially all genes, cis and trans regulatory loci that affect expression through copy number change. These loci are called copy number expression quantitative trait loci (ceQTLs).
Unexpected findings from a previous study of a mouse-hamster RH panel were replicated. These findings included decreased expression as a result of increased copy number for 30% of genes, and an attenuated relationship between expression and copy number on the X chromosome, suggesting an Xist-independent form of dosage compensation. In a separate glioblastoma dataset, we found conservation of genes in which dosage was negatively correlated with gene expression. These genes were enriched in signaling and receptor activities. The attenuated X-linked gene expression in response to increased gene copy number was also replicated in the glioblastoma dataset. Of 523 gene deserts larger than 600 kb in the human RH panel, 325 contained trans ceQTLs with -log10 P > 4.1. Recently discovered genes, ultraconserved regions, noncoding RNAs and microRNAs explained only a small fraction of the results, suggesting that a substantial portion of gene deserts harbor as yet unidentified functional elements.
Radiation hybrids are a useful tool for high-resolution mapping of cis and trans loci capable of affecting gene expression through copy number change. Analyses of two independent radiation hybrid panels agree in their findings, and such panels may serve as a discovery source for novel regulatory loci in noncoding regions of the genome.
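At its simplest, a ceQTL test asks whether a gene's expression tracks the copy number of a marker across hybrids. A minimal sketch, assuming one per-hybrid copy number (retention) vector for a marker and one expression vector for a gene: fit a least-squares slope, compute the Pearson correlation, and obtain a permutation P value by shuffling expression across hybrids. A negative slope corresponds to the decreased expression with increased copy number described above. This is not the regression model used in the study, and the function name is hypothetical.

```python
import numpy as np

def ceqtl_test(copy_number, expression, n_perm=2000, seed=0):
    """Slope, Pearson r and two-sided permutation P for expression vs. copy number.

    copy_number: per-hybrid marker retention or copy number (assumed input).
    expression:  per-hybrid expression level of one gene.
    """
    x = np.asarray(copy_number, dtype=float)
    y = np.asarray(expression, dtype=float)
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    # Null distribution of |r| with the hybrid labels of expression shuffled.
    null = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(n_perm)])
    p = (1 + np.sum(null >= abs(r))) / (n_perm + 1)
    return slope, r, p
```

With `n_perm` permutations the smallest attainable P is 1/(n_perm + 1), so a threshold such as -log10 P > 4.1 would require tens of thousands of permutations or an analytic test.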
ABSTRACT: The relationships between the levels of transcripts and the levels of the proteins they encode have not been examined comprehensively in mammals, although previous work in plants and yeast suggests a surprisingly modest correlation. We have examined this issue using a genetic approach in which natural variation was used to perturb both transcript levels and protein levels among inbred strains of mice. We quantified over 5,000 peptides and over 22,000 transcripts in livers of 97 inbred and recombinant inbred strains and focused on the 7,185 most heritable transcripts and 486 most reliable proteins. The transcript levels were quantified by microarray analysis in three replicates, and the proteins were quantified by liquid chromatography-mass spectrometry using an 18O-reference-based isotope labeling approach. We show that the levels of transcripts and proteins correlate significantly for only about half of the genes tested, with an average correlation of 0.27, and that the transcript-protein correlations varied depending on the cellular location and biological function of the gene. We examined technical and biological factors that could contribute to the modest correlation. For example, differential splicing clearly affects the analyses for certain genes, but deep sequencing indicates that it does not substantially contribute to the overall estimate of the correlation. We also employed genome-wide association analyses to map loci controlling both transcript and protein levels. Surprisingly, little overlap was observed between the protein- and transcript-mapped loci. We have typed numerous clinically relevant traits among the strains, including adiposity, lipoprotein levels, and tissue parameters.
Using correlation analysis, we found that few clinical trait relationships are preserved between the protein and mRNA gene products and that the majority of such relationships are specific to either protein levels or transcript levels. Surprisingly, transcript levels were more strongly correlated with clinical traits than protein levels were. In light of the widespread use of high-throughput technologies in both clinical and basic research, the results presented have practical as well as basic implications.
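The per-gene correlation summarized above (average 0.27) is, in its simplest form, a Pearson correlation computed across strains for each gene that has both a transcript and a protein measurement, followed by an average over genes. The sketch assumes two matched strains x genes matrices with no missing values; it does not reproduce the study's selection of heritable transcripts and reliable proteins, and the function name is hypothetical.

```python
import numpy as np

def per_gene_correlation(transcripts, proteins):
    """Pearson r across strains (rows) for each gene (matched columns)."""
    t = np.asarray(transcripts, dtype=float)
    p = np.asarray(proteins, dtype=float)
    # Center each gene's values across strains.
    t = t - t.mean(axis=0)
    p = p - p.mean(axis=0)
    num = (t * p).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return num / den
```

Usage: `r = per_gene_correlation(T, P)` followed by `r.mean()` gives the average transcript-protein correlation across genes.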
ABSTRACT: Our understanding of the genetic basis of learning and memory remains limited. To explore the genetic networks governing the biology of conditional fear, we used a systems genetics approach to analyze a hybrid mouse diversity panel (HMDP) with high mapping resolution.
A total of 27 behavioral quantitative trait loci were mapped with a false discovery rate of 5%. By integrating fear phenotypes, transcript profiling data from hippocampus and striatum, and genotype information, we identified two gene co-expression networks correlated with context-dependent immobility. We prioritized the key markers and genes in these pathways using intramodular connectivity measures and structural equation modeling. Highly connected genes in the context fear modules included Psmd6, Ube2a and Usp33, suggesting an important role for ubiquitination in learning and memory. In addition, we surveyed the architecture of brain transcript regulation and demonstrated preservation of gene co-expression modules in hippocampus and striatum, while also highlighting important differences. Rps15a, Kif3a, Stard7, 6330503K22Rik and Plvap were among the individual genes whose transcript abundance was strongly associated with fear phenotypes.
Application of our multi-faceted mapping strategy permits an increasingly detailed characterization of the genetic networks underlying behavior.
BMC Systems Biology 03/2011; 5:43. · 2.98 Impact Factor
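Intramodular connectivity, used above to prioritize hub genes, is commonly defined in weighted co-expression network analysis as the sum of a gene's adjacencies to the other genes in its module, with adjacency |cor|^beta. The sketch assumes that unsigned soft-thresholding definition and a power of beta = 6; the study's exact network construction and parameters may differ, and the function name is hypothetical.

```python
import numpy as np

def intramodular_connectivity(expr, module_labels, beta=6):
    """k_in per gene: sum of |cor|^beta to the other genes in the same module.

    expr: samples x genes expression matrix; module_labels: one label per gene.
    """
    adj = np.abs(np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)  # a gene is not adjacent to itself
    labels = np.asarray(module_labels)
    same_module = labels[:, None] == labels[None, :]
    return (adj * same_module).sum(axis=1)
```

Genes with the highest k_in within a module are the candidate hubs; raising |cor| to a power suppresses weak correlations while keeping the network weighted.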
ABSTRACT: Biological networks are often modeled by random graphs. A better modeling vehicle is a multigraph in which each pair of nodes is connected by a Poisson number of edges. In the current model, the mean number of edges equals the product of two propensities, one for each node. In this context it is possible to construct a simple and effective algorithm for rapid maximum likelihood estimation of all propensities. Given estimated propensities, it is then possible to test statistically for functionally connected nodes that show an excess of observed edges over expected edges. The model extends readily to directed multigraphs, in which each node's single propensity is replaced by separate outgoing and incoming propensities.
The theory is applied to real data on neuronal connections, interacting genes in radiation hybrids, interacting proteins in a literature-curated database, and letter and word pairs in seven Shakespearean plays.
All data used are fully available online from their respective sites. Source code and software are available from http://code.google.com/p/poisson-multigraph/.
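Under the model above, the log-likelihood of an undirected multigraph with edge counts x_ij is, up to constants, the sum over pairs i < j of x_ij ln(p_i p_j) - p_i p_j. Holding all other propensities fixed, this is maximized in p_i by p_i = x_i / (sum of p_j over j != i), where x_i is the total edge count (degree) of node i. The sketch below cycles that closed-form update, a simple coordinate-ascent scheme that is not necessarily the algorithm in the published software, and then scores a pair by the upper Poisson tail of its observed count given the expected count p_i p_j. Function names are hypothetical.

```python
import math

def fit_propensities(counts, n_iter=500, tol=1e-10):
    """Coordinate ascent for Poisson multigraph propensities.

    counts: symmetric list of lists of edge counts x_ij with a zero diagonal.
    """
    n = len(counts)
    degree = [sum(row) for row in counts]
    p = [1.0] * n
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(n):
            others = sum(p) - p[i]
            new = degree[i] / others if others > 0 else 0.0
            max_change = max(max_change, abs(new - p[i]))
            p[i] = new
        if max_change < tol:
            break
    return p

def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean); adequate for moderate means."""
    term = math.exp(-mean)
    cdf = 0.0
    for k in range(observed):
        cdf += term
        term *= mean / (k + 1)
    return max(0.0, 1.0 - cdf)
```

Usage: after `p = fit_propensities(counts)`, a small `poisson_upper_tail(counts[i][j], p[i] * p[j])` flags the pair (i, j) as having more edges than the propensities alone would predict.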
ABSTRACT: Using radiation hybrid genotyping data, 99% of all possible gene pairs across the mammalian genome were tested for interactions based on co-retention frequencies higher (attraction) or lower (repulsion) than chance. Gene interaction networks constructed from six independent data sets overlapped strongly. Combining the data sets resulted in a network of more than seven million interactions, almost all attractive. This network overlapped with protein-protein interaction networks on multiple measures and also confirmed the relationship between essentiality and centrality. In contrast to other biological networks, the radiation hybrid network did not show a scale-free distribution of connectivity but was Gaussian-like, suggesting a closer approach to saturation. The radiation hybrid (RH) network constitutes a platform for understanding the systems biology of the mammalian cell.
Genome Research 08/2010; 20(8):1122-32. · 14.40 Impact Factor
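The attraction/repulsion idea can be sketched as a comparison of observed and expected co-retention. Assuming each hybrid is scored 1 (retained) or 0 for each gene, and that under the null hypothesis genes are retained independently, the expected number of hybrids retaining both genes is N * r_a * r_b, where r_a and r_b are the two genes' retention frequencies. The sketch reports the observed/expected ratio (above 1 suggests attraction, below 1 repulsion) and a chi-square statistic for the 2x2 table; the published analysis uses its own statistical model, and the function name is hypothetical.

```python
def co_retention(gene_a, gene_b):
    """Observed/expected co-retention ratio and 2x2 chi-square for two genes.

    gene_a, gene_b: equal-length 0/1 retention calls across hybrids.
    """
    n = len(gene_a)
    both = sum(1 for a, b in zip(gene_a, gene_b) if a and b)
    only_a = sum(1 for a, b in zip(gene_a, gene_b) if a and not b)
    only_b = sum(1 for a, b in zip(gene_a, gene_b) if b and not a)
    neither = n - both - only_a - only_b
    r_a = (both + only_a) / n
    r_b = (both + only_b) / n
    expected = n * r_a * r_b  # co-retention count expected under independence
    ratio = both / expected if expected > 0 else float("nan")
    margins = (both + only_a) * (only_b + neither) * (both + only_b) * (only_a + neither)
    chi2 = n * (both * neither - only_a * only_b) ** 2 / margins if margins else 0.0
    return ratio, chi2
```

Usage: `co_retention([1, 1, 0, 0], [1, 1, 0, 0])` gives a ratio of 2.0 (two genes always retained together), whereas two genes that are never retained together give a ratio of 0.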
ABSTRACT: Oxidative modifications of protein tyrosines have been implicated in multiple human diseases. Among these modifications, elevated levels of 3,4-dihydroxyphenylalanine (DOPA), a major product of hydroxyl radical addition to tyrosine, have been observed in a number of pathologies. Here we report the first proteome survey of endogenous site-specific modifications, i.e. DOPA and its further oxidation product dopaquinone, in mouse brain and heart tissues. LC-MS/MS analyses identified 50 and 14 DOPA-modified tyrosine sites from brain and heart, respectively, whereas only a few peptides containing nitrotyrosine, a more commonly studied marker of oxidative stress, were detectable, suggesting that DOPA modification is much more abundant than tyrosine nitration. Moreover, 20 and 12 dopaquinone-modified peptides were observed from brain and heart, respectively; nearly one-fourth of these peptides were also observed with DOPA modification on the same sites. For both tissues, these modifications are preferentially found in mitochondrial proteins with metal-binding properties, consistent with metal-catalyzed hydroxyl radical formation from mitochondrial superoxide and hydrogen peroxide. These modifications also link to a number of mitochondrially associated and other signaling pathways. Furthermore, many of the modification sites were common sites of previously reported tyrosine phosphorylation, suggesting potential disruption of signaling pathways. Collectively, the results suggest that these modifications are linked with mitochondrially derived oxidative stress and may serve as sensitive markers for disease pathologies.
ABSTRACT: Parkinson's disease (PD) is characterized by dopaminergic neurodegeneration in the nigrostriatal region of the brain; however, the neurodegeneration extends well beyond dopaminergic neurons. To gain a better understanding of the molecular changes relevant to PD, we applied two-dimensional LC-MS/MS to comparatively analyze proteome changes in four brain regions (striatum, cerebellum, cortex, and the rest of brain) in an MPTP-induced PD mouse model, with the objective of identifying potential nigrostriatal-specific and other region-specific protein abundance changes. The combined analyses identified 4,895 nonredundant proteins with at least two unique peptides per protein. The relative abundance changes in each analyzed brain region were estimated from spectral count information. A total of 518 proteins showed substantial MPTP-induced abundance changes across different brain regions. Of these, 270 showed changes only in the striatum and/or in the rest-of-brain region, which contains the substantia nigra, suggesting that these proteins are associated with the underlying nigrostriatal pathways. Many of the changed proteins were associated with dopamine signaling, mitochondrial dysfunction, the ubiquitin system, calcium signaling, the oxidative stress response, and apoptosis. A set of proteins with either consistent changes across all brain regions or changes specific to the cortex and cerebellum was also detected. Ubiquitin-specific protease USP9X, a deubiquitinating enzyme involved in protecting proteins from degradation and in promoting the TGF-beta pathway, exhibited altered abundance in all brain regions. Western blot validation showed similar spatial changes, suggesting that USP9X is potentially associated with neurodegeneration.
Together, this study presents the first overall picture of proteome changes underlying both nigrostriatal pathways and other brain regions potentially involved in MPTP-induced neurodegeneration. The observed molecular changes provide a valuable reference resource for future hypothesis-driven functional studies of PD.
Journal of Proteome Research 02/2010; 9(3):1496-509. · 5.06 Impact Factor
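Spectral-count comparisons like the one above are often summarized as a log ratio of normalized counts. The sketch assumes normalization by each sample's total spectral count, a pseudocount of 1 so that proteins absent from one condition still receive a finite ratio, and a log2 scale; these are illustrative choices, not necessarily the normalization used in the study. The function name and the protein IDs in the usage example are hypothetical.

```python
import math

def spectral_count_log2_ratio(treated, control, pseudocount=1.0):
    """log2 fold change per protein from raw spectral counts.

    treated, control: dicts mapping protein ID -> spectral count (non-empty totals).
    """
    proteins = set(treated) | set(control)
    total_t = sum(treated.values())
    total_c = sum(control.values())
    ratios = {}
    for prot in proteins:
        # Normalize to total spectra so runs of different depth are comparable.
        t = (treated.get(prot, 0) + pseudocount) / total_t
        c = (control.get(prot, 0) + pseudocount) / total_c
        ratios[prot] = math.log2(t / c)
    return ratios
```

Usage: `spectral_count_log2_ratio({"P1": 7, "P2": 3}, {"P1": 3, "P2": 7})` yields about +1 for P1 and -1 for P2; in practice a cutoff on these ratios would be combined with a significance test across replicates.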
ABSTRACT: Voxelation creates expression atlases by high-throughput analysis of spatially registered cubes or voxels harvested from the brain. The modality independence of voxelation allows a variety of bioanalytical techniques to be used to map abundance. Protein expression patterns in the brain can be obtained using liquid chromatography (LC) combined with mass spectrometry (MS). Here we describe the methodology of voxelation as it pertains particularly to LC–MS proteomic analysis: sample preparation, instrumental set up and analysis, peptide identification and protein relative abundance quantitation. We also briefly describe some of the advantages, limitations and insights into the brain that can be obtained using combined proteomic and transcriptomic maps.
ABSTRACT: Oxidative modifications of protein tyrosines have been implicated in multiple human diseases. Among these modifications, elevated levels of 3,4-dihydroxyphenylalanine (DOPA), a major product of hydroxyl radical addition to tyrosine, have been observed in a number of pathologies. Here we report the first global proteome survey of endogenous site-specific modifications, i.e., DOPA and its further oxidation product dopaquinone (DQ), in mouse brain and heart tissues. LC-MS/MS analyses identified 203 and 71 DOPA-modified tyrosine sites from brain and heart, respectively, with a false discovery rate of ~1%, while only a few nitrotyrosine-containing peptides, a more commonly studied marker of oxidative stress, were detectable, suggesting that DOPA modification is much more abundant than tyrosine nitration. Moreover, 57 and 29 DQ-modified peptides were observed from brain and heart, respectively; nearly half of these peptides were also observed with DOPA modification on the same sites. For both tissues, these modifications are preferentially found in mitochondrial proteins with metal-binding properties, consistent with metal-catalyzed hydroxyl radical formation from mitochondrial superoxide and hydrogen peroxide. These modifications also link to a number of mitochondria-associated and other signaling pathways. Furthermore, many of the modification sites were common sites of previously reported tyrosine phosphorylation, suggesting potential disruption of signaling pathways. Structural aspects of DOPA-modified tyrosine sequences are distinct from those of nitrotyrosines, suggesting that each type of modification provides a marker for different in vivo reactive chemistries and can be used to predict sensitive protein targets.
Collectively, the results suggest that these modifications are linked with mitochondrially derived oxidative stress and may serve as sensitive markers for disease pathologies.
ABSTRACT: Gene expression profiles have been widely used in functional genomic studies. However, traditional gene expression profiling rarely takes into account where in the brain a gene is expressed. Gene expression maps, which contain spatial information about gene expression in the mouse brain, are obtained by combining voxelation and microarrays. Based on the idea that genes with similar gene expression maps may have similar functions, we propose an approach to identify gene functions. A gene function can potentially be associated with a specific gene expression profile, which we call a Functional Expression Profile (FEP). An FEP can be obtained either by directly finding genes with a certain function, or by analyzing clusters of genes that have similar expression maps and similar functions. By taking advantage of the identified FEPs, we can annotate gene functions with high accuracy. Compared to the traditional K-nearest neighbor method, our approach shows higher accuracy in predicting functions. The images of FEPs are in good agreement with anatomical components of the mouse brain and provide biologists with valuable insight for function prediction.
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, BCB 2010, Niagara Falls, NY, USA, August 2-4, 2010; 01/2010
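One way to read the FEP idea is as a nearest-centroid classifier: average the expression maps of genes annotated with a function to obtain that function's profile, then assign an unannotated gene the function whose profile is closest. The sketch assumes FEPs are plain means of flattened voxel maps and uses Euclidean distance; the paper also derives FEPs from clusters of genes, which is not shown here. Function names and labels are hypothetical.

```python
import numpy as np

def build_feps(maps, functions):
    """Mean expression map per function label. maps: genes x voxels."""
    maps = np.asarray(maps, dtype=float)
    labels = np.asarray(functions)
    return {f: maps[labels == f].mean(axis=0) for f in np.unique(labels)}

def predict_function(feps, gene_map):
    """Return the function whose FEP is nearest (Euclidean) to gene_map."""
    gene_map = np.asarray(gene_map, dtype=float)
    return min(feps, key=lambda f: np.linalg.norm(feps[f] - gene_map))
```

Compared with K-nearest neighbors, which votes among individual annotated genes, this reduces each function to a single averaged map, so a noisy individual gene has less influence on the prediction.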
ABSTRACT: It has long been thought that neuronal loss in Parkinson's disease (PD) is related to reactive oxygen species from mitochondrial dysfunction. However, there have been few investigations surveying both the transcriptome and the proteome in PD. This review focuses on recent work using microarrays and mass spectrometry to examine neurotoxicological models of PD in the mouse. Molecular pathways involved in oxidative phosphorylation, oxidative stress, apoptosis/cell death, signal transduction and neurotransmission were highlighted. Analysis of tyrosine nitration suggested that this important post-translational modification, which arises from the conjugation of reactive oxygen species with nitric oxide, may play an important role in signal transduction as well as in the molecular pathology of PD. Thus, the combined investigations highlight known pathways in PD but also point to new directions for research, in particular implicating relatively understudied classes of post-translational modifications in normal cell signaling and neurological disorders.
Journal of Bioenergetics 12/2009; 41(6):487-91. · 1.60 Impact Factor
ABSTRACT: ABCG1 and ABCG4 are highly homologous members of the ATP binding cassette (ABC) transporter family that regulate cellular cholesterol homeostasis. In adult mice, ABCG1 is known to be expressed in numerous cell types and tissues, whereas ABCG4 expression is limited to the central nervous system (CNS). Here, we show significant differences in expression of these two transporters during development. Examination of beta-galactosidase-stained tissue sections from Abcg1(-/-)LacZ and Abcg4(-/-)LacZ knockin mice shows that ABCG4 is highly but transiently expressed both in hematopoietic cells and in enterocytes during development. In contrast, ABCG1 is expressed in macrophages and in endothelial cells of both embryonic and adult liver. We also show that ABCG1 and ABCG4 are both expressed as early as E12.5 in the embryonic eye and developing CNS. Loss of both ABCG1 and ABCG4 results in accumulation of oxysterols in the retina and/or brain, and in altered expression of liver X receptor and sterol-regulatory element binding protein-2 target genes and of a stress response gene. Finally, behavioral tests show that Abcg4(-/-) mice have a general deficit in associative fear memory. Together, these data indicate that loss of ABCG1 and/or ABCG4 from the CNS results in changes in metabolic pathways and in behavior.
The Journal of Lipid Research 08/2009; 51(1):169-81. · 4.39 Impact Factor
ABSTRACT: Meiotic mapping of quantitative trait loci regulating expression (eQTLs) has allowed the construction of gene networks. However, the limited mapping resolution of these studies has meant that genotype data are largely ignored, leading to undirected networks that fail to capture regulatory hierarchies. Here we use high-resolution mapping of copy number eQTLs (ceQTLs) in a mouse-hamster radiation hybrid (RH) panel to construct directed genetic networks in the mammalian cell. The RH network covering 20,145 mouse genes had significant overlap with, and similar topological structures to, existing biological networks. Upregulated edges in the RH network had significantly more overlap with existing networks than downregulated edges. This suggests that repressive relationships between genes are missed by existing approaches, perhaps because the corresponding proteins are not present in the cell at the same time and are therefore unlikely to interact. Gene essentiality was positively correlated with connectivity and betweenness centrality in the RH network, strengthening the centrality-lethality principle in mammals. Consistent with their regulatory role, transcription factors had significantly more outgoing (regulating) than incoming (regulated) edges in the RH network, a feature hidden by conventional undirected networks. Directed RH genetic networks thus showed concordance with pre-existing networks while also yielding information inaccessible to current undirected approaches.
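The transcription-factor comparison above rests on counting directed edges. A minimal sketch, assuming the network is given as a list of (regulator, target) pairs and a set of transcription-factor genes is supplied: tally out-degree (regulating) and in-degree (regulated) per gene, then total them over the transcription factors. Gene names in the test are placeholders, no significance test is included, and the function names are hypothetical.

```python
from collections import Counter

def degree_counts(edges):
    """Out-degree and in-degree per gene from (source, target) edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg

def tf_edge_balance(edges, transcription_factors):
    """Total outgoing and incoming edges over a set of transcription factors."""
    out_deg, in_deg = degree_counts(edges)
    total_out = sum(out_deg[g] for g in transcription_factors)
    total_in = sum(in_deg[g] for g in transcription_factors)
    return total_out, total_in
```

In an undirected network these two totals collapse into a single degree, which is why the outgoing/incoming asymmetry is only visible once edges carry a direction.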
ABSTRACT: Integrating quantitative proteomic and transcriptomic datasets promises valuable insights into the molecular mechanisms of the brain. We concentrate on recent studies using mass spectrometry and microarray data to investigate transcript and protein abundance in normal and diseased neural tissues. Highlighted are dual spatial maps of these molecules obtained using voxelation of the mouse brain. We demonstrate that the relationship between transcript and protein levels displays a specific anatomical distribution, with greatest fidelity in midline structures and the hypothalamus. Genes that have strong correlations between mRNA and protein abundance are also identified. In addition, transcriptomic and proteomic analyses of mouse models of Parkinson's disease are discussed.
Expert Review of Proteomics 07/2009; 6(3):243-9. · 3.90 Impact Factor
ABSTRACT: Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays to acquire genome-wide atlases of expression patterns in the mouse brain. Other work has studied gene functions without taking into account where in the mouse brain a gene is expressed. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions.
To analyze the dataset, we chose typical genes as queries and aimed to discover groups of similar genes. Gene similarity was determined from wavelet features extracted from gene expression maps averaged over the left and right hemispheres, using the Euclidean distance between each pair of feature vectors. We also applied a multiple clustering approach to the gene expression maps, combined with hierarchical clustering. Within each group of similar genes and each cluster, gene function similarity was measured by calculating the average gene function distance in the gene ontology structure. By applying our methodology to find genes similar to certain target genes, we were able to improve our understanding of gene expression patterns and gene functions. The clustering analysis yielded significant clusters whose members have both very similar gene expression maps and very similar gene functions with respect to the corresponding ontology. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum.
The experimental results confirm the hypothesis that genes with similar gene expression maps may have similar gene functions. The voxelation data capture the location of gene expression in the mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.
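The feature extraction step above can be sketched as follows, assuming each expression map is a 2D voxel grid whose columns run left to right across the midline, so that mirroring the columns maps one hemisphere onto the other. The sketch averages the map with its mirror image, takes one level of the orthonormal 2D Haar approximation (each 2x2 block becomes (a+b+c+d)/2), and compares genes by Euclidean distance between the resulting feature vectors. The paper's wavelet family, decomposition depth and full feature set are not specified here, so these choices, and the function names, are assumptions.

```python
import numpy as np

def hemisphere_average(voxel_map):
    """Average a voxel map with its left-right mirror image."""
    m = np.asarray(voxel_map, dtype=float)
    return 0.5 * (m + m[:, ::-1])

def haar_approximation(image):
    """One level of the orthonormal 2D Haar approximation (even dimensions required)."""
    a = np.asarray(image, dtype=float)
    rows, cols = a.shape
    # blocks[i, r, j, c] == a[2*i + r, 2*j + c]
    blocks = a.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.sum(axis=(1, 3)) / 2.0

def map_distance(map_a, map_b):
    """Euclidean distance between the Haar features of two expression maps."""
    fa = haar_approximation(hemisphere_average(map_a)).ravel()
    fb = haar_approximation(hemisphere_average(map_b)).ravel()
    return float(np.linalg.norm(fa - fb))
```

Ranking all genes by `map_distance` to a query gene's map gives the kind of similar-gene group described above; the approximation coefficients smooth voxel-level noise while keeping coarse anatomical structure.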
ABSTRACT: We performed an unbiased experimental search for enhancers and silencers in a 153-kb region containing the human apolipoprotein (APO) E/C1/C4/C2 gene cluster using shotgun cloning into a luciferase vector. A continuum of transcriptional effect sizes was observed, possibly explaining the limited success of bioinformatics in identifying regulatory regions. We identified nine statistically significant enhancers and five silencers functional in either liver or astrocyte cells, including two previously known enhancers. Only two of the fourteen elements contained conserved noncoding sequences. Within the coding sequence of the APOE gene, we identified an enhancer active for the E4 allele, which is associated with Alzheimer's disease, but not for the E3 allele. The single nucleotide polymorphism (SNP) causing the E4/E3 amino acid substitution was responsible for this difference, potentially explaining the higher expression levels of E4. Our results suggest a wider variety of mammalian transcriptional regulatory sequences than is currently recognized and that these may include coding region SNPs.