-
ABSTRACT: With the rapid accumulation of our knowledge on diseases, disease-related genes and drug targets, network-based analysis plays an increasingly important role in systems biology, systems pharmacology and translational science. The new release of VisANT aims to provide new functions to facilitate the convenient network analysis of diseases, therapies, genes and drugs. With improved understanding of the mechanisms of complex diseases and drug actions through network analysis, novel drug methods (e.g., drug repositioning, multi-target drug and combination therapy) can be designed. More specifically, the new update includes (i) integrated search and navigation of disease and drug hierarchies; (ii) integrated disease-gene, therapy-drug and drug-target association to aid the network construction and filtering; (iii) annotation of genes/drugs using disease/therapy information; (iv) prediction of associated diseases/therapies for a given set of genes/drugs using enrichment analysis; (v) network transformation to support construction of versatile networks of drugs, genes, diseases and therapies; (vi) an enhanced user interface using docking windows to allow easy customization of node and edge properties, with a built-in legend node to distinguish different node types. VisANT is freely available at: http://visant.bu.edu.
Nucleic Acids Research 05/2013; · 8.03 Impact Factor
-
ABSTRACT: A host of data on genetic variation from the Human Genome and International HapMap projects, and advances in high-throughput genotyping technologies, have made genome-wide association (GWA) studies technically feasible. GWA studies help in the discovery and quantification of the genetic components of disease risks, many of which have not been unveiled before and have opened a new avenue to understanding disease, treatment, and prevention. This chapter presents an overview of GWA, an important tool for discovering regions of the genome that harbor common genetic variants that confer susceptibility for various diseases or health outcomes in the post-Human Genome Project era. A tutorial on how to conduct a GWA study and some practical challenges specifically related to the GWA design is presented, followed by a detailed GWA case study involving the identification of loci associated with glioma as an example and an illustration of current technologies.
Methods in molecular biology (Clifton, N.J.) 01/2013; 939:233-51.
-
ABSTRACT: BACKGROUND: Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. However, most significant gene markers are unstable (not reproducible) among data sets. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. Such so-called pathway-based expression arrays are significantly more reproducible across datasets. Such reproducibility will be important for clinical usefulness of genomic markers, and augment currently accepted cancer classification protocols. RESULTS: The present method produced more stable (reproducible) pathway-based markers for discriminating breast cancer metastasis and ovarian cancer survival time. Between two datasets for breast cancer metastasis, the intersection of standard significant gene biomarkers totaled 7.47% of selected genes, compared to 17.65% using pathway-based markers; the corresponding percentages for ovarian cancer datasets were 20.65% and 33.33% respectively. Three pathways, consisting of Type_1_diabetes mellitus, Cytokine-cytokine_receptor_interaction and Hedgehog_signaling (all previously implicated in cancer), are enriched in both the ovarian long survival and breast non-metastasis groups. In addition, integrating pathway and gene information, we identified five (ID4, ANXA4, CXCL9, MYLK, FBXL7) and six (SQLE, E2F1, PTTG1, TSTA3, BUB1B, MAD2L1) known cancer genes significant for ovarian and breast cancer respectively. 
CONCLUSIONS: Standardizing the analysis of genomic data in the process of cancer staging, classification and analysis is important as it has implications for both pre-clinical as well as clinical studies. The paradigm of diagnosis and prediction using pathway-based biomarkers as features can be an important part of the process of biomarker-based cancer analysis, and the resulting canonical (clinically reproducible) biomarkers can be important in standardizing genomic data. We expect that identification of such canonical biomarkers will improve clinical utility of high-throughput datasets for diagnostic and prognostic applications. Reviewers: This article was reviewed by John McDonald (nominated by I. King Jordon), Eugene Koonin, Nathan Bowen (nominated by I. King Jordon), and Ekaterina Kotelnikova (nominated by Mikhail Gelfand).
Biology Direct 07/2012; 7(1):21. · 4.02 Impact Factor
-
ABSTRACT: A central goal of biology is understanding and describing the molecular basis of plasticity: the sets of genes that are combinatorially selected by exogenous and endogenous environmental changes, and the relations among the genes. The most viable current approach to this problem consists of determining whether sets of genes are connected by some common theme, e.g. genes from the same pathway are overrepresented among those whose differential expression in response to a perturbation is most pronounced. There are many approaches to this problem, and the results they produce show a fair amount of dispersion, but they all fall within a common framework consisting of a few basic components. We critically review these components, suggest best practices for carrying out each step, and propose a voting method for meeting the challenge of assessing different methods on a large number of experimental data sets in the absence of a gold standard.
Briefings in Bioinformatics 09/2011; 13(3):281-91. · 5.20 Impact Factor
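The overrepresentation question described in this abstract (are genes from one pathway unusually frequent among the most differentially expressed genes?) is often posed as a one-sided hypergeometric test. The sketch below is only an illustration of that common formulation, not the procedure recommended in the review; the function name `hypergeom_tail` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, drawn, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes measured in the experiment
    annotated:  number of those genes that belong to the pathway
    drawn:      number of genes selected as differentially expressed
    hits:       number of selected genes that belong to the pathway

    Assumes 0 <= annotated <= population and 0 <= drawn <= population.
    """
    upper = min(annotated, drawn)      # largest overlap that is possible
    total = comb(population, drawn)    # number of equally likely selections
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total
```

A small tail probability suggests that the pathway is overrepresented in the selected list. In practice the value would still need correction for the number of pathways tested, which this sketch leaves out.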
-
ABSTRACT: Glioblastoma multiforme (GBM) tends to occur between the ages of 45 and 70. This relatively early onset and its poor prognosis make the impact of GBM on public health far greater than would be suggested by its relatively low frequency. Tissue and blood samples have now been collected for a number of populations, and predisposing alleles have been sought by several different genome-wide association (GWA) studies. The Cancer Genome Atlas (TCGA) at NIH has also collected a considerable amount of data. Because of the low concordance between the results obtained using different populations, only 14 predisposing single nucleotide polymorphism (SNP) candidates in five genomic regions have been replicated in two or more studies. The purpose of this paper is to present an improved approach to biomarker identification.
Association analysis was performed with control of population stratifications using the EIGENSTRAT package, under the null hypothesis of "no association between GBM and control SNP genotypes," based on an additive inheritance model. Genes that are strongly correlated with identified SNPs were determined by linkage disequilibrium (LD) or expression quantitative trait locus (eQTL) analysis. A new approach that combines meta-analysis and pathway enrichment analysis identified additional genes.
(i) A meta-analysis of SNP data from TCGA and the Adult Glioma Study identifies 12 predisposing SNP candidates, seven of which are reported for the first time. These SNPs fall in five genomic regions (5p15.33, 9p21.3, 1p21.2, 3q26.2 and 7p15.3), three of which have not been previously reported. (ii) 25 genes are strongly correlated with these 12 SNPs, eight of which are known to be cancer-associated. (iii) The relative risk for GBM is highest for risk allele combinations on chromosomes 1 and 9. (iv) A combined meta-analysis/pathway analysis identified an additional four genes. All of these have been identified as cancer-related, but have not been previously associated with glioma. (v) Some SNPs that do not occur reproducibly across populations are in reproducible (invariant) pathways, suggesting that they affect the same biological process, and that population discordance can be partially resolved by evaluating processes rather than genes.
We have uncovered 29 glioma-associated gene candidates; 12 of them are known to be cancer related (p = 1.4 × 10⁻⁶), providing additional statistical support for the relevance of the new candidates. This additional information on risk loci is potentially important for identifying Caucasian individuals at risk for glioma, and for assessing relative risk.
BMC Medical Genomics 08/2011; 4:63. · 3.69 Impact Factor
-
Richard J Roberts,
Yi-Chien Chang,
Zhenjun Hu,
John N Rachlin,
Brian P Anton,
Revonda M Pokrzywa,
Han-Pil Choi,
Lina L Faller,
Jyotsna Guleria,
Genevieve Housman, [......],
Lais Osmani,
Rajeswari Swaminathan,
Kevin R Tao,
Stan Letovsky,
Dennis Vitkup,
Daniel Segrè,
Steven L Salzberg,
Charles Delisi,
Martin Steffen,
Simon Kasif
ABSTRACT: COMBREX (http://combrex.bu.edu) is a project to increase the speed of the functional annotation of new bacterial and archaeal genomes. It consists of a database of functional predictions produced by computational biologists and a mechanism for experimental biochemists to bid for the validation of those predictions. Small grants are available to support successful bids.
Nucleic Acids Research 01/2011; 39(Database issue):D11-4. · 8.03 Impact Factor
-
ABSTRACT: We develop a general method to identify gene networks from pair-wise correlations between genes in a microarray data set and apply it to a public prostate cancer gene expression data set from 69 primary prostate tumors. We define the degree of a node as the number of genes significantly associated with the node and identify hub genes as those with the highest degree. The correlation network was pruned using transcription factor binding information in VisANT (http://visant.bu.edu/) as a biological filter. The reliability of hub genes was determined using a strict permutation test. Separate networks for normal prostate samples, and prostate cancer samples from African Americans (AA) and European Americans (EA) were generated and compared. We found that the same hubs control disease progression in AA and EA networks. Combining AA and EA samples, we generated networks for low (<7) and high (≥7) Gleason grade tumors. A comparison of their major hubs with those of the network for normal samples identified two types of changes associated with disease: (i) Some hub genes increased their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with gain of regulatory control in cancer (e.g. possible turning on of oncogenes). (ii) Some hubs reduced their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with loss of regulatory control in cancer (e.g. possible loss of tumor suppressor genes). A striking result was that for both AA and EA tumor samples, STAT5a, CEBPB and EGR1 are major hubs that gain neighbors compared to the normal prostate network. Conversely, HIF-1α is a major hub that loses connections in the prostate cancer network compared to the normal prostate network.
We also find that the degree of these hubs changes progressively from normal to low grade to high grade disease, suggesting that these hubs are master regulators of prostate cancer and mark disease progression. STAT5a was identified as a central hub, with ~120 neighbors in the prostate cancer network and only 81 neighbors in the normal prostate network. Of the 120 neighbors of STAT5a, 57 are known cancer related genes, known to be involved in functional pathways associated with tumorigenesis. Our method is general and can easily be extended to identify and study networks associated with any two phenotypes.
Genome informatics. International Conference on Genome Informatics 07/2010; 24(1):139-53.
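The degree and hub definitions in this abstract can be illustrated with a minimal sketch. It assumes a fixed cutoff on the absolute Pearson correlation as a stand-in for the significance test; the abstract's permutation test and transcription factor binding filter are not reproduced here. The names `pearson`, `node_degrees` and `hubs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def node_degrees(expression, threshold):
    """Count, for each gene, the genes correlated with it at |r| >= threshold.

    expression: dict mapping gene name -> list of expression values,
                one value per sample (same sample order for every gene).
    threshold:  assumed cutoff standing in for a significance test.
    """
    degree = {gene: 0 for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def hubs(degree, top=1):
    """Return the `top` genes with the highest degree (ties broken by name)."""
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:top]
```

Running `node_degrees` separately on normal and tumor samples and comparing the degree of each hub gives the kind of gain or loss of neighbors that the abstract reports.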
-
ABSTRACT: To identify a robust panel of microRNA signatures that can classify tumor from normal kidney using microRNA expression levels. Mounting evidence suggests that microRNAs are key players in essential cellular processes and that their expression pattern can serve as diagnostic biomarkers for cancerous tissues.
We selected 28 clear-cell type human renal cell carcinoma (ccRCC) samples from patient-matched specimens to perform high-throughput, quantitative real-time polymerase chain reaction analysis of microRNA expression levels. The data were subjected to rigorous statistical analyses and hierarchical clustering to produce a discrete set of microRNAs that can robustly distinguish ccRCC from their patient-matched normal kidney tissue samples with high confidence.
Thirty-five microRNAs were found that can robustly distinguish ccRCC from their patient-matched normal kidney tissue samples with high confidence. Among this set of 35 signature microRNAs, 26 were found to be consistently downregulated and 9 consistently upregulated in ccRCC relative to normal kidney samples. Two microRNAs, namely, miR-155 and miR-21, commonly found to be upregulated in other cancers, and miR-210, induced by hypoxia, were also identified as overexpressed in ccRCC in our study. MicroRNAs identified as downregulated in our study can be correlated to common chromosome deletions in ccRCC.
Our analysis is a comprehensive, statistically relevant study that identifies the microRNAs dysregulated in ccRCC, which can serve as the basis of molecular markers for diagnosis.
Urology 04/2010; 75(4):835-41. · 2.43 Impact Factor
-
ABSTRACT: One of the important challenges to post-genomic biology is relating observed phenotypic alterations to the underlying collective alterations in genes. Current inferential methods, however, invariably omit large bodies of information on the relationships between genes. We present a method that takes account of such information - expressed in terms of the topology of a correlation network - and we apply the method in the context of current procedures for gene set enrichment analysis.
Genome biology 02/2010; 11(2):R23. · 6.63 Impact Factor
-
ABSTRACT: Surprising correlations between human disease phenotypes are emerging. Recent work now reveals startling phenotype connections between species, which could provide new disease models.
Genome biology 01/2010; 11(4):116. · 6.63 Impact Factor
-
Genome Informatics. 01/2010;
-
ABSTRACT: We integrate 16 genomic features to construct an evidence-weighted functional-linkage network comprising 21,657 human genes. The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease. Many of these disease-disease associations are supported by epidemiology, but with no previous genetic basis. Such associations can drive novel hypotheses on molecular mechanisms of diseases and therapies.
Genome biology 10/2009; 10(9):R91. · 6.63 Impact Factor
-
ABSTRACT: Despite its wide usage in biological databases and applications, the role of the gene ontology (GO) in network analysis is usually limited to functional annotation of genes or gene sets with auxiliary information on correlations ignored. Here, we report on new capabilities of VisANT--an integrative software platform for the visualization, mining, analysis and modeling of the biological networks--which extend the application of GO in network visualization, analysis and inference. The new VisANT functions can be classified into three categories. (i) Visualization: a new tree-based browser allows visualization of GO hierarchies. GO terms can be easily dropped into the network to group genes annotated under the term, thereby integrating the hierarchical ontology with the network. This facilitates multi-scale visualization and analysis. (ii) Flexible annotation schema: in addition to conventional methods for annotating network nodes with the most specific functional descriptions available, VisANT also provides functions to annotate genes at any customized level of abstraction. (iii) Finding over-represented GO terms and expression-enriched GO modules: two new algorithms have been implemented as VisANT plugins. One detects over-represented GO annotations in any given sub-network and the other finds the GO categories that are enriched in a specified phenotype or perturbed dataset. Both algorithms take account of network topology (i.e. correlations between genes based on various sources of evidence). VisANT is freely available at http://visant.bu.edu.
Nucleic Acids Research 06/2009; 37(Web Server issue):W115-21. · 8.03 Impact Factor
-
ABSTRACT: We identified significantly hypermethylated genes in clear cell renal cell carcinoma.
We previously identified a set of underexpressed genes in renal cell carcinoma tissue through transcriptional profiling and a robust computational screen. We selected 19 of these genes for hypermethylation analysis using a rigorous search for the best candidate regions, considering CpG islands and transcription factor binding sites. The genes were analyzed for hypermethylation in the DNA of 38 matched clear cell renal cell carcinoma and normal samples using matrix assisted laser desorption ionization time-of-flight mass spectrometry. The significance of hypermethylation was assessed using 3 statistical tests. We validated the down-regulation of significantly hypermethylated genes at the RNA and protein levels in a separate set of patients using reverse transcriptase-polymerase chain reaction, immunohistochemistry and Western blots.
We found 7 significantly hypermethylated regions from 6 down-regulated genes, including SFRP1, which was previously shown to be hypermethylated in renal cell carcinoma and other cancer types.
To our knowledge we report for the first time that another 5 genes (SCNN1B, SYT6, DACH1, and the tumor suppressors TFAP2A and MT1G) are hypermethylated in renal cell carcinoma. Robust computational screens and the high throughput methylation assay resulted in an enriched set of novel genes that are epigenetically altered in clear cell renal cell carcinoma. Overall the detection of hypermethylation in these highly down-regulated genes suggests that assaying for their methylation using cells from urine or blood could provide the basis for a viable diagnostic test.
The Journal of urology 09/2008; 180(3):1126-30. · 4.02 Impact Factor
-
ABSTRACT: The essence of a living cell is adaptation to a changing environment, and a central goal of modern cell biology is to understand adaptive change under normal and pathological conditions. Because the number of components is large, and processes and conditions are many, visual tools are useful in providing an overview of relations that would otherwise be far more difficult to assimilate. Historically, representations were static pictures, with genes and proteins represented as nodes, and known or inferred correlations between them (links) represented by various kinds of lines. The modern challenge is to capture functional hierarchies and adaptation to environmental change, and to discover pathways and processes embedded in known data, but not currently recognizable. Among the tools being developed to meet this challenge is VisANT (freely available at http://visant.bu.edu) which integrates, mines and displays hierarchical information. Challenges to integrating modeling (discrete or continuous) and simulation capabilities into such visual mining software are briefly discussed.
Briefings in Bioinformatics 08/2008; 9(4):317-25. · 5.20 Impact Factor
-
ABSTRACT: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation.
We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than individual data sources alone do. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30%, down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that our framework can be successfully applied to less studied organisms.
We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.
BMC Bioinformatics 02/2008; 9:119. · 2.75 Impact Factor
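One possible reading of the maximum weight decision rule named in this abstract is that an unannotated protein inherits the functions of the annotated neighbor to which it is linked by the heaviest edge. The sketch below illustrates that reading only; the paper's exact rule and its precision scoring scheme may differ, and the name `annotate_by_max_weight` and the data layout are hypothetical.

```python
def annotate_by_max_weight(edges, known_functions):
    """Transfer functions across the heaviest edge to an annotated neighbor.

    edges:           iterable of (protein_a, protein_b, weight) tuples
                     describing an undirected weighted linkage network.
    known_functions: dict mapping annotated protein -> set of functions.

    Returns a dict mapping each unannotated protein that has at least one
    annotated neighbor to a (functions, weight) pair, where weight is the
    weight of the edge that supplied the annotation.
    """
    best = {}
    for a, b, weight in edges:
        # Consider both directions of the undirected edge.
        for target, source in ((a, b), (b, a)):
            if target in known_functions or source not in known_functions:
                continue
            if target not in best or weight > best[target][1]:
                best[target] = (known_functions[source], weight)
    return best
```

The returned weight could be used as a cutoff: keeping only predictions above a chosen weight lowers coverage, which is the coverage and precision trade-off the abstract discusses.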
-
ABSTRACT: An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties.
(1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter.
Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using the VisANT network analysis suite.
Biology Direct 02/2008; 3:22. · 4.02 Impact Factor
-
ABSTRACT: An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators using publicly-available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation.
Starting with 8817 TF-target interactions we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (approximately 2/18660) indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions. Two case-studies for the TFs OCT4 and WT1 illustrate the usefulness of our predictions: Many predicted OCT4 targets fall into the Wnt-pathway. This is consistent with known biology as OCT4 is developmentally related and the Wnt pathway plays a role in early development. Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets. Targets of WT1 are statistically enriched for cancer related functions including metastasis and apoptosis. Among new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1. Of the thirteen TFs found which co-regulate genes with WT1 (p ≤ 0.02), 8 have been previously implicated in cancer. The regulatory-network for WT1 targets in genomic regions relevant to Wilms' tumor is provided.
We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the determination of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator, WT1, are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements in the location of WT1 targets.
Biology Direct 01/2008; 3:24. · 4.02 Impact Factor
-
ABSTRACT:
Background
An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties.
Principal Findings
(1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4).
(2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression.
(3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties.
(4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter.
Conclusion
Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using the VisANT network analysis suite.
Reviewers
This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor.
Biology Direct. 01/2008;