Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Background and objective: Differential DNA methylation is known to affect the regulatory mechanisms of biological pathways. A pathway encompasses a set of interacting genes or gene products that together perform a given biological function. Pathways often encode strong methylation signatures that are capable of distinguishing biologically distinct subtypes. Even though next-generation sequencing techniques such as MeDIP-seq and MBD-isolated genome sequencing (MiGS) allow for genome-wide identification of clinical and biological subtypes, there is a pressing need for computational methods to compare epigenetic signatures across pathways. Methods: This paper proposes a novel alignment method, DEEPAligner (Deep Encoded Epigenetic Pathway Aligner), that finds functionally consistent and topologically sound alignments of epigenetic signatures from pathway networks. A deep embedding framework is used to obtain epigenetic signatures from pathways, which are then aligned for functional consistency and local topological similarity. Results: Experiments on four benchmark cancer datasets reveal epigenetic signatures that are conserved in cancer-specific and across-cancer subtypes. Conclusion: The proposed deep embedding framework obtains highly coherent signatures that are aligned for biological as well as structural orthology. Comparison with state-of-the-art network alignment methods suggests that the proposed method obtains topologically and functionally more consistent alignments. Availability: http://bdbl.nitc.ac.in/DEEPAligner.


... A later logistic regression classifier, trained with the encoded latent features, was able to accurately classify cancer subtypes. Visakh et al. [22] also proposed an innovative alignment method that made use of AEs to find functionally consistent and topologically sound alignments of epigenetic signatures from pathway networks. Later, those epigenetic signatures were applied to characterise several types and subtypes of breast, lung, colorectal, and prostate cancer. ...
Article
Full-text available
https://authors.elsevier.com/a/1c0AE3KEGaD6fQ Breast cancer is the most frequent cancer in women and the second most frequent overall after lung cancer. Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common and often involves metastasis, with its consequent threat to patients. DNA methylation-derived databases have become an interesting primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study of DNA methylation involves processing hundreds of thousands of features for every patient. DNA methylation data are characterized by high dimension and low sample size (HDLSS), which poses well-known problems for feature selection and generation. Autoencoders (AEs) are a specific technique for conducting nonlinear feature fusion. Our main objective in this work is to design a procedure to summarize DNA methylation by taking advantage of AEs. Our proposal is able to generate new features from the values of CpG sites of patients with and without recurrence. Then, a limited set of relevant genes to characterize breast cancer recurrence is proposed by applying survival analysis and a weighted ranking of genes according to the distribution of their CpG sites. To test our proposal, we selected a dataset from The Cancer Genome Atlas data portal and an AE with a single hidden layer. The literature and enrichment analysis (based on genomic context and functional annotation) conducted on the genes obtained in our experiment confirmed that all of these genes were related to breast cancer recurrence.
... Additional approaches include using subnetworks from pathway interaction networks to detect dysregulated pathways by treating the detection as a feature selection task (Liu, Liu, Hao, Chen and Zhao, 2012). Another approach uses a combination of methylation and gene expression data with an autoencoder to identify dysregulated pathways by using the differential expression profiles of select genes (Visakh and Nazeer, 2018). ...
Preprint
Full-text available
We performed a comprehensive pan-cancer analysis in the Cancer Genomics Cloud of HTSeq-FPKM normalized protein-coding mRNA data from 17 cancer projects in The Cancer Genome Atlas: Adrenal Gland, Bile Duct, Bladder, Brain, Breast, Cervix, Colorectal, Esophagus, Head and Neck, Kidney, Liver, Lung, Pancreas, Prostate, Stomach, Thyroid and Uterus. The PoTRA algorithm was applied to the normalized protein-coding mRNA data and detected dysregulated pathways that can be implicated in the pathogenesis of these cancers. The PageRank algorithm was then applied to the PoTRA results to find the most influential dysregulated pathways among all 17 cancer types. Pathways in cancer is the most common dysregulated pathway, and the MAPK signaling pathway is the most influential (PageRank score = 0.2034), while the purine metabolism pathway is the most significantly dysregulated metabolic pathway.
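As a rough illustration of the PageRank step mentioned above, the following is a minimal power-iteration sketch in plain Python. The abstract does not say how the graph over PoTRA results was built, so the toy graph, the node names, and the damping factor of 0.85 are all assumptions made for illustration only.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph.

    links maps each node to the list of nodes it points to; every
    target must also appear as a key. Damping and iteration count
    are conventional defaults, not values taken from the abstract.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += damping * rank[v] / len(links[v])
        rank = new
    return rank


# Hypothetical toy graph: three nodes all point to "hub".
toy_links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
toy_rank = pagerank(toy_links)
print(sorted(toy_rank, key=toy_rank.get, reverse=True))
```

The scores sum to one, so a value such as the reported 0.2034 can be read as the share of total influence assigned to a single pathway.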
Article
Full-text available
Motivation: As an increasing amount of protein–protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. Results: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen–Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. Availability: http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html Contact: canzar@ttic.edu or j3xu.ttic.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Full-text available
Disease classification systems increasingly incorporate information on pathogenic mechanisms to predict clinical outcomes and response to therapy and intervention. Technological advancements to interrogate omics (genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, interactomics, etc.) provide wide-open opportunities in population-based research. Molecular pathological epidemiology (MPE) represents an integrative science of molecular pathology and epidemiology. This unified paradigm requires multidisciplinary collaboration between pathology, epidemiology, biostatistics, bioinformatics, and computational biology. Integration of these fields enables better understanding of etiologic heterogeneity, disease continuum, causal inference, and the impact of environment, diet, lifestyle, host factors (including genetics and immunity), and their interactions on disease evolution. Hence, the Second International MPE Meeting was held in Boston in December 2014, with aims to: (1) develop conceptual and practical frameworks; (2) cultivate and expand opportunities; (3) address challenges; and (4) initiate the effort of specifying guidelines for MPE. The meeting mainly consisted of presentations of method developments and recent data in various malignant neoplasms and tumors (breast, prostate, ovarian and colorectal cancers, renal cell carcinoma, lymphoma, and leukemia), followed by open discussion sessions on challenges and future plans. In particular, we recognized the need for efforts to further develop statistical methodologies. This meeting provided an unprecedented opportunity for interdisciplinary collaboration, consistent with the purposes of the Big Data to Knowledge, Genetic Associations and Mechanisms in Oncology, and Precision Medicine initiatives of the US National Institutes of Health.
The MPE meeting series can help advance transdisciplinary population science and optimize training and education systems for twenty-first century medicine and public health.
Article
Full-text available
Abnormal DNA methylation is known to play an important role in tumorigenesis. Characterizing DNA methylation patterns across cancers helps to distinguish cancer-specific diagnostic and therapeutic targets. High-throughput DNA methylation analysis makes it possible to comprehensively survey epigenetic diversity across various cancers. We integrated whole-genome methylation data from 798 samples across seven cancers. Hierarchical clustering revealed the existence of cancer-specific methylation patterns. We then identified 331 differentially methylated genes across these cancers, most of which (266) were differentially methylated in only one cancer. A DNA methylation correlation network (DMCN) was built based on the methylation correlation between these genes. The hubs in the DMCN tended to be cancer-specific genes in the seven cancers. Further survival analysis using a subset of the genes in the DMCN showed that high-risk and low-risk groups were distinguished by seven biomarkers (PCDHB15, WBSCR17, IGF1, GYPC, CYGB, ACTG2, and PRRT1) in breast cancer and eight biomarkers (ZBTB32, OR51B4, CCL8, TMEFF2, SALL3, GPSM1, MAGEA8, and SALL1) in colon cancer. Finally, a protein-protein interaction (PPI) network was introduced to verify the biological function of the differentially methylated genes. MAP3K14, PTN, ACVR1 and HCK, which showed differential DNA methylation and gene expression across cancers, had relatively high degree in the PPI network. The study suggests that the identified cancer-specific genes provide a reference for individualized treatment and that relationships across cancers can be explained by differential DNA methylation.
Article
Full-text available
limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Article
Full-text available
Postreplicative mismatch repair (MMR) increases the fidelity of DNA replication by up to three orders of magnitude by correcting DNA polymerase errors that escaped proofreading. MMR also controls homologous recombination (HR) by aborting strand exchange between divergent DNA sequences. In recent years, MMR has also been implicated in the response of mammalian cells to DNA damaging agents. Thus, MMR-deficient cells were shown to be around 100-fold more resistant to killing by methylating agents of the SN1 type than cells with functional MMR. In the case of cisplatin, the sensitivity difference was lower, typically two- to three-fold, but was observed in all matched MMR-proficient and -deficient cell pairs. More controversial is the role of MMR in the cellular response to other DNA damaging agents, such as ionizing radiation (IR), topoisomerase poisons, antimetabolites, UV radiation and DNA intercalators. The MMR-dependent DNA damage signalling pathways activated by the above agents are also ill-defined. To date, signalling cascades involving the Ataxia telangiectasia mutated (ATM) and ATM- and Rad3-related (ATR) kinases, as well as the stress-activated kinases JNK/SAPK and p38α, have been linked with methylating agent and 6-thioguanine (TG) treatments, while cisplatin damage was reported to activate the c-Abl and JNK/SAPK kinases in an MMR-dependent manner. MMR defects are found in several different cancer types, both familial and sporadic, and it is possible that the involvement of the MMR system in DNA damage signalling plays an important role in transformation. The scope of this article is to provide a brief overview of the recent literature on this subject and to raise questions that could be addressed in future studies.
Article
Full-text available
Motivation: High-throughput experimental techniques have produced a large amount of protein–protein interaction (PPI) data. The study of PPI networks, such as comparative analysis, shall benefit the understanding of life process and diseases at the molecular level. One way of comparative analysis is to align PPI networks to identify conserved or species-specific subnetwork motifs. A few methods have been developed for global PPI network alignment, but it still remains challenging in terms of both accuracy and efficiency. Results: This paper presents a novel global network alignment algorithm, denoted as HubAlign, that makes use of both network topology and sequence homology information, based upon the observation that topologically important proteins in a PPI network usually are much more conserved and thus, more likely to be aligned. HubAlign uses a minimum-degree heuristic algorithm to estimate the topological and functional importance of a protein from the global network topology information. Then HubAlign aligns topologically important proteins first and gradually extends the alignment to the whole network. Extensive tests indicate that HubAlign greatly outperforms several popular methods in terms of both accuracy and efficiency, especially in detecting functionally similar proteins. Availability: HubAlign is available freely for non-commercial purposes at http://ttic.uchicago.edu/~hashemifar/software/HubAlign.zip Contact: jinboxu@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Full-text available
In an effort to identify novel biallelically inactivated tumor suppressor genes (TSGs) in sporadic invasive and preinvasive non-small-cell lung cancer (NSCLC) genomes, we applied a comprehensive integrated multiple 'omics' approach to investigate patient-matched, paired NSCLC tumor and non-malignant parenchymal tissues. By surveying lung tumor genomes for genes concomitantly inactivated within individual tumors by multiple mechanisms, and by the frequency of disruption in tumors across multiple cohorts, we have identified a putative lung cancer TSG, Eyes Absent 4 (EYA4). EYA4 is frequently and concomitantly deleted, hypermethylated and underexpressed in multiple independent lung tumor data sets, in both major NSCLC subtypes and in the earliest stages of lung cancer. We found that decreased EYA4 expression is not only associated with poor survival in sporadic lung cancers but also that EYA4 single-nucleotide polymorphisms are associated with increased familial cancer risk, consistent with EYA4's proximity to the previously reported lung cancer susceptibility locus on 6q. Functionally, we found that EYA4 displays TSG-like properties with a role in modulating apoptosis and DNA repair. Cross-examination of EYA4 expression across multiple tumor types suggests a cell-type-specific tumorigenic role for EYA4, consistent with a tumor suppressor function in cancers of epithelial origin. This work shows a clear role for EYA4 as a putative TSG in NSCLC. Oncogene advance online publication, 7 October 2013; doi:10.1038/onc.2013.396.
Article
Full-text available
Illumina's Infinium HumanMethylation450 BeadChip arrays were used to examine genome-wide DNA methylation profiles in 22 sample pairs from colorectal cancer (CRC) and adjacent tissues and 19 colon tissue samples from cancer-free donors. We show that the methylation profiles of tumors and healthy tissue samples can be clearly distinguished from one another and that the main source of methylation variability is associated with disease status. We used different statistical approaches to evaluate the methylation data. In general, at the CpG-site level, we found that common CRC-specific methylation patterns consist of at least 15,667 CpG sites that were significantly different from either adjacent healthy tissue or tissue from cancer-free subjects. Of these sites, 10,342 were hypermethylated in CRC, and 5,325 were hypomethylated. Hypermethylated sites were common in the maximum number of sample pairs and were mostly located in CpG islands, where they were significantly enriched for differentially methylated regions known to be cancer-specific. In contrast, hypomethylated sites were mostly located in CpG shores and were generally sample-specific. Despite the considerable variability in methylation data, we selected a panel of 14 highly robust candidates showing methylation marks in genes SND1, ADHFE1, OPLAH, TLX2, C1orf70, ZFP64, NR5A2, and COL4A. This set was successfully cross-validated using methylation data from 209 CRC samples and 38 healthy tissue samples from The Cancer Genome Atlas consortium (AUC = 0.981 [95% CI: 0.9677-0.9939], sensitivity = 100% and specificity = 82%). In summary, this study reports a large number of loci with novel differential methylation statuses, some of which may serve as candidate markers for diagnostic purposes.
Article
Full-text available
Downregulation of the tight junction protein claudin 1 is a frequent event in breast cancer and is associated with recurrence, metastasis, and reduced survival, suggesting a tumor suppressor role for this protein. Tumor suppressor genes are often epigenetically silenced in cancer. Downregulation of claudin 1 via DNA promoter methylation may thus be an important determinant in breast cancer development and progression. To investigate whether silencing of claudin 1 has an epigenetic etiology in breast cancer, we compared gene expression and methylation data from 217 breast cancer samples and 40 matched normal samples available through The Cancer Genome Atlas (TCGA). Moreover, we analyzed claudin 1 expression and methylation in 26 breast cancer cell lines. We found that methylation of the claudin 1 promoter CpG island is relatively frequent in estrogen receptor positive (ER+) breast cancer and is associated with low claudin 1 expression. In contrast, the claudin 1 promoter was not methylated in most of the ER- breast cancer samples, and some of these tumors overexpress claudin 1. In addition, we observed that the demethylating agents azacitidine and decitabine can upregulate claudin 1 expression in breast cancer cell lines that have a methylated claudin 1 promoter. Taken together, our results indicate that DNA promoter methylation is causally associated with downregulation of claudin 1 in a subgroup of breast cancer that includes mostly ER+ tumors, and suggest that epigenetic therapy to restore claudin 1 expression might represent a viable therapeutic strategy in this subtype of breast cancer.
Article
Full-text available
Epigenetic changes have been associated with ageing and cancer. Identifying and interpreting epigenetic changes associated with such phenotypes may benefit from integration with protein interactome models. We here develop and validate a novel integrative epigenome-interactome approach to identify differential methylation interactome hotspots associated with a phenotype of interest. We apply the algorithm to cancer and ageing, demonstrating the existence of hotspots associated with these phenotypes. Importantly, we discover tissue independent age-associated hotspots targeting stem-cell differentiation pathways, which we validate in independent DNA methylation data sets, encompassing over 1000 samples from different tissue types. We further show that these pathways would not have been discovered had we used a non-network based approach and that the use of the protein interaction network improves the overall robustness of the inference procedure. The proposed algorithm will be useful to any study seeking to identify interactome hotspots associated with common phenotypes.
Article
Full-text available
Human cancers almost ubiquitously harbor epigenetic alterations. Although such alterations in epigenetic marks, including DNA methylation, are potentially heritable, they can also be dynamically altered. Given this potential for plasticity, the degree to which epigenetic changes can be subject to selection and act as drivers of neoplasia has been questioned. We carried out genome-scale analyses of DNA methylation alterations in lethal metastatic prostate cancer and created DNA methylation "cityscape" plots to visualize these complex data. We show that somatic DNA methylation alterations, despite showing marked interindividual heterogeneity among men with lethal metastatic prostate cancer, were maintained across all metastases within the same individual. The overall extent of maintenance in DNA methylation changes was comparable to that of genetic copy number alterations. Regions that were frequently hypermethylated across individuals were markedly enriched for cancer- and development/differentiation-related genes. Additionally, regions exhibiting high consistency of hypermethylation across metastases within individuals, even if variably hypermethylated across individuals, showed enrichment for cancer-related genes. Whereas some regions showed intraindividual metastatic tumor heterogeneity in promoter methylation, such methylation alterations were generally not correlated with gene expression. This was despite a general tendency for promoter methylation patterns to be strongly correlated with gene expression, particularly at regions that were variably methylated across individuals. These findings suggest that DNA methylation alterations have the potential for producing selectable driver events in carcinogenesis and disease progression and highlight the possibility of targeting such epigenome alterations for development of longitudinal markers and therapeutic strategies.
Article
Full-text available
Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionarily related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.
Article
Full-text available
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
Article
Full-text available
Gene ontology analysis has become a popular and important tool in bioinformatics studies, and current ontology analyses are mainly conducted on individual genes or gene lists. However, recent molecular network analysis reveals that the same list of genes with different interactions may perform different functions. Therefore, it is necessary to consider molecular interactions to correctly and specifically annotate biological networks. Here, we propose a novel Network Ontology Analysis (NOA) method to perform gene ontology enrichment analysis on biological networks. Specifically, NOA first defines link ontology, which assigns functions to interactions based on the known annotations of joint genes by optimizing two novel indexes, ‘Coverage’ and ‘Diversity’. Then, NOA generates two alternative reference sets to statistically rank the enriched functional terms for a given biological network. We compare NOA with traditional enrichment analysis methods in several biological networks, and find that: (i) NOA can capture the change of functions not only in dynamic transcription regulatory networks but also in rewiring protein interaction networks, while the traditional methods cannot, and (ii) NOA can find more relevant and specific functions than traditional methods in different types of static networks. Furthermore, a freely accessible web server for NOA has been developed at http://www.aporc.org/noa/.
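For context on the "traditional enrichment analysis" that NOA is compared against, one widely used form is the hypergeometric over-representation test on a gene list. The sketch below shows that test in plain Python; it is not NOA's link-ontology procedure, and the function name and argument names are assumptions chosen for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: number of genes in the reference set
    annotated:  reference genes carrying the functional term
    sample:     size of the gene list under study
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 5 of 10 reference genes carry a term, and all
# 5 genes in the studied list carry it.
print(enrichment_p_value(population=10, annotated=5, sample=5, hits=5))
```

Terms are then ranked by p-value. The abstract's point is that such a gene-list test ignores which interactions are present, which is what a choice of reference set over links is meant to address.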
Article
Full-text available
Colorectal cancer is a complex disease resulting from somatic genetic and epigenetic alterations, including locus-specific CpG island methylation and global DNA or LINE-1 hypomethylation. Global molecular characteristics such as microsatellite instability (MSI), CpG island methylator phenotype (CIMP), global DNA hypomethylation, and chromosomal instability cause alterations of gene function on a genome-wide scale. Activation of oncogenes including KRAS, BRAF and PIK3CA affects intracellular signalling pathways and has been associated with CIMP and MSI. Traditional epidemiology research has investigated various factors in relation to an overall risk of colon and/or rectal cancer. However, colorectal cancers comprise a heterogeneous group of diseases with different sets of genetic and epigenetic alterations. To better understand how a particular exposure influences the carcinogenic and pathologic process, somatic molecular changes and tumour biomarkers have been studied in relation to the exposure of interest. Moreover, an investigation of interactive effects of tumour molecular changes and the exposures of interest on tumour behaviour (prognosis or clinical outcome) can lead to a better understanding of tumour molecular changes, which may be prognostic or predictive tissue biomarkers. These new research efforts represent 'molecular pathologic epidemiology', which is a multidisciplinary field of investigations of the inter-relationship between exogenous and endogenous (eg, genetic) factors, tumoural molecular signatures and tumour progression. Furthermore, integrating genome-wide association studies (GWAS) with molecular pathological investigation is a promising area (GWAS-MPE approach). Examining the relationship between susceptibility alleles identified by GWAS and specific molecular alterations can help elucidate the function of these alleles and provide insights into whether susceptibility alleles are truly causal. 
Although there are challenges, molecular pathological epidemiology has unique strengths, and can provide insights into the pathogenic process and help optimise personalised prevention and therapy. In this review, we overview this relatively new field of research and discuss measures to overcome challenges and move this field forward.
Article
Full-text available
Epigenetics has recently emerged as a critical field for studying how non-gene factors can influence the traits and functions of an organism. At the core of this new wave of research is the use of computational tools that play critical roles not only in directing the selection of key experiments, but also in formulating new testable hypotheses through detailed analysis of complex genomic information that is not achievable using traditional approaches alone. Epigenomics, which combines traditional genomics with computer science, mathematics, chemistry, biochemistry and proteomics for the large-scale analysis of heritable changes in phenotype, gene function or gene expression that are not dependent on gene sequence, offers new opportunities to further our understanding of transcriptional regulation, nuclear organization, development and disease. This article examines existing computational strategies for the study of epigenetic factors. The most important databases and bioinformatic tools in this rapidly growing field have been reviewed.
Article
Full-text available
Protein–protein interactions (PPIs) and their networks play a central role in all biological processes. Akin to the complete sequencing of genomes and their comparative analysis, complete descriptions of interactomes and their comparative analysis is fundamental to a deeper understanding of biological processes. A first step in such an analysis is to align two or more PPI networks. Here, we introduce an algorithm, IsoRank, for global alignment of multiple PPI networks. The guiding intuition here is that a protein in one PPI network is a good match for a protein in another network if their respective sequences and neighborhood topologies are a good match. We encode this intuition as an eigenvalue problem in a manner analogous to Google's PageRank method. Using IsoRank, we compute a global alignment of the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, and Homo sapiens PPI networks. We demonstrate that incorporating PPI data in ortholog prediction results in improvements over existing sequence-only approaches and over predictions from local alignments of the yeast and fly networks. Previous methods have been effective at identifying conserved, localized network patterns across pairs of networks. This work takes the further step of performing a global alignment of multiple PPI networks. It simultaneously uses sequence similarity and network data and, unlike previous approaches, explicitly models the tradeoff inherent in combining them. We expect IsoRank—with its simultaneous handling of node similarity and network similarity—to be applicable across many scientific domains. • biological networks • graph isomorphism • network alignment • protein–protein interactions • functional coherence
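The intuition described above (a node pair scores well when its neighbours also pair well) can be sketched as a power iteration over pairwise scores. This is a minimal pairwise illustration in plain Python, not the published implementation: the function name, the uniform prior standing in for sequence similarity, and the choice of alpha = 0.6 are assumptions.

```python
def isorank_scores(adj1, adj2, prior=None, alpha=0.6, iters=50):
    """Neighbourhood-based similarity scores for node pairs.

    adj1, adj2: dict mapping node -> set of neighbours (undirected).
    prior: optional dict (i, j) -> non-negative similarity, e.g. from
           sequence comparison; uniform if omitted (an assumption).
    alpha: weight on topology versus the prior.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    if prior is None:
        prior = {p: 1.0 for p in pairs}
    total = sum(prior.get(p, 0.0) for p in pairs)
    prior = {p: prior.get(p, 0.0) / total for p in pairs}
    scores = dict(prior)
    for _ in range(iters):
        new = {}
        for i, j in pairs:
            # Each neighbouring pair (u, v) shares its score equally
            # among all pairs formed from neighbours of u and v.
            s = sum(
                scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            new[(i, j)] = alpha * s + (1.0 - alpha) * prior[(i, j)]
        norm = sum(new.values())
        scores = {p: val / norm for p, val in new.items()}
    return scores


# Two hypothetical three-node path networks: a-b-c and x-y-z.
net1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
net2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
result = isorank_scores(net1, net2)
print(max(result, key=result.get))
```

An alignment would then be extracted from the scores, for example by greedily picking the highest-scoring unmatched pair; that step is omitted here.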
Article
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).
Article
PathBLAST is a network alignment and search tool for comparing protein interaction networks across species to identify protein pathways and complexes that have been conserved by evolution. The basic method searches for high-scoring alignments between pairs of protein interaction paths, for which proteins of the first path are paired with putative orthologs occurring in the same order in the second path. This technique discriminates between true- and false-positive interactions and allows for functional annotation of protein interaction pathways based on similarity to the network of another, well-characterized species. PathBLAST is now available at http://www.pathblast.org/ as a web-based query. In this implementation, the user specifies a short protein interaction path for query against a target protein–protein interaction network selected from a network database. PathBLAST returns a ranked list of matching paths from the target network along with a graphical view of these paths and the overlap among them. Target protein–protein interaction networks are currently available for Helicobacter pylori, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. Just as BLAST enables rapid comparison of protein sequences between genomes, tools such as PathBLAST are enabling comparative genomics at the network level.
Article
To elucidate cellular machinery on a global scale, we performed a multiple comparison of the recently available protein–protein interaction networks of Caenorhabditis elegans, Drosophila melanogaster, and Saccharomyces cerevisiae. This comparison integrated protein interaction and sequence information to reveal 71 network regions that were conserved across all three species and many exclusive to the metazoans. We used this conservation, and found statistically significant support for 4,645 previously undescribed protein functions and 2,609 previously undescribed protein interactions. We tested 60 interaction predictions for yeast by two-hybrid analysis, confirming approximately half of these. Significantly, many of the predicted functions and interactions would not have been identified from sequence similarity alone, demonstrating that network comparisons provide essential biological information beyond what is gleaned from the genome. • comparative analysis • multiple alignment • protein network • yeast two-hybrid
Article
We present an algorithm for graph isomorphism and subgraph isomorphism suited for dealing with large graphs. A first version of the algorithm has been presented in a previous paper, where we examined its performance for the isomorphism of small and medium size graphs. The algorithm is improved here to reduce its spatial complexity and to achieve a better performance on large graphs; its features are analyzed in detail with special reference to time and memory requirements. The results of a testing performed on a publicly available database of synthetically generated graphs and on graphs relative to a real application dealing with technical drawings are presented, confirming the effectiveness of the approach, especially when working with large graphs.
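To make the subgraph matching problem concrete, here is a naive backtracking search. It is a minimal sketch and does not reproduce the algorithm of this paper, which adds pruning rules and memory optimisations for large graphs. The function name `find_subgraph_mapping` and the adjacency-dictionary representation are assumptions. The sketch looks for a non-induced match: every pattern edge must map to a target edge, but extra target edges are allowed.

```python
def find_subgraph_mapping(pattern, target):
    """Return a dict mapping pattern nodes to distinct target nodes such that
    every pattern edge is also a target edge, or None if no mapping exists.

    pattern, target: dict mapping node -> set of neighbours (undirected).
    """
    # Place high-degree pattern nodes first so dead ends are found early.
    order = sorted(pattern, key=lambda n: -len(pattern[n]))

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        used = set(mapping.values())
        for t in target:
            # Skip taken nodes and nodes with too few neighbours.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Every already-mapped neighbour of p must be adjacent to t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[p]  # backtrack
        return None

    return extend({})


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
square_with_diagonal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}

hit = find_subgraph_mapping(triangle, square_with_diagonal)
miss = find_subgraph_mapping(triangle, square)
```

The worst-case running time of this search is exponential in the pattern size, which is why practical matchers add feasibility rules to prune the search tree.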
Article
Next-generation sequencing has revealed that more than 50% of human cancers harbour mutations in enzymes that are involved in chromatin organization. Tumour cells not only are activated by genetic and epigenetic alterations, but also routinely use epigenetic processes to ensure their escape from chemotherapy and host immune surveillance. Hence, a growing emphasis of recent drug discovery efforts has been on targeting the epigenome, including DNA methylation and histone modifications, with several new drugs being tested and some already approved by the US Food and Drug Administration (FDA). The future will see the increasing success of combining epigenetic drugs with other therapies. As epigenetic drugs target the epigenome as a whole, these true 'genomic medicines' lessen the need for precision approaches to individualized therapies.
Article
Background: Identifying pathways that show a significant difference in activity between disease and control samples has been an active topic of research for over a decade. Pathways so identified serve as potential indicators of aberrations in phenotype or of a disease condition. Epigenetic mechanisms such as DNA methylation are now known to play an important role in altering the regulatory mechanisms of biological pathways. It is reasonable to think that a set of genes that show significant differences in expression and methylation interact to form a network of pathways. Existing pathway identification methods fail to capture the complex interplay between interacting pathways. Results: This paper proposes a novel framework to identify biological pathways that are dysregulated by epigenetic mechanisms. Experiments on four benchmark cancer datasets and comparison with state-of-the-art pathway identification methods reveal the effectiveness of the proposed approach. Conclusion: The proposed framework incorporates both topology and biological relationships of pathways. Comparison with state-of-the-art techniques reveals promising results. Epigenetic signatures identified from pathway interaction networks can help to advance Molecular Pathological Epidemiology (MPE) research efforts by predicting tumor molecular changes.
Article
August von Kotzebue’s drama Bruder Moritz (1791) demonstrates the impact of non-European cultures on the German discourse and expresses a transcultural consciousness in German culture around 1800. Discussing Moritz’s friendship with the Arab Omar and Moritz’s provocative attitude toward female virtue and the incest taboo, this essay shows how Omar co-constructs Moritz’s identity and empowers him to imagine a different moral and social order. Moreover, this essay foregrounds the happy ending of Moritz’s refuge in the remote Pacific Islands and argues that non-European cultures evoke the generic instability of bourgeois tragedy and pave the way for melodramas with happy endings.
Article
Objective: DNA methylation, a regulator of gene expression, plays an important role in diverse biological processes including development, carcinogenesis and aging. In particular, aberrant DNA methylation has been widely observed in several types of cancers. It is therefore important to extract disease-specific gene sets associated with the regulation of DNA methylation. Materials and methods: Here we propose a novel approach to find the minimum regulatory units of genes, co-methylated and co-expressed gene pairs (MEGP), which are gene pairs that are highly correlated in both DNA methylation and gene expression and thus show a co-regulatory relationship. To evaluate whether our method is applicable to extracting disease-associated genes, we applied it to a large-scale dataset from the Cancer Genome Atlas, extracted significantly associated MEGP and analyzed their functional correlation. Results: We observed that many MEGP physically interacted with each other and showed high semantic similarity with gene ontology terms. Furthermore, we performed gene set enrichment tests to identify how they are correlated in a complex biological process. Our MEGP were highly enriched in the biological pathway associated with ovarian cancers. Conclusions: Our approach is useful for discovering coordinated epigenetic markers associated with specific diseases.
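One plausible reading of "co-methylated and co-expressed gene pairs" can be sketched as follows. This is an illustrative assumption, not the authors' procedure: the abstract does not state the correlation measure, the threshold, or any significance testing, so the Pearson coefficient, the cut-off of 0.8, the function names, and the gene labels `G1` to `G3` are all hypothetical. The sketch keeps a gene pair when its two methylation profiles and its two expression profiles are both strongly correlated across samples.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def candidate_pairs(methylation, expression, threshold=0.8):
    """Return (gene, gene, r_methylation, r_expression) for pairs whose
    methylation and expression profiles are both strongly correlated.

    methylation, expression: dict mapping gene -> list of per-sample values.
    threshold: hypothetical cut-off on the absolute correlation.
    """
    genes = sorted(set(methylation) & set(expression))
    hits = []
    for g, h in combinations(genes, 2):
        r_meth = pearson(methylation[g], methylation[h])
        r_expr = pearson(expression[g], expression[h])
        if abs(r_meth) >= threshold and abs(r_expr) >= threshold:
            hits.append((g, h, r_meth, r_expr))
    return hits


# Toy data for four samples; G1 and G2 move together, G3 does not.
methylation = {
    "G1": [0.1, 0.2, 0.3, 0.4],
    "G2": [0.2, 0.4, 0.6, 0.8],
    "G3": [0.9, 0.1, 0.8, 0.2],
}
expression = {
    "G1": [5, 4, 3, 2],
    "G2": [10, 8, 6, 4],
    "G3": [1, 1, 2, 1],
}
hits = candidate_pairs(methylation, expression)
```

A real analysis would also need multiple-testing correction, since the number of gene pairs grows quadratically with the number of genes.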
Article
Motivation: Protein interaction networks provide an important system-level view of biological processes. One of the fundamental problems in biological network analysis is the global alignment of a pair of networks, which puts the proteins of one network into correspondence with the proteins of another network in a manner that conserves their interactions while respecting other evidence of their homology. By providing a mapping between the networks of different species, alignments can be used to inform hypotheses about the functions of unannotated proteins, the existence of unobserved interactions, the evolutionary divergence between the two species and the evolution of complexes and pathways. Results: We introduce GHOST, a global pairwise network aligner that uses a novel spectral signature to measure topological similarity between subnetworks. It combines a seed-and-extend global alignment phase with a local search procedure and exceeds state-of-the-art performance on several network alignment tasks. We show that the spectral signature used by GHOST is highly discriminative, whereas the alignments it produces are also robust to experimental noise. When compared with other recent approaches, we find that GHOST is able to recover larger and more biologically significant, shared subnetworks between species. Availability: An efficient and parallelized implementation of GHOST, released under the Apache 2.0 license, is available at http://cbcb.umd.edu/kingsford_group/ghost Contact: rob@cs.umd.edu.
Article
Access to unified datasets of protein and genetic interactions is critical for interrogation of gene/protein function and analysis of global network properties. BioGRID is a freely accessible database of physical and genetic interactions available at http://www.thebiogrid.org. BioGRID release version 2.0 includes >116 000 interactions from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. Over 30 000 interactions have recently been added from 5778 sources through exhaustive curation of the Saccharomyces cerevisiae primary literature. An internally hyper-linked web interface allows for rapid search and retrieval of interaction data. Full or user-defined datasets are freely downloadable as tab-delimited text files and PSI-MI XML. Pre-computed graphical layouts of interactions are available in a variety of file formats. User-customized graphs with embedded protein, gene and interaction attributes can be constructed with a visualization system called Osprey that is dynamically linked to the BioGRID.
Dubossarsky, E., Tyshetskiy, Y., 2015. autoencoder: Sparse Autoencoder for Automatic Learning of Representative Features from Unlabeled Data. https://CRAN.R-project.org/package=autoencoder.
Edgar, R., Domrachev, M., Lash, A.E., 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207-210.
Deeb, K.K., Trump, D.L., Johnson, C.S., 2007. Vitamin D signalling pathways in cancer: potential for anticancer therapeutics. Nat. Rev. Cancer 7, 684-700.
Matthews, L., Gopinath, G., Gillespie, M., Caudy, M., et al., 2009. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 37, 619-622.
Nishimura, D., 2001. BioCarta. Biotech Software and Internet Report 2 (3), 117-120.
Ogino, S., Campbell, P.T., Nishihara, R., Phipps, A.I., Beck, A.H., Sherman, M.E., Chan, A.T., Troester, M.A., 2015. Proceedings of the Second International Molecular Pathological Epidemiology (MPE) Meeting. Cancer Causes Control 26 (7), 959-972.
Wilson, I.M., Vucic, E.A., Char, R., Zhang, Y.A., Starczynowski, D.T., Buys, T.P., et al., 2014. EYA4 is a non-small cell lung cancer tumor suppressor located in the susceptibility locus on chromosome 6q. Oncogene 33 (36), 4464-4473.