Article

Overcoming Complexity of Biological Systems: from Data Analysis to Mathematical Modeling


Abstract

The problem of dealing with complexity arises when we fail to achieve a desired behavior of biological systems (for example, in cancer treatment). In this review I formulate the problem of tackling biological complexity at the level of high-dimensional datasets and complex mathematical models of reaction networks. I show that in many cases the complexity can be reduced by approximation with simpler objects (for example, using principal graphs for data dimension reduction and dominant systems for reducing complex models). Examples of dealing with complexity from various fields of molecular systems biology are used, in particular from the analysis of cancer transcriptomes, the mathematical modeling of protein synthesis, and the modeling of cell fate decisions between death and life.


... Any biological object, including the taxa that are used to determine chronostratigraphic units, is part of very complex biological systems. The origination of each taxon in space and time depends on a great number of factors that are difficult, if not impossible, to know even in modern biology (Wolf et al., 2018; Zinovyev, 2015; Mazzocchi, 2008; Novikoff, 1945), and far less possible in paleobiology (Fig. 2). This complexity is the reason why taxonomy is not formalized in modern biology and why alternative taxonomic systems and classifications are still widely accepted and protected by the International Code of Zoological Nomenclature (http://www.iczn.org/ ...
Article
Full-text available
For over 200 years the use of biotic events as the basis for the establishment of chronostratigraphic boundaries has been the only approach successfully utilized for international and national chronostratigraphy. The traditional biostratigraphic method provides relatively high resolution, averaging 1 Ma or sometimes less. This biochronological evolutionary approach to the Global Boundary Stratotype Section and Point (GSSP) utilizes biotic Primary Markers (PM), with a few exceptions, encompasses the integrated PM and other non-PM markers as the general principles for defining GSSP boundaries, and is a reasonably reliable mechanism for global correlation and a relatively stable International Geologic Time Scale (IGTS). The biotic PMs, however, possess several serious restrictions: the nature of biological taxonomy; climatic, sedimentary and environmental constraints; and direct applicability within the tropics-subtropics only. Biotic evolution and radiogenic isotopes are the only systems in geologic time that encompass the direction of time. The latter possesses fewer restrictions than the former. The recent tendency to define GSSPs utilizing magnetic chrons, climatic events and geochemistry may work in the Cenozoic, but is useless in the Mesozoic and older sediments because of their cyclic nature (repeatedness) and the need for a second, time-directional index (biostratigraphic or radioisotopic) to place the PM in the right position within the scale. I propose here to utilize volcanic ash beds as the best Primary Marker in geologic chronostratigraphy. The U-Pb system is one of the most dependable of the geochronologic systems because it relies on a simple and non-interpretive radioisotopic decay constant. The ash-bed GSSP as a lithological horizon is universal for the GSSP definition and can be correlated as an age in any facies (marine, lagoonal and continental), regardless of paleoclimatic zones, paleoceanographic, geochemical, and most other geological factors. Even moderate-level metamorphism (>900 °C) does not affect the U-Pb dating of zircons. The GSSP at the base of a volcanic ash bed (Primary Marker) could be established in a short working time, and these ash beds can be integrated with existing as well as new biostratigraphic, geochemical, magnetostratigraphic and astronomical data (Secondary Markers) to create a robust, accurate and highly useable time scale. Several potential GSSPs that could be established with volcanic ash beds close to the traditional and/or historical boundaries serve as examples for this approach and include the Devonian-Carboniferous, Moscovian-Kasimovian, Kasimovian-Gzhelian, and Sakmarian-Artinskian boundaries.
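As a worked illustration of why the U-Pb decay system is considered non-interpretive, the standard age equation t = (1/λ) ln(1 + 206Pb*/238U) can be evaluated directly. The Python sketch below uses the commonly cited 238U decay constant and a purely illustrative isotope ratio, not data from any particular ash bed.

import math

# Decay constant of 238U, approximately 1.55125e-10 per year (Jaffey et al., 1971).
LAMBDA_238U = 1.55125e-10

def u_pb_age(pb206_u238):
    """Age in years from a radiogenic 206Pb/238U ratio, using D/P = exp(lambda*t) - 1."""
    return math.log(1.0 + pb206_u238) / LAMBDA_238U

# Illustrative ratio only: a value near 0.057 corresponds to roughly 357 Ma,
# i.e. close to the Devonian-Carboniferous boundary discussed above.
print(round(u_pb_age(0.057) / 1e6, 1), "Ma")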
... When considering data distributions in high-dimensional spaces, two diametrically opposed scenarios need to be kept in mind [1,2]. In some cases, clouds of data points can be located in the vicinity of a relatively low-dimensional object (such as a principal manifold), and hence possess a low intrinsic dimensionality. ...
Preprint
Full-text available
We present ElPiGraph, a method for approximating data distributions having non-trivial topological features such as the existence of excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly in the case of multidimensional and noisy data. Instead, ElPiGraph constructs elastic principal graphs in a more robust way by minimizing elastic energy, applying graph grammars and explicitly controlling topological complexity. Using a trimmed approximation error function makes ElPiGraph extremely robust to the presence of background noise without decreasing computational performance, and allows it to deal with complex cases of manifold learning (for example, ElPiGraph can learn disconnected intersecting manifolds). Thanks to the quasi-quadratic nature of the elastic function, ElPiGraph performs almost as fast as a simple k-means clustering and, therefore, is much more scalable than alternative methods, and can work on large datasets containing millions of high-dimensional points on a personal computer. The excellent performance of the method opens the possibility to apply resampling and to approximate complex data structures via principal graph ensembles, which can be used to construct consensus principal graphs. ElPiGraph is currently implemented in five programming languages and accompanied by a graphical user interface, which makes it a versatile tool to deal with complex data in various fields, from molecular biology, where it can be used to infer pseudo-time trajectories from single-cell RNASeq, to astronomy, where it can be used to approximate complex structures in the distribution of galaxies.
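To make the elastic-energy idea concrete, here is a minimal sketch (not the ElPiGraph implementation itself, which additionally uses graph grammars and trimmed errors) that fits a chain-topology principal curve to 2D points by alternating a nearest-node projection step with a quadratic minimization of approximation, stretching and bending terms; the node count, penalty weights and toy dataset are illustrative assumptions.

import numpy as np

def fit_elastic_chain(X, n_nodes=12, lam=0.01, mu=0.01, n_iter=30):
    """Simplified elastic principal curve with a fixed chain topology:
    alternate (1) assigning points to their nearest node and (2) solving the
    quadratic elastic-energy problem for node positions.
    lam penalizes edge stretching, mu penalizes bending (second differences)."""
    N, dim = X.shape
    # initialize the nodes along the first principal component
    center = X.mean(axis=0)
    Xc = X - center
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]
    t = np.linspace((Xc @ v).min(), (Xc @ v).max(), n_nodes)
    nodes = center + np.outer(t, v)
    # difference operators for stretching (edges) and bending (ribs)
    E = np.diff(np.eye(n_nodes), axis=0)            # shape (n_nodes-1, n_nodes)
    B = np.diff(np.eye(n_nodes), n=2, axis=0)       # shape (n_nodes-2, n_nodes)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)                  # projection step
        counts = np.bincount(assign, minlength=n_nodes)
        sums = np.zeros_like(nodes)
        np.add.at(sums, assign, X)
        A = np.diag(counts / N) + lam * E.T @ E + mu * B.T @ B
        nodes = np.linalg.solve(A, sums / N)        # quadratic minimization step
    return nodes

rng = np.random.default_rng(0)
angles = rng.uniform(0, np.pi / 2, 500)             # noisy quarter circle
X = np.c_[np.cos(angles), np.sin(angles)] + 0.05 * rng.normal(size=(500, 2))
print(fit_elastic_chain(X).round(2))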
... The construction of large structural schemas for biochemical reaction networks, such as the global metabolic mechanism in human [5] or the global cancer signaling reaction network [6], has proved to be feasible, but exploiting this knowledge remains a challenge. Using these reconstructions, it is possible to imagine detailed kinetic equations for a global reaction network inside a cell, but it is more difficult, if not impossible, to find reaction rate constants and work with this large system even if it is considered 'realistic' [4,7]. Thus, the applicability of the pure bottom-up approach becomes questionable in this context. ...
Article
Mathematical modeling of biological networks is a promising approach to understanding the complexity of cancer progression, which can be seen as accumulated abnormalities in the kinetics of cellular biochemistry. Two major modeling formalisms (languages) have been used for this purpose in the last couple of decades: one is based on the application of classical chemical kinetics of reaction networks and the other one is based on a discrete kinetics representation (called logical formalism for simplicity here), governed by logical state update rules. In this short review, we remind the reader how these two methodologies complement each other, but also present the rapid recent development of semi-quantitative approaches for modeling large biological networks, with a spectrum of complementary ideas each inheriting and combining features of both modeling formalisms. We also note an increasing influence of the recent success of machine learning and artificial intelligence on the methodology of mathematical network modeling in cancer research, leading to the appearance of a number of pragmatic hybrid approaches. To illustrate the two approaches, logical versus kinetic modeling, we provide an example describing the same biological process with different description granularity in both discrete and continuous formalisms. The model focuses on a central question in cancer biology: understanding the mechanisms of metastasis appearance. We conclude that despite significant progress in the development of modeling ideas, predicting the response of large biological networks involved in cancer to various perturbations remains a major unsolved challenge in cancer systems biology.
... For example, the profile likelihood approach allows solving the model reduction problem together with the identification problem, analysing parameter identifiability and designating likely candidates for reduction. The following references demonstrate some other recent efforts in this direction [71,72,73,74,75,76,77]. ...
... Significant technological advancements made in the field of high-throughput -omics, that is, genomics, transcriptomics, proteomics, and metabolomics, directed current efforts toward designing software able to handle the analysis of the continuing flow of experimentally generated data [65]. DM approaches have been used to support traditional statistical techniques to address "big data" challenges, such as accounting for the large dimensionality and complexity of biological data [66]. Growing interest in DM techniques in research can be noticed through the rapidly increasing number of scientific publications concerning these topics (Figure 14.7). ...
Chapter
Data mining is an interdisciplinary area of computer science combining database systems, statistical and machine learning approaches, and artificial intelligence focused on extraction of patterns and implicit relationships from data. In the era of high-throughput -omics technologies, the amount of scientific data that needs to be analyzed becomes problematic if not supported by powerful computers and sophisticated data mining algorithms, and thus, data mining techniques become increasingly popular among the scientific community. This chapter describes in detail the data mining process with special emphasis on its application in the field of -omics research.
... Compared with the number of measured items (p), the number of subjects (n; patients or cohort participants) is, even though effort is made to collect as many subjects as possible (for example, half a million participants in UK Biobank [94]), several orders of magnitude smaller than the number of attributes (p >> n). This is called the "small n, big p problem" [95] in statistics, and it makes most statistical methods (especially multivariate analysis) invalid or useless. Only new informatics approaches such as "sparse data modelling" [96] or "deep learning" [97] can be utilized as effective analytical tools. ...
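A small sketch of what "sparse data modelling" buys in the p >> n regime: it assumes simulated data (80 subjects, 5000 attributes, only 5 truly informative ones) and scikit-learn's cross-validated lasso, rather than any specific method from the cited works.

import numpy as np
from sklearn.linear_model import LassoCV

# "Small n, big p": 80 subjects, 5000 attributes, only a handful truly informative.
rng = np.random.default_rng(1)
n, p = 80, 5000
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]   # 5 relevant features
y = X @ true_coef + rng.normal(scale=0.5, size=n)

# Ordinary least squares is ill-posed here (p >> n); an L1 penalty
# (sparse modeling) recovers a small set of candidate predictors.
model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("non-zero coefficients:", selected[:10], "... total:", selected.size)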
Article
This article is part of a For-Discussion-Section of Methods of Information in Medicine about the paper "The New Role of Biomedical Informatics in the Age of Digital Medicine" written by Fernando J. Martin-Sanchez and Guillermo H. Lopez-Campos [1]. It is introduced by an editorial. This article contains the combined commentaries invited to independently comment on the paper of Martin-Sanchez and Lopez-Campos. In subsequent issues the discussion can continue through letters to the editor.
... Although the examples cited here are of mechanical systems and their vibrations, the approach is very general and may be applicable to other types of complex systems. Of course, there are many non-mechanical examples of complex systems in the physical [17], biological [18], and social sciences [19,20], and there is no obvious reason that the principle of separation will be universally applicable, though it is certainly possible. This type of analysis should thus be viewed only as a possible hypothesis for future work on other kinds of systems, which have to be considered on a case-by-case basis. ...
Article
Full-text available
Complex systems are composed of a large number of individual components. Many of these systems are separable, i.e., they can be split into two coupled subsystems: one with foreground components and another with background components. The former leads to narrow peaks in the frequency spectrum of the system and the latter gives the broad-band part. There is coupling between the two subsystems, but they can be studied separately for purposes of modeling and for analysis of experimental data. Examples from the literature are given from the area of mechanical vibrations, but the approach is quite general and can be adapted to other kinds of problems.
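The foreground/background split can be illustrated in the frequency domain with a toy signal. The sketch below assumes two synthetic narrow-band sinusoids plus broad-band noise and a crude threshold on the amplitude spectrum, which is only a caricature of the separation analysis described in the paper.

import numpy as np

# A signal with two narrow-band "foreground" components on top of
# broad-band "background" noise, examined in the frequency domain.
rng = np.random.default_rng(2)
fs, T = 1000.0, 4.0                          # sampling rate (Hz), duration (s)
t = np.arange(0, T, 1 / fs)
foreground = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
background = 0.3 * rng.normal(size=t.size)
x = foreground + background

spectrum = np.abs(np.fft.rfft(x)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# crude separation: narrow peaks stand well above the median (broad-band) level
peak_mask = spectrum > 10 * np.median(spectrum)
print("narrow-band peaks near (Hz):", freqs[peak_mask])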
... Most applications of the ViDaExpert software and the elastic map method are found in bioinformatics. The method was used to visualize the universal 7-cluster structure of bacterial genomes [12], [13] and the structure of codon usage in genomes of various organisms [16], [28], [29]. Elastic maps also allow approximating molecular surfaces of complex molecules and visualizing them in ViDaExpert [17] (see Figure 5). ...
Conference Paper
The method of elastic maps allows fast learning of non-linear principal manifolds for large datasets. We present a user-friendly implementation of the method in the ViDaExpert software. Equipped with several dialogs for configuring data point representations (size, shape, color) and a fast 3D viewer, ViDaExpert is a handy tool for constructing an interactive 3D scene representing a table of data in multidimensional space and performing its quick and insightful statistical analysis, from basic to advanced methods. We list several recent application examples of manifold learning by the method of elastic maps in various fields of the life sciences.
Article
The twentieth century was a time of great achievements in chemical kinetics: this period is characterized by the triumph of catalysis and the discovery of new reaction types such as chain reactions and oscillating reactions. Three major advancements were crucial for chemical kinetics during the last 50 years: the development of new analytical techniques that enable monitoring the chemical composition of multicomponent reaction mixtures; the development of a battery of new physical methods for catalyst characterization; and the increasing availability of powerful computational tools and techniques that enable the solving of complex kinetic models including hundreds of components and thousands of reactions. Certainly, if combined with operando catalyst characterization, temporal analysis of products will prove to be a very useful technique, which can be termed "chemical calculus" due to the insignificant change of the catalyst composition during a kinetic measurement.
Chapter
Full-text available
This chapter is devoted to the mathematical modeling of cellular decisions between death and life (referred to as cell fate decisions). These decisions determine many cell events in multicellular and unicellular organisms. Understanding the principles of cell fate decisions is crucial for studying the functioning of some tissues (such as the gut epithelium) and for comprehending tumour development, in which the tightly regulated mechanism of balancing between survival and death is violated towards survival. In a broader context, cell fate decisions are examples of cellular decision-making mechanisms, which are abundant in all living organisms from viruses and bacteria to mammals (Balazsi et al. 2011).
Book
Full-text available
In 1901, Karl Pearson invented Principal Component Analysis (PCA). Since then, PCA has served as a prototype for many other tools of data analysis, visualization and dimension reduction: Independent Component Analysis (ICA), Multidimensional Scaling (MDS), Nonlinear PCA (NLPCA), Self-Organizing Maps (SOM), etc. The book starts with a quote of Pearson's classical definition of PCA and includes reviews of various methods: NLPCA, ICA, MDS, embedding and clustering algorithms, principal manifolds and SOM. New approaches to NLPCA, principal manifolds, branching principal components and topology-preserving mappings are described as well. The presentation of algorithms is supplemented by case studies, from engineering to astronomy, but mostly of biological data: analysis of microarray and metabolite data. The volume ends with a tutorial "PCA and K-means decipher genome". The book is meant to be useful for practitioners in applied data analysis in the life sciences, engineering, physics and chemistry; it will also be valuable to PhD students and researchers in computer science, applied mathematics and statistics.
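As a minimal reminder of the prototype method itself, the sketch below fits Pearson's "plane of closest fit" to a simulated samples-by-variables matrix with scikit-learn; the two hidden factors and the matrix dimensions are illustrative assumptions only.

import numpy as np
from sklearn.decomposition import PCA

# Pearson's "lines and planes of closest fit": project a toy expression-like
# matrix (samples x variables) onto its first two principal components.
rng = np.random.default_rng(3)
latent = rng.normal(size=(100, 2))                 # two hidden factors
loadings = rng.normal(size=(2, 500))               # 500 observed variables
X = latent @ loadings + 0.1 * rng.normal(size=(100, 500))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))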
Article
Full-text available
Extracting relevant information from large-scale data offers unprecedented opportunities in cancerology. We applied independent component analysis (ICA) to bladder cancer transcriptome data sets and interpreted the components using gene enrichment analysis and tumor-associated molecular, clinicopathological, and processing information. We identified components associated with biological processes of tumor cells or the tumor microenvironment, and other components revealed technical biases. Applying ICA to nine cancer types identified cancer-shared and bladder-cancer-specific components. We characterized the luminal and basal-like subtypes of muscle-invasive bladder cancers according to the components identified. The study of the urothelial differentiation component, specific to the luminal subtypes, showed that a molecular urothelial differentiation program was maintained even in those luminal tumors that had lost morphological differentiation. Study of the genomic alterations associated with this component coupled with functional studies revealed a protumorigenic role for PPARG in luminal tumors. Our results support the inclusion of ICA in the exploitation of multiscale data sets.
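A stripped-down sketch of the decomposition step, assuming a random stand-in expression matrix and scikit-learn's FastICA (not the exact pipeline of the study); the resulting component matrices would then be interpreted through enrichment and clinical annotations, as the paper describes.

import numpy as np
from sklearn.decomposition import FastICA

# Decompose a (genes x samples) expression matrix into statistically
# independent components; interpretation of the components would follow.
rng = np.random.default_rng(4)
X = rng.lognormal(size=(2000, 60))          # stand-in for 2000 genes, 60 tumours

ica = FastICA(n_components=10, random_state=0, max_iter=1000)
S = ica.fit_transform(np.log2(X + 1))       # gene weights for each component
A = ica.mixing_                             # component activities across samples
print(S.shape, A.shape)                     # (2000, 10), (60, 10)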
Conference Paper
Full-text available
We introduce a methodology for reducing and comparing systems biology models. It is based on several reduction tools. The first tool is a combination of Clarke's graphical technique and idempotent algebra. The second tool is the Karhunen-Loève expansion, providing a linear embedding for the invariant manifold. The nonlinear dimension of the invariant manifold is estimated by a third method. We also introduce a novel, more realistic model for NF-κB signaling. This model is reduced and compared to existing models.
Article
Full-text available
The dynamical analysis of large biological regulatory networks requires the development of scalable methods for mathematical modeling. Following the approach initially introduced by Thomas, we formalize the interactions between the components of a network in terms of discrete variables, functions, and parameters. Model simulations result in directed graphs, called state transition graphs. We are particularly interested in reachability properties and asymptotic behaviors, which correspond to terminal strongly connected components (or "attractors") in the state transition graph. A well-known problem is the exponential increase of the size of state transition graphs with the number of network components, in particular when using the biologically realistic asynchronous updating assumption. To address this problem, we have developed several complementary methods enabling the analysis of the behavior of large and complex logical models: (i) the definition of transition priority classes to simplify the dynamics; (ii) a model reduction method preserving essential dynamical properties, (iii) a novel algorithm to compact state transition graphs and directly generate compressed representations, emphasizing relevant transient and asymptotic dynamical properties. The power of an approach combining these different methods is demonstrated by applying them to a recent multilevel logical model for the network controlling CD4+ T helper cell response to antigen presentation and to a dozen cytokines. This model accounts for the differentiation of canonical Th1 and Th2 lymphocytes, as well as of inflammatory Th17 and regulatory T cells, along with many hybrid subtypes. All these methods have been implemented into the software GINsim, which enables the definition, the analysis, and the simulation of logical regulatory graphs.
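To make the logical formalism and the notion of attractors as terminal strongly connected components concrete, here is a toy sketch (a hypothetical three-gene negative-feedback circuit, not the CD4+ T cell model and not GINsim itself) that enumerates the asynchronous state transition graph with networkx.

import itertools
import networkx as nx

# A 3-component Boolean network, asynchronous updating, attractors found as
# terminal strongly connected components of the state transition graph.
rules = {
    "A": lambda s: s["C"],          # A is activated by C
    "B": lambda s: s["A"],          # B is activated by A
    "C": lambda s: not s["B"],      # C is inhibited by B
}
genes = list(rules)

stg = nx.DiGraph()
for values in itertools.product([False, True], repeat=len(genes)):
    state = dict(zip(genes, values))
    key = tuple(values)
    stg.add_node(key)
    for i, g in enumerate(genes):                  # asynchronous: one flip at a time
        new_value = rules[g](state)
        if new_value != state[g]:
            succ = list(values)
            succ[i] = new_value
            stg.add_edge(key, tuple(succ))

condensed = nx.condensation(stg)
attractors = [condensed.nodes[c]["members"]
              for c in condensed.nodes if condensed.out_degree(c) == 0]
print("attractors (terminal SCCs):", attractors)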
Article
Full-text available
Trichomes are leaf hairs that are formed by single cells on the leaf surface. They are known to be involved in pathogen resistance. Their patterning is considered to emerge from a field of initially equivalent cells through the action of a gene regulatory network involving trichome fate promoting and inhibiting factors. For a quantitative analysis of single and double mutants or the phenotypic variation of patterns in different ecotypes, it is imperative to statistically evaluate the pattern reliably on a large number of leaves. Here we present a method that enables the analysis of trichome patterns at early developmental leaf stages and the automatic analysis of various spatial parameters. We focus on the most challenging young leaf stages that require the analysis in three dimensions, as the leaves are typically not flat. Our software TrichEratops reconstructs 3D surface models from 2D stacks of conventional light-microscope pictures. It allows the GUI-based annotation of different stages of trichome development, which can be analyzed with respect to their spatial distribution to capture trichome patterning events. We show that 3D modeling removes biases of simpler 2D models and that novel trichome patterning features increase the sensitivity for inter-accession comparisons.
Article
Full-text available
Systematic analysis of synthetic lethality (SL) constitutes a critical tool for systems biology to decipher molecular pathways. The most accepted mechanistic explanation of SL is that the two genes function in parallel, mutually compensatory pathways, known as between-pathway SL. However, recent genome-wide analyses in yeast identified a significant number of within-pathway negative genetic interactions. The molecular mechanisms leading to within-pathway SL are not fully understood. Here, we propose a novel mechanism leading to within-pathway SL involving two genes functioning in a single non-essential pathway. This type of SL termed within-reversible-pathway SL involves reversible pathway steps, catalyzed by different enzymes in the forward and backward directions, and kinetic trapping of a potentially toxic intermediate. Experimental data with recombinational DNA repair genes validate the concept. Mathematical modeling recapitulates the possibility of kinetic trapping and revealed the potential contributions of synthetic, dosage-lethal interactions in such a genetic system as well as the possibility of within-pathway positive masking interactions. Analysis of yeast gene interaction and pathway data suggests broad applicability of this novel concept. These observations extend the canonical interpretation of synthetic-lethal or synthetic-sick interactions with direct implications to reconstruct molecular pathways and improve therapeutic approaches to diseases such as cancer.
Article
Full-text available
The century of complexity has come. The face of science has changed. Surprisingly, when we start asking about the essence of these changes and then critically analyse the answers, the results are mostly discouraging. Most of the answers are related to properties that have been in the focus of scientific research for more than a century already (like non-linearity). This paper is the Preface to the special issue "Grasping Complexity" of the journal "Computers and Mathematics with Applications". We analyse the change of era in science, its reasons and the main changes in scientific activity, and give a brief review of the papers in the issue.
Article
Full-text available
We introduce Pathifier, an algorithm that infers pathway deregulation scores for each tumor sample on the basis of expression data. This score is determined, in a context-specific manner, for every particular dataset and type of cancer that is being investigated. The algorithm transforms gene-level information into pathway-level information, generating a compact and biologically relevant representation of each sample. We demonstrate the algorithm's performance on three colorectal cancer datasets and two glioblastoma multiforme datasets and show that our multipathway-based representation is reproducible, preserves much of the original information, and allows inference of complex biologically significant information. We discovered several pathways that were significantly associated with survival of glioblastoma patients and two whose scores are predictive of survival in colorectal cancer: CXCR3-mediated signaling and oxidative phosphorylation. We also identified a subclass of proneural and neural glioblastoma with significantly better survival, and an EGF receptor-deregulated subclass of colon cancers.
Article
Full-text available
Background Public repositories of biological pathways and networks have greatly expanded in recent years. Such databases contain many pathways that facilitate the analysis of high-throughput experimental work and the formulation of new biological hypotheses to be tested, a fundamental principle of the systems biology approach. However, large-scale molecular maps are not always easy to mine and interpret. Results We have developed BiNoM (Biological Network Manager), a Cytoscape plugin, which provides functions for the import-export of some standard systems biology file formats (import from CellDesigner, BioPAX Level 3 and CSML; export to SBML, CellDesigner and BioPAX Level 3), and a set of algorithms to analyze and reduce the complexity of biological networks. BiNoM can be used to import and analyze files created with the CellDesigner software. BiNoM provides a set of functions that allow importing BioPAX files, as well as searching and editing their content. As such, BiNoM is able to efficiently manage large BioPAX files such as whole pathway databases (e.g. Reactome). BiNoM also implements a collection of powerful graph-based functions and algorithms such as path analysis, decomposition by involvement of an entity or cyclic decomposition, subnetwork clustering and decomposition of a large network into modules. Conclusions Here, we provide an in-depth overview of the BiNoM functions, and we also detail novel aspects such as the support of the BioPAX Level 3 format and the implementation of a new algorithm for the quantification of pathways for influence networks. Finally, we illustrate some of the BiNoM functions on a detailed biological case study of a network representing the G1/S transition of the cell cycle, a crucial cellular process disturbed in most human tumors.
Data
Full-text available
Tumor development is characterized by a compromised balance between cell life and death decision mechanisms, which are tightly regulated in normal cells. Understanding this process provides insights for developing new treatments for fighting cancer. We present a study of a mathematical model describing the cellular choice between survival and two alternative cell death modalities: apoptosis and necrosis. The model is implemented in a discrete modeling formalism and allows predicting the probabilities of having a particular cellular phenotype in response to engagement of cell death receptors. Using an original parameter sensitivity analysis developed for discrete dynamic systems, we determine the variables that appear to be critical in the cellular fate decision and discuss how they are exploited by existing cancer therapies.
Data
Full-text available
In many physical, statistical, biological and other investigations it is desirable to approximate a system of points by objects of lower dimension and/or complexity. For this purpose, Karl Pearson invented principal component analysis in 1901 and found 'lines and planes of closest fit to systems of points'. The famous k-means algorithm solves the approximation problem too, but by finite sets instead of lines and planes. This chapter gives a brief practical introduction into the methods of construction of general principal objects (i.e., objects embedded in the 'middle' of the multidimensional data set). As a basis, the unifying framework of mean squared distance approximation of finite datasets is selected. Principal graphs and manifolds are constructed as generalisations of principal components and k-means principal points. For this purpose, the family of expectation/maximisation algorithms with nearest generalisations is presented. Construction of principal graphs with controlled complexity is based on the graph grammar approach.
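The expectation/maximisation template mentioned here is easiest to see in the k-means special case (principal points). The short sketch below, with a made-up three-cluster dataset, alternates the two steps explicitly rather than calling a library routine.

import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """k-means as the simplest 'principal object': alternate an expectation
    step (assign each point to the nearest centre) and a maximisation step
    (move each centre to the mean of its assigned points)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in ([0, 0], [3, 0], [0, 3])])
centres, labels = kmeans(X, k=3)
print(centres.round(2))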
Article
Full-text available
Biochemical networks are used in computational biology to model mechanistic details of systems involved in cell signaling, metabolism, and regulation of gene expression. Parametric and structural uncertainty, as well as combinatorial explosion, are strong obstacles against analyzing the dynamics of large models of this type. Multiscaleness, an important property of these networks, can be used to get past some of these obstacles. Networks with many well-separated time scales can be reduced to simpler models, in a way that depends only on the orders of magnitude and not on the exact values of the kinetic parameters. The main idea used for such robust simplifications of networks is the concept of dominance among model elements, allowing hierarchical organization of these elements according to their effects on the network dynamics. This concept finds a natural formulation in tropical geometry. We revisit, in the light of these new ideas, the main approaches to model reduction of reaction networks, such as quasi-steady state (QSS) and quasi-equilibrium (QE) approximations, and provide practical recipes for model reduction of linear and non-linear networks. We also discuss the application of model reduction to the problem of parameter identification, via backward pruning machine learning techniques.
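As a concrete instance of the QSS recipe, the sketch below reduces the textbook Michaelis-Menten mechanism and compares the full and reduced dynamics numerically; the rate constants are arbitrary values chosen only to give well-separated time scales.

import numpy as np
from scipy.integrate import solve_ivp

# Quasi-steady-state (QSS) reduction of E + S <-> ES -> E + P: valid when the
# complex ES relaxes much faster than the substrate is consumed.
k_on, k_off, k_cat = 100.0, 1.0, 1.0      # fast binding, slower catalysis (arbitrary units)
E_tot, S0 = 0.1, 10.0

def full_model(t, y):
    s, es, p = y
    e = E_tot - es
    v_bind = k_on * e * s - k_off * es
    return [-v_bind, v_bind - k_cat * es, k_cat * es]

def reduced_model(t, y):
    s, p = y
    Km = (k_off + k_cat) / k_on
    v = k_cat * E_tot * s / (Km + s)       # Michaelis-Menten rate law
    return [-v, v]

t_eval = np.linspace(0, 200, 50)
full = solve_ivp(full_model, (0, 200), [S0, 0.0, 0.0], t_eval=t_eval, method="LSODA")
red = solve_ivp(reduced_model, (0, 200), [S0, 0.0], t_eval=t_eval)
print("max |P_full - P_qss| =", round(float(np.abs(full.y[2] - red.y[1]).max()), 3))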
Chapter
Full-text available
Principal manifolds are defined as lines or surfaces passing through “the middle” of data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing “principal objects” of various dimensions and topologies with the simplest quadratic form of the smoothness penalty which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (VidaExpert and ViMiDa applications). In this paper we overview the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhood and representing point classes in low-dimensional spaces.
Chapter
Full-text available
Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional “principal object”: a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedment into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction.
Conference Paper
Full-text available
DNA analysis by microarrays is a powerful tool that allows replication of the RNA of hundreds of thousands of genes at the same time, generating a large amount of data in multidimensional space that must be analyzed using informatics tools. Various clustering techniques have been applied to analyze the microarrays, but they do not offer a systematic form of analysis. This paper proposes the use of Gorban's Elastic Neural Net in an iterative way to find patterns of expressed genes. The new method proposed (Iterative Elastic Neural Net, IENN) has been evaluated with up-regulated genes of the Escherichia coli bacterium and is compared with the Self-Organizing Maps (SOM) technique frequently used in this kind of analysis. The results show that the proposed method finds 86.7% of the up-regulated genes, compared to 65.2% of genes found by the SOM. A comparative analysis of Receiver Operating Characteristic (ROC) with SOM shows that the proposed method is 11.5% more effective.
Article
Data visualization is an essential element of biological research, required for obtaining insights and formulating new hypotheses on mechanisms of health and disease. NaviCell Web Service is a tool for network-based visualization of 'omics' data which implements several data visual representation methods and utilities for combining them together. NaviCell Web Service uses Google Maps and semantic zooming to browse large biological network maps, represented in various formats, together with different types of the molecular data mapped on top of them. For achieving this, the tool provides standard heatmaps, barplots and glyphs as well as the novel map staining technique for grasping large-scale trends in numerical values (such as whole transcriptome) projected onto a pathway map. The web service provides a server mode, which allows automating visualization tasks and retrieving data from maps via RESTful (standard HTTP) calls. Bindings to different programming languages are provided (Python and R). We illustrate the purpose of the tool with several case studies using pathway maps created by different research groups, in which data visualization provides new insights into molecular mechanisms involved in systemic diseases such as cancer and neurodegenerative diseases.
Book
Cancer is a complex and heterogeneous disease that exhibits high levels of robustness against various therapeutic interventions. It is a constellation of diverse and evolving disorders that are manifested by the uncontrolled proliferation of cells that may eventually lead to fatal dysfunction of the host system. Although some of the cancer subtypes can be cured by early diagnosis and specific treatment, no effective treatment is yet established for a significant portion of cancer subtypes. In industrial countries where the average life expectancy is high, cancer is one of the major causes of death. Any contribution to an in-depth understanding of cancer shall eventually lead to better care and treatment for patients. Due to the complex, heterogeneous, and evolving nature of cancer, it is essential for a system-oriented view to be adopted for an in-depth understanding. The question is how to achieve an in-depth yet realistic understanding of cancer dynamics. Although large-scale experiments are now being deployed, there are practical limitations of how much they do to convey the reality of cancer pathology and progression within the patient’s body. Computational approaches with system-oriented thinking may complement the limitations of an experimental approach. Computational studies not only provide us with new insights from large-scale experimental data, but also enable us to perceive what are the conceivable characteristics of cancer under certain assumptions. It is an engine of thoughts and proving grounds of various hypotheses on how cancer may behave as well as how molecular mechanisms work within anomalous conditions. It is not just computing that helps us fight against cancer, but a computational approach has to be combined with a proper theoretical framework that enables us to perceive “cancer” as complex dynamical and evolvable systems that entail a robust yet fragile nature. This recognition shifts our attention from the magic bullet approach of anti-cancer drugs to a more systematic control of cancer as complex dynamical phenomena. This leads to the view that a complex system has to be controlled by complex interventions. To understand such a system and design complex interventions, it is essential that we combine experimental and computational approaches. Thus, computational systems biology of cancer is an essential discipline for cancer biology and is expected to have major impacts for clinical decision-making. This is the first book specifically focused on computational systems biology of cancer with a coherent and proper vision on how to tackle this formidable challenge. Book web-site: http://www.cancer-systems-biology.net/
Article
Several decades of molecular biology research have delivered a wealth of detailed descriptions of molecular interactions in normal and tumour cells. This knowledge has been functionally organised and assembled into dedicated biological pathway resources that serve as an invaluable tool, not only for structuring the information about molecular interactions but also for making it available for biological, clinical and computational studies. With the advent of high-throughput molecular profiling of tumours, close to complete molecular catalogues of mutations, gene expression and epigenetic modifications are available and require adequate interpretation. Taking into account the information about biological signalling machinery in cells may help to better interpret molecular profiles of tumours. Making sense out of these descriptions requires biological pathway resources for functional interpretation of the data. In this review, we describe the available biological pathway resources, their characteristics in terms of construction mode, focus, aims and paradigms of biological knowledge representation. We present a new resource that is focused on cancer-related signalling, the Atlas of Cancer Signalling Networks. We briefly discuss current approaches for data integration, visualisation and analysis using biological networks, such as pathway scoring, guilt-by-association and network propagation. We then illustrate with several examples the added value of data interpretation in the context of biological networks and demonstrate that it may help in the analysis of high-throughput data like mutation, gene expression or small interfering RNA screening, and can guide patient stratification. Finally, we discuss perspectives for improving precision medicine using biological network resources and tools. Taking into account the information about biological signalling machinery in cells may help to better interpret molecular patterns of tumours and enable putting precision oncology into general clinical practice.
Article
Epithelial-to-mesenchymal transition-like (EMT-like) is a critical process allowing initiation of metastases during tumour progression. Here, to investigate its role in intestinal cancer, we combine computational network-based and experimental approaches to create a mouse model with high metastatic potential. Construction and analysis of this network map depicting molecular mechanisms of EMT regulation based on the literature suggests that Notch activation and p53 deletion have a synergistic effect in activating EMT-like processes. To confirm this prediction, we generate transgenic mice by conditionally activating the Notch1 receptor and deleting p53 in the digestive epithelium (NICD/p53(-/-)). These mice develop metastatic tumours with high penetrance. Using GFP lineage tracing, we identify single malignant cells with mesenchymal features in primary and metastatic tumours in vivo. The development of such a model that recapitulates the cellular features observed in invasive human colorectal tumours is appealing for innovative drug discovery.
Article
Tissue regeneration is an orchestrated progression of cells from an immature state to a mature one, conventionally represented as distinctive cell subsets. A continuum of transitional cell states exists between these discrete stages. We combine the depth of single-cell mass cytometry and an algorithm developed to leverage this continuum by aligning single cells of a given lineage onto a unified trajectory that accurately predicts the developmental path de novo. Applied to human B cell lymphopoiesis, the algorithm (termed Wanderlust) constructed trajectories spanning from hematopoietic stem cells through to naive B cells. This trajectory revealed nascent fractions of B cell progenitors and aligned them with developmentally cued regulatory signaling including IL-7/STAT5 and cellular events such as immunoglobulin rearrangement, highlighting checkpoints across which regulatory signals are rewired paralleling changes in cellular state. This study provides a comprehensive analysis of human B lymphopoiesis, laying a foundation to apply this approach to other tissues and "corrupted" developmental processes including cancer.
Article
Four chapters of the synthesis represent four major areas of my research interests: 1) data analysis in molecular biology, 2) mathematical modeling of biological networks, 3) genome evolution, and 4) cancer systems biology. The first chapter is devoted to my work in developing non-linear methods of dimension reduction (the methods of elastic maps and principal trees), which extend the classical method of principal components. I also present the application of matrix factorization techniques to the analysis of cancer data. The second chapter is devoted to the complexity of mathematical models in molecular biology. I describe the basic ideas of the asymptotology of chemical reaction networks, aiming at dissecting and simplifying complex chemical kinetics models. Two applications of this approach are presented: to modeling the NF-κB and apoptosis pathways, and to modeling mechanisms of miRNA action on protein translation. The third chapter briefly describes my investigations of genome structure in different organisms (from microbes to human cancer genomes). Unsupervised data analysis approaches are used to investigate the patterns in genomic sequences shaped by genome evolution and influenced by the basic properties of the environment. The fourth chapter summarizes my experience in studying cancer by computational methods (through combining integrative data analysis and mathematical modeling approaches). In particular, I describe on-going research projects such as the mathematical modeling of cell fate decisions and of synthetic lethal interactions in the DNA repair network. The synthesis is concluded by listing major challenges in computational systems biology connected to the topics of this text, i.e. dealing with the complexity of biological systems.
Article
Molecular biology knowledge can be formalized and systematically represented in a computer-readable form as a comprehensive map of molecular interactions. There exists an increasing number of maps of molecular interactions containing detailed and step-wise descriptions of various cell mechanisms. It is difficult to explore these large maps, to organize discussion of their content and to maintain them. Several efforts were recently made to combine these capabilities together in one environment, and NaviCell represents one of them. NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. It is characterized by a combination of three essential features: (1) efficient map browsing based on the Google Maps engine; (2) semantic zooming for viewing different levels of detail or of abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of their interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. NaviCell allows both exploration of detailed molecular mechanisms represented on the map and a more abstract view of the map, up to a top-level modular representation. NaviCell greatly facilitates the curation, maintenance and updating of comprehensive maps of molecular interactions in an interactive and user-friendly fashion, thanks to an embedded blogging system. NaviCell provides a user-friendly exploration of large-scale maps of molecular interactions, thanks to Google Maps and WordPress interfaces, with which many users are already familiar. Semantic zooming, which is used for navigating geographical maps, is adopted for molecular maps in NaviCell, making any level of visualization readable. In addition, NaviCell provides a framework for community-based curation of maps.
Article
We review several mathematical methods that allow identifying modules and hierarchies with several levels of complexity in biological systems. These methods are based either on the properties of the input-output characteristics of the modules or on global properties of the dynamics, such as the distribution of timescales or the stratification of attractors with variable dimension. We also discuss the consequences of the hierarchical structure on the robustness of biological processes. Stratified attractors lead to Waddington-type canalization effects. Successive application of the many-to-one mapping relating parameters of different levels in a hierarchy of models (analogous to the renormalization operation in statistical mechanics) leads to concentration and robustness of those properties that are common to many levels of complexity. Examples such as the response of the transcription factor NF-κB to signalling, and the segmentation patterns in the development of Drosophila, are used as illustrations of the theoretical ideas.
Article
All living things are remarkably complex, yet their DNA is unstable, undergoing countless random mutations over generations. Despite this instability, most animals do not grow two heads or die, plants continue to thrive, and bacteria continue to divide. Robustness and Evolvability in Living Systems tackles this perplexing paradox. The book explores why genetic changes do not cause organisms to fail catastrophically and how evolution shapes organisms' robustness. Andreas Wagner looks at this problem from the ground up, starting with the alphabet of DNA, the genetic code, RNA, and protein molecules, moving on to genetic networks and embryonic development, and working his way up to whole organisms. He then develops an evolutionary explanation for robustness. Wagner shows how evolution by natural selection preferentially finds and favors robust solutions to the problems organisms face in surviving and reproducing. Such robustness, he argues, also enhances the potential for future evolutionary innovation. Wagner also argues that robustness has less to do with organisms having plenty of spare parts (the redundancy theory that has been popular) and more to do with the reality that mutations can change organisms in ways that do not substantively affect their fitness. Unparalleled in its field, this book offers the most detailed analysis available of all facets of robustness within organisms. It will appeal not only to biologists but also to engineers interested in the design of robust systems and to social scientists concerned with robustness in human communities and populations.
Article
The Biological Network Manager (BiNoM) is a software tool for the manipulation and analysis of biological networks. It facilitates the import and conversion of a set of well-established systems biology file formats. It also provides a large set of graph-based algorithms that allow users to analyze and extract relevant subnetworks from large molecular maps. It has been successfully used in several projects related to the analysis of large and complex biological data, or networks from databases. In this tutorial, we present a detailed and practical case study of how to use BiNoM to analyze biological networks.
Article
Over the past decade, comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer. For most cancer types, this landscape consists of a small number of “mountains” (genes altered in a high percentage of tumors) and a much larger number of “hills” (genes altered infrequently). To date, these studies have revealed ~140 genes that, when altered by intragenic mutations, can promote or “drive” tumorigenesis. A typical tumor contains two to eight of these “driver gene” mutations; the remaining mutations are passengers that confer no selective growth advantage. Driver genes can be classified into 12 signaling pathways that regulate three core cellular processes: cell fate, cell survival, and genome maintenance. A better understanding of these pathways is one of the most pressing needs in basic cancer research. Even now, however, our knowledge of cancer genomes is sufficient to guide the development of more effective approaches for reducing cancer morbidity and mortality.
Article
MicroRNAs can affect protein translation through nine mechanistically different mechanisms, including repression of initiation and degradation of the transcript. There is a hot debate in the current literature about which mechanism, and in which situations, plays a dominant role in living cells. Worse, the same experimental systems dealing with the same pairs of mRNA and miRNA can provide ambiguous evidence about the actual mechanism of translation repression observed in the experiment. We start by reviewing the current knowledge of the various mechanisms of miRNA action and suggest that mathematical modeling can help resolve some of the controversial interpretations. We describe three simple mathematical models of miRNA-mediated translation that can be used as tools in interpreting experimental data on the dynamics of protein synthesis. The most complex model developed by us includes all known mechanisms of miRNA action. It allowed us to study possible dynamical patterns corresponding to different miRNA-mediated mechanisms of translation repression and to suggest concrete recipes for determining the dominant mechanism of miRNA action in the form of kinetic signatures. Using computational experiments and systematizing existing evidence from the literature, we justify a hypothesis about the co-existence of distinct miRNA-mediated mechanisms of translation repression. The mechanism actually observed will be the one acting on, or changing, the sensitive parameters of the translation process. The limiting step can vary from one experimental setting to another. This model explains the majority of the controversies reported.
Article
How to measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds for datasets with non-trivial topologies. The type of the principal cubic complex is determined by its dimension and a grammar of elementary graph transformations. The simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data's approximator from some "idealized" configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data), and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in the "accuracy-complexity" plots, helping to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well.
Article
MicroRNAs (miRNAs) are key regulators of all important biological processes, including development, differentiation, and cancer. Although remarkable progress has been made in deciphering the mechanisms used by miRNAs to regulate translation, many contradictory findings have been published that stimulate active debate in this field. Here we contribute to this discussion in three ways. First, based on a comprehensive analysis of the existing literature, we hypothesize a model in which all proposed mechanisms of microRNA action coexist, and where the apparent mechanism that is detected in a given experiment is determined by the relative values of the intrinsic characteristics of the target mRNAs and associated biological processes. Among several coexisting miRNA mechanisms, the one that will effectively be measurable is that which acts on or changes the sensitive parameters of the translation process. Second, we have created a mathematical model that combines nine known mechanisms of miRNA action and estimated the model parameters from the literature. Third, based on the mathematical modeling, we have developed a computational tool for discriminating among different possible individual mechanisms of miRNA action based on translation kinetics data that can be experimentally measured (kinetic signatures). To confirm the discriminatory power of these kinetic signatures and to test our hypothesis, we have performed several computational experiments with the model in which we simulated the coexistence of several miRNA action mechanisms in the context of variable parameter values of the translation.
Article
Synthesis of proteins is one of the most fundamental biological processes, which consumes a significant amount of cellular resources. Despite many efforts to produce detailed mechanistic mathematical models of translation, no basic and simple kinetic model of mRNA lifecycle (transcription, translation and degradation) exists. We build such a model by lumping multiple states of translated mRNA into few dynamical variables and introducing a pool of translating ribosomes. The basic and simple model can be extended, if necessary, to take into account various phenomena such as the interaction between translating ribosomes or regulation of translation by microRNA. The model can be used as a building block (translation module) for more complex models of cellular processes.
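A deliberately coarse sketch of such a translation module, assuming simple first-order kinetics and made-up rate constants (it is not the authors' lumped model, and the translating ribosome pool is omitted):

import numpy as np
from scipy.integrate import solve_ivp

# mRNA lifecycle caricature: transcription, translation and first-order degradation.
k_tx, k_mdeg = 2.0, 0.2        # mRNA synthesis and degradation rates (1/min)
k_tl, k_pdeg = 5.0, 0.05       # translation and protein degradation rates

def translation_module(t, y):
    m, p = y
    return [k_tx - k_mdeg * m,          # dM/dt
            k_tl * m - k_pdeg * p]      # dP/dt

sol = solve_ivp(translation_module, (0, 120), [0.0, 0.0],
                t_eval=np.linspace(0, 120, 7))
print("mRNA:", sol.y[0].round(1))
print("protein:", sol.y[1].round(1))

# Steady states follow directly: M* = k_tx/k_mdeg, P* = k_tl*M*/k_pdeg.
print("steady state:", k_tx / k_mdeg, k_tl * (k_tx / k_mdeg) / k_pdeg)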
Article
In this paper, we construct low-dimensional manifolds of reduced description for equations of chemical kinetics from the standpoint of the method of invariant manifold (MIM). MIM is based on a formulation of the condition of invariance as an equation, and its solution by Newton iterations. A grid-based version of MIM is developed (the method of invariant grids). We describe the Newton method and the relaxation method for the invariant grids construction. The problem of the grid correction is fully decomposed into the problems of the grid's nodes correction. The edges between the nodes appear only in the calculation of the tangent spaces. This fact determines high computational efficiency of the method of invariant grids. The method is illustrated by two examples: the simplest catalytic reaction (Michaelis–Menten mechanism), and the hydrogen oxidation. The algorithm of analytical continuation of the approximate invariant manifold from the discrete grid is proposed. Generalizations to open systems are suggested. The set of methods covered makes it possible to effectively reduce description in chemical kinetics.
Article
The concept of the limiting step is extended to the asymptotology of multiscale reaction networks. A complete theory for linear networks with well-separated reaction rate constants is developed. We present algorithms for explicit approximations of the eigenvalues and eigenvectors of the kinetic matrix. The accuracy of the estimates is proven. The performance of the algorithms is demonstrated on simple examples. Application of the algorithms to nonlinear systems is discussed.
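The flavour of these explicit approximations can be checked numerically on the smallest non-trivial example, an irreversible three-step cycle with well-separated constants, where the non-zero eigenvalues of the kinetic matrix are expected to lie close to minus the non-limiting rate constants; the constants below are arbitrary illustrative values.

import numpy as np

# Monomolecular cycle A1 -> A2 -> A3 -> A1 with well-separated constants:
# the exact non-zero eigenvalues of the kinetic matrix are close to minus the
# non-limiting constants, while the smallest constant (the limiting step)
# controls the steady state.
k1, k2, k3 = 1000.0, 1.0, 0.001            # k3 is the limiting step
K = np.array([[-k1, 0.0, k3],
              [k1, -k2, 0.0],
              [0.0, k2, -k3]])

exact = np.sort(np.linalg.eigvals(K).real)
approx = np.sort([-k1, -k2, 0.0])
print("exact eigenvalues :", exact)
print("approximation     :", approx)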
Article
A method of topological grammars is proposed for multidimensional data approximation. For data with complex topology we define a principal cubic complex of low dimension and given complexity that gives the best approximation for the dataset. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. The problem of optimal principal complex construction is transformed into a series of minimization problems for quadratic functionals. These quadratic functionals have a physically transparent interpretation in terms of elastic energy. For the energy computation, the whole complex is represented as a system of nodes and springs. Topologically, the principal complex is a product of one-dimensional continuums (represented by graphs), and the grammars describe how these continuums transform during the process of optimal complex construction. This factorization of the whole process onto one-dimensional transformations using minimization of quadratic energy functionals allows us to construct efficient algorithms.
Article
Principal manifolds serve as a useful tool for many practical applications. These manifolds are defined as lines or surfaces passing through "the middle" of a data distribution. We propose an algorithm for the fast construction of grid approximations of principal manifolds with given topology. It is based on the analogy between a principal manifold and an elastic membrane. The first advantage of this method is the form of the functional to be minimized, which becomes quadratic at the step of vertex position refinement. This makes the algorithm very effective, especially for parallel implementations. Another advantage is that the same algorithmic kernel is applied to construct principal manifolds of different dimensions and topologies. We demonstrate how the flexibility of the approach allows numerous adaptive strategies, such as principal graph construction. The algorithm is implemented as a C++ package, elmap, and as a part of the stand-alone data visualization tool VidaExpert, available on the web. We describe the approach and provide several examples of its application, with speed performance characteristics.
Article
Tumor development is characterized by a compromised balance between cell life and death decision mechanisms, which are tightly regulated in normal cells. Understanding this process provides insights for developing new treatments for fighting cancer. We present a study of a mathematical model describing the cellular choice between survival and two alternative cell death modalities: apoptosis and necrosis. The model is implemented in a discrete modeling formalism and allows predicting the probabilities of having a particular cellular phenotype in response to engagement of cell death receptors. Using an original parameter sensitivity analysis developed for discrete dynamic systems, we determine the variables that appear to be critical in the cellular fate decision and discuss how they are exploited by existing cancer therapies.