Figure - available from: Frontiers in Genetics
A framework of the proposed method.

Source publication
Article
Full-text available
Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None...

Similar publications

Article
Full-text available
Among the various methods so far proposed for genetic network inference, this study focuses on the random-forest-based methods. Confidence values are assigned to all of the candidate regulations when taking the random-forest-based approach. To our knowledge, all of the random-forest-based methods make the assignments using the standard variable imp...
Article
Full-text available
Purpose: Cabozantinib (CAB) as monotherapy or in combination with immune checkpoint inhibitors is used for systemic treatment of metastatic renal cell carcinoma (mRCC). However, little is known about predictors of treatment response to CAB. For this reason, known genomic drivers were examined to identify potential predictors of treatment response wi...
Article
Full-text available
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstr...
Preprint
Full-text available
Single cell RNA-sequencing (scRNA-seq) has very rapidly become the new workhorse of modern biology, providing an unprecedented global view on cellular diversity and heterogeneity. In particular, the structure of gene-gene expression correlation contains information on the underlying gene regulatory networks. However, interpretation of scRNA-seq data...
Article
Full-text available
In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods ar...

Citations

... [21] A general strategy for handling a preponderance of false positives is to statistically filter TRN outputs to achieve a user-specified precision, typically expressed in terms of false discovery rate (FDR) control. [22][23][24][25][26][27] In brief, the FDR is the proportion of significant predictions that are expected to be false positives, and there are various ways to estimate the FDR of TRN inferences (see Box 1). If FDR estimates are accurate, then the user can generate a TRN in which only a specified fraction of edges are false positives. ...
... Indeed, prior work on FDR-controlled TRNs has not systematically compared reported FDR on real datasets against gold standards. [22][23][24][25][26][27][43] We develop gold standards and approaches for a fair empirical evaluation of FDR control across a variety of methods and assumptions. In order to directly test or completely avoid certain modeling assumptions, we contribute a computationally efficient adaptation of the model-X knockoff filter, which has distinctive advantages for TRN inference [36][44] (Box 2). ...
... [19] Thus, permutation is not expected to yield adequate knockoffs; however, we include this method due to the popularity of permutation methods for error control in TRN inference. [22][23][26][27][54] We provided the simulated data to the knockoff filter using only RNA expression levels ("RNA only") or revealing RNA expression, RNA production rate, and protein levels ("RNA + protein"). The latter captures the full state of the simulation. ...
Article
Full-text available
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring “causal sufficiency,” meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper’s transparent peer review process is included in the supplemental information.
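The comparison of reported against observed FDR described above can be illustrated with a minimal Python sketch. It assumes the gold standard is complete, which the abstract notes is often not the case, so it computes only the naive observed proportion of false discoveries. The function name `observed_fdr` and the example edges are hypothetical and are not taken from the paper.

```python
def observed_fdr(predicted_edges, gold_edges):
    """Return the fraction of predicted edges that are absent from the gold standard.

    Assumption: the gold standard is treated as complete, so any predicted
    edge missing from it is counted as a false positive.
    """
    predicted = set(predicted_edges)
    if not predicted:
        return 0.0  # convention: no discoveries means no false discoveries
    false_positives = predicted - set(gold_edges)
    return len(false_positives) / len(predicted)


# Hypothetical (regulator, target) edges, for illustration only.
predicted = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3"), ("tfB", "g4")]
gold = [("tfA", "g1"), ("tfB", "g3"), ("tfB", "g5")]

fdr = observed_fdr(predicted, gold)
print(fdr)  # 0.5: two of the four predicted edges are not in the gold standard
```

If a method reports an FDR of, say, 0.1 for an edge set while a calculation of this kind gives a much higher value, that gap is the type of discrepancy the abstract describes.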
... Determining not only which genes have altered transcript abundance in response to stimuli but also inferring which of them function together in gene regulatory networks (GRNs) based on their responses bolsters our understanding of signaling pathways. Insight about GRNs downstream of stimuli can be extracted from time series data, which provides information about changes in transcript abundance over time (Ding & Bar-Joseph, 2020; Kimura et al., 2020; Lu et al., 2021; Sima et al., 2009; Yeung et al., 2011; Zhang et al., 2021). Genes that have relatively similar temporal dynamics form subnetworks that may be turned on by the same upstream transcription factors (TFs) or turned off by the same repression mechanisms. ...
Article
Full-text available
Transcriptome studies that provide temporal information about transcript abundance facilitate identification of gene regulatory networks (GRNs). Inferring GRNs from time series data using computational modeling remains a central challenge in systems biology. Commonly employed clustering algorithms identify modules of like-responding genes but do not provide information on how these modules are interconnected. These methods also require users to specify parameters such as cluster number and size, adding complexity to the analysis. To address these challenges, we employed a recently developed algorithm, Partitioned Local Depth (PaLD), to generate cohesive networks for 4 time series transcriptome datasets (3 hormone and 1 abiotic stress dataset) from the model plant Arabidopsis thaliana. PaLD provided a cohesive network representation of the data, revealing networks with distinct structures and varying numbers of connections between transcripts. We utilized the networks to make predictions about GRNs by examining local neighborhoods of transcripts with highly similar temporal responses. We also partitioned the networks into groups of like-responding transcripts and identified enriched functional and regulatory features in them. Comparison of groups to clusters generated by commonly used approaches indicated that these methods identified modules of transcripts that have similar temporal and biological features, but also identified unique groups, suggesting a PaLD-based approach (supplemented with a community detection algorithm) can complement existing methods. These results revealed that PaLD could sort like-responding transcripts into biologically meaningful neighborhoods and groups while requiring minimal user input and producing cohesive network structure, offering an additional tool to the systems biology community to predict GRNs.
... These fall into two broad categories. One type forms a background distribution by randomly permuting expression levels of a given gene across samples (Chasman et al., 2019; Kimura, Fukutomi, Tokuhisa, & Okada, 2020; Morgan et al., 2019). This strategy implies the following strong null hypothesis: each gene is independent of all other genes unless it directly regulates them. ...
... The method labeled "permuted" randomly permuted samples within each gene (independent of the permutation applied to the other genes). Permutation implies very strict assumptions on the distribution of the features and is not expected to yield adequate knockoffs; we included it as an approximation of several existing permutation methods for error control in TRN inference (Chasman et al., 2019; Kimura et al., 2020; Morgan et al., 2019; Verny, Sella, Affeldt, Singh, & Isambert, 2017). We provided the simulated data to the knockoff filter using only RNA expression levels ("RNA only") or revealing RNA expression, RNA production rate, and protein levels ("RNA + protein"). ...
... TRN inference methods are notorious for false positives (Diaz & Stumpf, 2022). For example, analyses using permuted genes as negative controls, which is sometimes claimed to control FDR in TRN inference (Chasman et al., 2019; Kimura et al., 2020; Morgan et al., 2019), yielded 131 MelR targets spanning diverse biological functions. This conflicts with ChIP and perturbation experiments showing three or four targets of MelR, almost all located in the melibiose operon (Grainger et al., 2004; Wade et al., 2000). ...
Preprint
Full-text available
Computational biologists have long sought to automatically infer transcriptional regulatory networks (TRNs) from gene expression data, but such approaches notoriously suffer from false positives. Two points of failure could yield false positives: faulty hypothesis testing, or erroneous assumption of a classic criterion called causal sufficiency. We show that a recent statistical development, model-X knockoffs, can effectively control false positives in tests of conditional independence in mouse and E. coli data, which rules out faulty hypothesis tests. Yet, benchmarking against ChIP and other gold standards reveals highly inflated false discovery rates. This identifies the causal sufficiency assumption as a key limiting factor in TRN inference.
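The "permuted" baseline quoted in the excerpts above shuffles each gene's values across samples, independently of the other genes. The following Python sketch shows that operation only. The data layout (a dict mapping gene names to per-sample values), the function name `permute_within_genes`, and the toy numbers are assumptions made for illustration, not the cited authors' implementation.

```python
import random


def permute_within_genes(expression, seed=0):
    """Shuffle each gene's values across samples, independently per gene.

    expression: dict mapping gene name -> list of expression values,
    one value per sample (hypothetical layout chosen for this sketch).
    Returns a new dict; the input is left unmodified.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    permuted = {}
    for gene, values in expression.items():
        shuffled = list(values)  # copy so the original order is preserved
        rng.shuffle(shuffled)    # a fresh permutation is drawn for every gene
        permuted[gene] = shuffled
    return permuted


# Toy data, for illustration only: three genes measured in five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [0.5, 0.4, 0.3, 0.2, 0.1],
    "geneC": [7.0, 7.5, 8.0, 8.5, 9.0],
}
null_data = permute_within_genes(expression, seed=42)
```

Each gene keeps its own marginal distribution, but any dependence between genes is destroyed. This corresponds to the strong null hypothesis that the excerpts describe, under which every gene is independent of all the others.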
... Their method, GENIE3, won the DREAM4 in silico multifactorial challenge (https://dreamchallenges.org/). Researchers, including the author and colleagues, have been attracted by the possibilities of GENIE3, and have therefore developed its extensions [16,19,20,27,29]. ...
... Each added element corresponded to a stimulus applied to the cells. According to Kimura et al. [20], we considered the decomposition of the biochemical compounds used for stimulating the cells. One added element thus had a value of $0.9^{t/48}$ for the measurements in the time-series dataset obtained by applying the stimulus corresponding to the element, where t was the time (min.) ...
Article
Full-text available
Among the various methods so far proposed for genetic network inference, this study focuses on the random-forest-based methods. Confidence values are assigned to all of the candidate regulations when taking the random-forest-based approach. To our knowledge, all of the random-forest-based methods make the assignments using the standard variable importance measure defined in tree-based machine learning techniques. Therefore, the sum of the confidence values of the candidate regulations of a certain gene from the other genes, which are computed from a single random forest, is always restricted to a value of almost 1. We think that this feature is inconvenient for genetic network inference, which requires comparing the confidence values computed from multiple random forests. In this study we therefore propose an alternative measure, which we call "the random-input variable importance measure," and design a new inference method that uses the proposed measure in place of the standard measure in the existing random-forest-based inference method. We show, through numerical experiments, that the use of the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC).
... Additionally, new technologies providing expression data at the single-cell level have led to the development of inference methods that are specifically adapted to single-cell transcriptomics (Chen and Mar, 2018). Inferring the functional relationships between genes requires the estimation of gene functional dependencies, which can be broadly achieved by two different reverse engineering approaches for GRN inference: 1. Model-free methods: In this approach, gene dependencies are inferred using several statistical and machine learning methods, such as mutual information (Faith et al., 2007; Margolin et al., 2006; Meyer et al., 2007), random forest (Huynh-Thu et al., 2010; Kimura et al., 2020; Park et al., 2018; Petralia et al., 2015), deconvolution (Chen et al., 2014; Feizi et al., 2013), or epigenetic approaches (Sonawane et al., 2021). 2. Model-based methods: In this approach, a quantitative dynamical model (for example, ODEs (Aalto et al., 2020; Huynh-Thu and Geurts, 2018; Iglesias-Martinez et al., 2016), regression methods (Michailidis and d'Alché-Buc, 2013), or Bayesian reasoning (Young et al., 2014)) is defined to model the dynamical properties of the system, while the regulatory network is inferred by optimizing the model parameters based on the time-series data. ...
Preprint
Full-text available
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the always increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated overview of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims and experimental data.
... and Tsai et al. (2020), all of these devoted to identifying cause-effect relationships in climatic phenomena, and at present still not easily generalizable to the inference of biological networks of different kinds, due to the different physical nature of the climatic effects and the large variety of biological interactions. Nevertheless, along the same methodological line, there have also been important achievements in this direction in gene regulatory networks in the recent past, such as the development of random forest algorithms for gene regulatory network inference by Petralia et al. (2015), Furqan and Siyal (2016), Deng et al. (2017), Huynh-Thu and Geurts (2018), Kimura et al. (2020), Zhang et al. (2020a), Cassan et al. (2021). The majority of these methods are based on random forests for causal discovery in gene regulatory networks. ...
Article
Full-text available
Most machine learning-based methods predict outcomes rather than understanding causality. Machine learning methods have proved efficient at finding correlations in data, but poor at determining causation. This issue severely limits the applicability of machine learning methods to inferring the causal relationships between the entities of a biological network and, more generally, of any dynamical system that is representable as a network, such as a system of medical intervention strategies and clinical outcomes. From the perspective of those who want to use the results of network inference not only to understand the mechanisms underlying the dynamics, but also to understand how the network reacts to external stimuli (e.g., environmental factors, therapeutic treatments), tools that can understand the causal relationships between data are in high demand. Given the increasing popularity of machine learning techniques in computational biology and the recent literature proposing the use of machine learning techniques for the inference of biological networks, we would like to present the challenges that mathematics and computer science research faces in generalising machine learning to an approach capable of understanding causal relationships, and the prospects that achieving this will open up for the medical application domains of systems biology, the main paradigm of which is precisely network biology at any physical scale.
... Each added element corresponded to a stimulus applied to the cells. According to Kimura et al. [19], we considered the decomposition of the biochemical compounds used for stimulating the cells. One added element thus had a value of $0.9^{t/48}$ for the measurements in the time-series dataset obtained by applying the stimulus corresponding to the element, where t was the time (min.) ...
Preprint
Full-text available
Background: Among the various methods so far proposed for genetic network inference, this study focuses on the random-forest-based methods. Confidence values are assigned to all of the candidate regulations when taking the random-forest-based approach. To our knowledge, all of the random-forest-based methods make the assignments using the standard variable importance measure defined in tree-based machine learning techniques. We think, however, that this measure has drawbacks in the inference of genetic networks. Results: In this study we therefore propose an alternative measure, which we call "the random-input variable importance measure," and design a new inference method that uses the proposed measure in place of the standard measure in the existing random-forest-based inference method. We show, through numerical experiments, that the use of the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC). Conclusion: This study proposed the random-input variable importance measure for the inference of genetic networks. The use of our measure improved the performance of the random-forest-based inference method. In this study, we checked the performance of the proposed measure only on several genetic network inference problems. However, the experimental results suggest that the proposed measure will work well in other applications of random forests.
Article
Full-text available
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.