Shuhei Kimura’s research while affiliated with Tottori University and other places


Publications (59)


A Bias in Training Data Degrades the Quality of Variable Importance Scores of Random Forests
  • Conference Paper
  • November 2024
  • 3 Reads
  • Shuhei Kimura · Kenta Yuasa


Figure 3. The values for the weight parameters, $w_k^T$'s, corresponding to each of the time-series datasets of the DREAM3 problems.
Figure 5. The time-series of expression levels of a) ATF3, b) EGR1, c) EGR2, d) EGR3, e) ETS2, f) FOS, g) FOSB, h) FOSL1, i) JUN, j) JUNB and k) MYC in MCF7 cells stimulated by HRG. Solid line: smoothed expression data used for inferring genetic networks. Plus symbol: measured gene expression data.
Figure 6. The network of the top 30 regulations obtained from the proposed approach. Solid lines represent the top 20 regulations. Circles and squares represent the genes and external stimuli, respectively.
Figure 7. The network of the top 30 regulations obtained from the original method [19]
Inference of genetic networks using random forests: Performance improvement using a new variable importance measure
  • Article
  • Full-text available
  • December 2022
  • 35 Reads
  • 1 Citation
  • Chem-Bio Informatics Journal

Among the various methods so far proposed for genetic network inference, this study focuses on the random-forest-based methods. These methods assign confidence values to all of the candidate regulations. To our knowledge, all of the random-forest-based methods make the assignments using the standard variable importance measure defined in tree-based machine learning techniques. As a result, the confidence values that a single random forest computes for the candidate regulations of a certain gene from the other genes always sum to a value of almost 1. We think that this feature is inconvenient for genetic network inference, which requires comparing confidence values computed from multiple random forests. In this study we therefore propose an alternative measure, which we call "the random-input variable importance measure," and design a new inference method that uses the proposed measure in place of the standard measure in the existing random-forest-based inference method. We show, through numerical experiments, that the use of the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC).
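The normalization issue described in this abstract can be illustrated with a small arithmetic sketch. It is not the authors' implementation: the gene names and raw scores are hypothetical, and the raw scores merely stand in for unnormalized importance values (for example, total impurity decrease) produced by two separate random forests.

```python
def normalize(raw_importance):
    """Scale raw importance scores so that they sum to 1."""
    total = sum(raw_importance.values())
    return {name: score / total for name, score in raw_importance.items()}


# Hypothetical raw importance scores from two forests, one per target gene.
# The first forest explains its target far better than the second one does.
raw_for_target_a = {"gene1": 4.0, "gene2": 1.0}
raw_for_target_b = {"gene1": 0.5, "gene2": 0.125}

conf_a = normalize(raw_for_target_a)
conf_b = normalize(raw_for_target_b)

# After normalization the two forests yield the same confidence values,
# so the difference in scale between the forests is no longer visible.
print(conf_a)
print(conf_b)
```

Because each forest's scores are rescaled to sum to 1, a regulation from a poorly fitting forest can receive the same confidence value as one from a well-fitting forest, which is the kind of difficulty the abstract points to when values from multiple forests are ranked together.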




Figures
The top 20 regulations ranked with respect to the confidence values computed by the proposed approach and the original inference method [6]. The regulations written in boldface and italic fonts have reportedly been confirmed in human and/or other species and are accordingly assumed to be reasonable.
Rank | Inference method using random-input variable importance measure | Random-forest-based inference method [6]
1 | EGR1 ← FOS | EGR1 ← FOS
2 | ATF3 ← TGF-β and TNF-α | FOS ← HRG
3 | EGR2 ← FOS | ATF3 ← TGF-β and TNF-α
Inference of genetic networks using random forests: performance improvement using a new variable importance measure

  • July 2021
  • 48 Reads

Background: Among the various methods so far proposed for genetic network inference, this study focuses on the random-forest-based methods. These methods assign confidence values to all of the candidate regulations. To our knowledge, all of the random-forest-based methods make the assignments using the standard variable importance measure defined in tree-based machine learning techniques. We think, however, that this measure has drawbacks in the inference of genetic networks. Results: In this study we therefore propose an alternative measure, which we call "the random-input variable importance measure," and design a new inference method that uses the proposed measure in place of the standard measure in the existing random-forest-based inference method. We show, through numerical experiments, that the use of the random-input variable importance measure improves the performance of the existing random-forest-based inference method by as much as 45.5% with respect to the area under the recall-precision curve (AURPC). Conclusion: This study proposed the random-input variable importance measure for the inference of genetic networks. The use of our measure improved the performance of the random-forest-based inference method. In this study, we checked the performance of the proposed measure only on several genetic network inference problems. However, the experimental results suggest that the proposed measure will work well in other applications of random forests.
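For readers unfamiliar with the AURPC, the following is a minimal sketch of how such a score could be computed from a ranked list of candidate regulations. It assumes trapezoidal integration starting from the point (recall 0, precision 1); the study may use a different interpolation, and the function name, gene names, and confidence values here are hypothetical.

```python
def aurpc(ranked_labels):
    """Area under the recall-precision curve for a ranked list.

    ranked_labels: booleans ordered from the highest to the lowest confidence
    value; True marks a regulation that exists in the reference network.
    Assumption: trapezoidal rule, with the curve starting at (0, 1).
    """
    total_true = sum(ranked_labels)
    if total_true == 0:
        raise ValueError("the reference network contains no true regulations")
    area = 0.0
    hits = 0
    prev_recall, prev_precision = 0.0, 1.0
    for rank, is_true in enumerate(ranked_labels, start=1):
        hits += is_true
        recall = hits / total_true
        precision = hits / rank
        # Add the trapezoid between the previous and the current curve point.
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area


# Hypothetical candidate regulations: (regulator, target, confidence, is_true).
candidates = [
    ("geneA", "geneB", 0.9, True),
    ("geneC", "geneB", 0.7, False),
    ("geneA", "geneC", 0.6, True),
    ("geneB", "geneA", 0.2, False),
]
ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
print(aurpc([c[3] for c in ranked]))
```

A ranking that places every true regulation ahead of every false one scores 1 under this convention, and the score drops as true regulations move down the list.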


A framework of the proposed method.
The measurement conditions of the time-series datasets used in this study.
Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods

  • December 2020
  • 51 Reads
  • 13 Citations

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.




Service-Oriented Utterance Sentence Analysis in Small Computers (小型計算機におけるサービス指向発話文解析)

  • September 2019
  • 5 Reads
  • Journal of Natural Language Processing

This paper presents a method for analyzing command utterances on small computers that provide various services. All compulsory command utterances for the services must be accepted, and user utterances must be learnable at a low computational cost. To determine the chunks and their senses according to the service, the proposed method maintains parsing arrays for each service and performs utterance analysis and learning through reinforcement learning. In an implementation on an in-vehicle computer for car travelers, the proposed method was generally successful, with an analysis accuracy of 0.99 in a closed test and 0.81 in an open test.


Citations (37)


... Many different tools have been developed to infer interactions in gene regulatory networks [18], [11], [14], [12], [3], [16], and it is important to assess the reliability of their predictions. This task was realized by the DREAM (Dialogue for Reverse Engineering Assessments and Methods) [23] competitions via standardized benchmarking measures of network inference performance. ...

Reference:

Boosting reliability when inferring interactions from time series data in gene regulatory networks
Inference of genetic networks using random forests: Performance improvement using a new variable importance measure

Chem-Bio Informatics Journal

... where ( ) [19] showed that our random-forest-based inference method performs better when the constant parameters $w_k^T$'s and $w_k^S$'s are appropriately set. In order to determine these values, then, the methods that utilize the similarity between measurements have been proposed [19,21]. ...

Inference of Genetic Networks using Random Forests: A Quantitative Weighting Method for Gene Expression Data
  • Citing Conference Paper
  • August 2022

... The main purpose of the existing feature selection methods might explain this failure, as the methods were developed not to detect all of the input variables that actually affect the output, but to find input variables that maximize the predicting performance of the obtained model. More recently, our group developed a new feature selection method whose purpose is to find all of the input variables that actually affect the output and to remove as many of the irrelevant input variables as possible (Kimura and Tokuhisa, 2020). ...

Detection of Weak Relevant Variables using Random Forests
  • Citing Conference Paper
  • September 2020

... 21 A general strategy for handling a preponderance of false positives is to statistically filter TRN outputs to achieve a user-specified precision, typically expressed in terms of false discovery rate (FDR) control. [22][23][24][25][26][27] In brief, the FDR is the proportion of significant predictions that are expected to be false positives, and there are various ways to estimate the FDR of TRN inferences (see Box 1). If FDR estimates are accurate, then the user can generate a TRN in which only a specified fraction of edges are false positives. ...

Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods

... Then select a more appropriate model for the next step of analysis. The "randomForest" R package was applied to evaluate the significance of m6A regulators in POAG [18]. Genes with gene importance score greater than 2 were selected for subsequent analysis, and the corresponding gene importance score plot was drawn. ...

Inference of Genetic Networks using Random Forests: Assigning Different Weights for Gene Expression Data
  • Citing Article
  • April 2019

Journal of Bioinformatics and Computational Biology

... Our results also highlight Aatk, Aspa, Nov/Ccn3, Cebpd, Fam198b, Gpr155, Ngfr, Npr3, Phlda1 and Rnf130 as potential regulators of peripheral glial phenotype (Mersmann et al. 2011; Reiprich et al. 2017; Magi et al. 2018; de la Vega Gallardo et al. 2020). Nov expression has been shown to be induced in differentiating enteric progenitor cells (Neckel et al. 2016). ...

Transcriptionally inducible Pleckstrin homology-like domain family A member 1 attenuates ErbB receptor activity by inhibiting receptor oligomerization

Journal of Biological Chemistry

  • Kazunari Iwamoto · Noriko Yumoto · [...]

... An efficient method for inferring undirected Gaussian graphical models is described in Yuan and Lin (2007). More recently, a detailed comparison of various methods for static network inference has been carried out in Kimura et al. (2017). Such methods do not have to be based on microarray data. ...

Inference of genetic networks from time-series of gene expression levels using random forests
  • Citing Conference Paper
  • August 2017

... Our group recently proposed an inference method that uses another kind of a priori knowledge, i.e., a hierarchical structure [8]. The first step in this hierarchy-based method [8] is to obtain multiple genetic networks from the given gene expression data using the BS-LPM inference method [9], a combination of the existing inference method [10] and a bootstrap method [11]. ...

Genetic Network Inference Using Hierarchical Structure

... To solve this problem, BioMASS enables integrative evaluation of parameter estimation and sensitivity analysis of the dozens of parameter candidates obtained by data fitting. We implemented a genetic algorithm [30][31][32][33], one of the efficient methods for solving global optimization problems for the data fitting of the model. After model fitting, users can perform sensitivity analysis to identify critical parameters, species or regulations in the system of interest. ...

An Extension of UNDX Based on Guidelines for Designing Crossover Operators
  • Citing Article
  • January 2000

Transactions of the Society of Instrument and Control Engineers