# Hidetoshi Shimodaira

Kyoto University | Kyodai · Graduate School of Informatics

## About

112 Publications

27,297 Reads

14,667 Citations

## Publications

Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it difficult to distinguish sentences with large overlaps of similar words, even if they are seman...

It is common to show the confidence intervals or p-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature...

Temporal datasets that describe complex interactions between individuals over time are increasingly common in various domains. Conventional graph representations of such datasets may lead to information loss since higher-order relationships between more than two individuals must be broken into multiple pairwise relationships in graph representation...

For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. The well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take the label average over a ball around the query, are consistent but asymptotically biased, particularly f...

Preferential attachment is commonly invoked to explain the emergence of those heavy-tailed degree distributions characteristic of growing network representations of diverse real-world phenomena. Experimentally confirming this hypothesis in real-world growing networks is an important frontier in network science research. Conventional preferential at...

It is well-known that typical word embedding methods such as Word2Vec and GloVe have the property that the meaning can be composed by adding up the embeddings (additive compositionality). Several theories have been proposed to explain additive compositionality, but the following questions remain unanswered: (Q1) The assumptions of those theories do...

Refining one’s hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and prevent i...

We propose a statistical method for estimating the non-parametric transitivity and preferential attachment functions simultaneously in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume a certain functional form for these. Our model is demonstrated to exhibit a good fit to two real-world...

Multimodal relational data analysis has become of increasing importance in recent years, for exploring across different domains of data, such as images and their text tags obtained from social networking services (e.g., Flickr). A variety of data analysis methods have been developed for visualization; to give an example, t-Stochastic Neighbor Embed...

A collection of U (∈ ℕ) data vectors is called a U-tuple, and the association strength among the vectors of a tuple is termed the hyperlink weight, which is assumed to be symmetric with respect to permutation of the entries in the index. We herein propose Bregman hyperlink regression (BHLR), which learns a user-specified symmetric similarity functi...

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are stat...

$k$-nearest neighbour ($k$-NN) is one of the simplest and most widely used methods for supervised classification; it predicts a query's label by taking a weighted ratio of the observed labels of the $k$ objects nearest to the query. The weights and the parameter $k \in \mathbb{N}$ regulate its bias-variance trade-off, and the trade-off implicitly affects th...

Refining one's hypotheses in the light of data is a commonplace scientific practice; however, this approach introduces selection bias and can lead to specious statistical analysis. One approach to addressing this phenomenon is via conditioning on the selection procedure, i.e., how we have used the data to generate our hypotheses, and prevents inform...

We propose a statistical method to estimate simultaneously the non-parametric transitivity and preferential attachment functions in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume some functional form for them. Our model is shown to be a good fit to two real-world co-authorship networ...

We propose weighted inner product similarity (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally...
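The weighted inner product described in this abstract can be illustrated with a minimal sketch. It assumes the feature vectors have already been produced by some network and that the weights are given rather than learned; the function name `wips` and the example vectors are hypothetical and are not taken from the paper.

```python
def wips(x, y, weights):
    """Weighted inner product: sum over k of weights[k] * x[k] * y[k].

    Assumption: x and y are feature vectors already computed elsewhere,
    and weights may contain both positive and negative values.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w * a * b for w, a, b in zip(weights, x, y))


# Hypothetical feature vectors for two nodes.
x = [1.0, 2.0, 0.5]
y = [0.5, -1.0, 2.0]

# All weights equal to 1 gives the ordinary inner product.
ips = wips(x, y, [1.0, 1.0, 1.0])

# A negative weight flips the contribution of the second coordinate.
mixed = wips(x, y, [1.0, -1.0, 1.0])
```

With all weights set to 1 the function reduces to the ordinary inner product, so any difference in behaviour comes only from the signs and magnitudes of the weights.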

A collection of $U \: (\in \mathbb{N})$ data vectors is called a $U$-tuple, and the association strength among the vectors of a tuple is termed the hyperlink weight, which is assumed to be symmetric with respect to permutation of the entries in the index. We herein propose Bregman hyperlink regression (BHLR), which learns a user-specified symmetr...

We propose a new type of representation learning method that models words, phrases and sentences seamlessly. Our method does not depend on word segmentation or any human-annotated resources (e.g., word dictionaries), yet it is very effective for noisy corpora written in unsegmented languages such as Chinese and Japanese. The main idea of our metho...

A general resampling approach is considered for the selective inference problem after variable selection in regression analysis. Even after variable selection, it is important to know whether the selected variables are actually useful by showing $p$-values and confidence intervals of regression coefficients. In the classical approach, significance leve...

Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting the selection bias when testing many trees and edges at the same time. The newly proposed selective inference p-value is useful for testing selected edg...

Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it...

We propose $\textit{weighted inner product similarity}$ (WIPS) for neural-network based graph embedding, where we optimize the weights of the inner product in addition to the parameters of neural networks. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and in...

We propose $\beta$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights. A newly introduced empirical moment $\beta$-score reduces the influence of contamination and robustly measures the difference between the underlying correct expected weights of links and the specified generative model. The proposed met...

Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting the selection bias when testing many trees and edges at the same time. The newly proposed selective inference $p$-value is useful for testing selected e...

We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE). In contrast to IPS, which is limited to approximating positive-definite (PD) similarities, SIPS goes beyond the limitation by introducing bias terms in IPS; we th...

Applying conventional word embedding models to unsegmented languages, where word boundaries are not clear, requires word segmentation as preprocessing. However, word segmentation is difficult and expensive to conduct without errors. Segmentation error degrades the quality of word embeddings, leading to performance degradation in downstream application...

We propose a method for the non-parametric joint estimation of preferential attachment and transitivity in complex networks, as opposed to conventional methods that either estimate one mechanism in isolation or jointly estimate both assuming some functional forms. We apply our method to three scientific co-authorship networks between scholars in t...

(This manuscript was presented at the ICML 2018 Theoretical Foundations and Applications of Deep Generative Models (TADGM) workshop, Stockholm, Sweden, July 14-15, 2018.)
The representation power of similarity functions used in neural network-based graph embedding is considered. The inner product similarity (IPS) with feature vectors computed via neur...

A simple framework, Probabilistic Multi-view Graph Embedding (PMvGE), is proposed for multi-view feature learning with many-to-many associations. PMvGE is a probabilistic model for predicting new associations via graph embedding of the nodes of data vectors with links of their associations. Multi-view data vectors with many-to-many associations are tr...

Selective inference procedures are considered for computing approximately unbiased $p$-values of hypothesis testing using nonparametric bootstrap resampling without direct access to the parameter space or the null distribution. A typical example is to assess the uncertainty of hierarchical clustering, where we can easily compute a frequentist conf...

Complex network growth across diverse fields of science is hypothesized to be driven in the main by a combination of preferential attachment and node fitness processes. For measuring the respective influences of these processes, previous approaches make strong and untested assumptions on the functional forms of either the preferential attachment fu...

We introduce a statistically sound method called PAFit for the joint estimation of preferential attachment and node fitness in temporal complex networks. Together these mechanisms play a crucial role in shaping network topology by governing the way in which nodes acquire new edges over time. PAFit is an advance over previous methods in so far as it...

We consider the problem of sparse estimation of undirected graphical models via L1 regularization. The ordinary lasso encourages sparsity on all edges equally, so that all nodes tend to have small degrees. On the other hand, many real-world networks are scale-free, where some nodes have a large number of edges. In such cases, a...

Preferential attachment is a stochastic process that has been proposed to explain certain topological features characteristic of complex networks from diverse domains. The systematic investigation of preferential attachment is an important area of research in network science, not only for the theoretical matter of verifying whether this hypothesize...

We derive an information criterion for selecting a parametric model of
complete-data distribution when only incomplete or partially observed data is
available. Compared with AIC, the new criterion has an additional penalty term
for missing data expressed by the Fisher information matrices of complete data
and incomplete data. We prove that the new...

The strength of association between a pair of data vectors is represented by
a nonnegative real number, called matching weight. For dimensionality
reduction, we consider a linear transformation of data vectors, and define a
matching error as the weighted sum of squared distances between transformed
vectors with respect to the matching weights. Give...

Data vectors are obtained from multiple domains. They are feature vectors of
images or vector representations of words. Domains may have different numbers
of data vectors with different dimensions. These data vectors from multiple
domains are projected to a common space by linear transformations in order to
search closely related vectors across dom...

In several empirical applications analyzing customer-by-product choice data, it may be relevant to partition individuals having similar purchase behavior in homogeneous segments. Moreover, should individual- and/or product-specific covariates be available, ...

Background: In this paper we report the prevalence of copy number aberration events at various stages
(subclasses) of breast cancer as assessed by two different statistical methods, GISTIC, a well-known

We consider hypothesis testing for the null hypothesis being represented as
an arbitrary-shaped region in the parameter space. We compute an approximate
p-value by counting how many times the null hypothesis holds in bootstrap
replicates. This frequency, known as bootstrap probability, is widely used in
evolutionary biology, but often reported as b...
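The counting procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bootstrap_probability`, the toy data, and the example null region ("mean is at most 0") are all assumptions made here.

```python
import random
import statistics


def bootstrap_probability(data, holds, n_boot=2000, seed=0):
    """Fraction of bootstrap replicates in which the hypothesis holds.

    `holds` is a user-supplied predicate (an assumption of this sketch)
    that decides whether a replicate falls inside the null region.
    """
    rng = random.Random(seed)
    n = len(data)
    hits = 0
    for _ in range(n_boot):
        # Resample n observations with replacement.
        replicate = [data[rng.randrange(n)] for _ in range(n)]
        if holds(replicate):
            hits += 1
    return hits / n_boot


# Hypothetical data and null region: "the mean is at most 0".
data = [0.8, 1.1, -0.3, 0.5, 1.7, 0.2, 0.9, -0.1, 1.3, 0.6]
bp = bootstrap_probability(data, lambda xs: statistics.mean(xs) <= 0.0)
```

Passing the null region as a predicate keeps the sketch independent of the shape of the region, which matches the "arbitrary-shaped region" framing of the abstract.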

Preferential attachment is widely recognised as the principal driving force behind the evolution of many growing networks, and measuring the extent to which it occurs during the growth of a network is important for explaining its overall structure. Conventional methods require that the timeline of a growing network is known, that is, the order in w...

We propose multiscale bagging as a modification of the bagging procedure. In ordinary bagging, bootstrap resampling is used for generating bootstrap samples. We replace it with the multiscale bootstrap algorithm. In multiscale bagging, the sample size m of bootstrap samples may be altered from the sample size n of the learning dataset. For assessi...

This file contains the gene expression data that we analyzed in our paper.
(0.05 MB TXT)

The problem of reconstructing large-scale, gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called struct...

Structural equation models have been widely used to study causal relationships between continuous variables. Recently, a non-Gaussian method called LiNGAM was proposed to discover such causal models and has been extended in various directions. An important problem with LiNGAM is that the results are affected by the random sampling of the data as w...

scaleboot is an add-on package for R. It is for calculating approximately unbiased (AU) p-values for a general problem from a set of multiscale bootstrap probabilities (BPs). Scaling is equivalent to changing the sample size of the data in bootstrap resampling. We compute BPs at several scales, from which a very accurate p-value is calculated (Shim...

Structural equation models and Bayesian networks have been widely used to study causal relationships between continuous variables. Recently, a non-Gaussian method called LiNGAM was proposed to discover such causal models and has been extended in various directions. An important problem with LiNGAM is that the results are affected by the random samp...

This paper concerns the specification, and performance, of scale-free prior distributions with a view toward large-scale network
inference from small-sample data sets. We devise three scale-free priors and implement them in the framework of Gaussian graphical
models. Gaussian graphical models are used in gene network inference where high-throughput...

This chapter examines learning algorithms under the covariate shift in which training and test data are drawn from different distributions. Using a naive estimator under the covariate shift, such as the maximum likelihood estimator (MLE), results in serious estimation bias when the assumed statistical model is misspecified. To correct this estimati...

A new computation method of frequentist $p$-values and Bayesian posterior probabilities based on the bootstrap probability is discussed for the multivariate normal model with unknown expectation parameter vector. The null hypothesis is represented as an arbitrary-shaped region. We introduce new parametric models for the scaling-law of bootstrap pro...

A new class of approximately unbiased tests based on bootstrap probabilities is obtained for the multivariate normal model with unknown expectation parameter vector. The null hypothesis is represented as an arbitrary-shaped region with possibly nonsmooth boundary surfaces such as cones, which appear in, for example, multiple comparisons and hierarc...

We propose a scale-free network model with a tunable power-law exponent. The Poisson growth model, as we call it, is an offshoot
of the celebrated model of Barabási and Albert where a network is generated iteratively from a small seed network; at each
step a node is added together with a number of incident edges preferentially attached to nodes alr...
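The iterative growth described in this abstract can be sketched in a few lines. This is a simplified illustration in the style of the Barabási and Albert model, assuming a fixed number `m` of edges per new node and a complete seed graph; it does not reproduce the Poisson growth model itself, whose details are truncated above. The function name `grow_network` is hypothetical.

```python
import random


def grow_network(n_nodes, m=2, seed=0):
    """Grow a network by degree-proportional (preferential) attachment.

    Assumptions of this sketch: the seed network is a complete graph on
    m + 1 nodes, and every new node brings exactly m edges.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a node with probability proportional to degree.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = grow_network(50, m=2)
```

Keeping a list of edge endpoints is a common trick for degree-proportional sampling without recomputing degrees at every step.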

ErbB receptor ligands, epidermal growth factor (EGF) and heregulin (HRG), induce dose-dependent transient and sustained intracellular signaling, proliferation, and differentiation of MCF-7 breast cancer cells, respectively. In an effort to delineate the ligand-specific cell determination mechanism, we investigated time course gene expressions induc...

Pvclust is an add-on package for the statistical software R to assess the uncertainty in hierarchical cluster analysis. Pvclust
can be used easily for general statistical problems, such as DNA microarray analysis, to perform the bootstrap analysis of
clustering, which has been popular in phylogenetic analysis. Pvclust calculates probability values (p...

A class of approximately unbiased tests based on bootstrap probabilities is considered for the normal model with unknown expectation parameter vector, where the null hypothesis is represented as an arbitrary-shaped region with possibly singular boundary surfaces. We alter the sample size n′ of replicated datasets from the sample size n of the observ...

Approximately unbiased tests based on bootstrap probabilities are considered for the exponential family of distributions with unknown expectation parameter vector, where the null hypothesis is represented as an arbitrary-shaped region with smooth boundaries. This problem has been discussed previously in Efron and Tibshirani [Ann. Statist. 26 (1998)...

Time- and dose-dependent change in gene expression profiles of ligand-stimulated cancer cells is a promising source of information to unwire a relationship between biological activities of the molecules and disease states [1]. In this study, we employed a multiplicative model [2] to decompose the time and dose effects, instead of the ordinary additiv...

The bootstrap probability (BP), the frequency of bootstrap replicates supporting a hypothesis, is often used as a confidence level in complicated data analysis. Hierarchical clustering analysis is a typical example; the BP value is calculated for each cluster by counting how many times the cluster appeared in thousands of bootstrapped tree...

To construct an East Asia mitochondrial DNA (mtDNA) phylogeny, we sequenced the complete mitochondrial genomes of 672 Japanese individuals (http://www.giib.or.jp/mtsnp/index_e.html). This allowed us to perform a phylogenetic analysis with a pool of 942 Asiatic sequences. New clades and subclades emerged from the Japanese data. On the basis of this...

Introduction. The Bayesian network [2, 3, 4] is a very powerful tool for estimating the gene network from microarray expression profiles. The estimated network is often susceptible to statistical sampling error, and thus Imoto et al. [3, 4] evaluated the reliability of estimation by calculating the bootstrap probabilities for the edges connecting ge...

Supplement Material to "Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling." 1. Summary. The technical details of the new bootstrap method of Shimodaira (2004) are given here in mathematical proofs as well as a supporting computer program. Approximately unbiased tests based on the bootstrap probabilities are consi...

The maximum likelihood method is considered one of the most reliable methods for phylogenetic tree inference. However, as the number of species increases, the approach quickly loses its applicability because the number of trees that need to be considered grows exponentially. An earlier work by one of the authors (3) demonstrated that, by decomposin...

We study the problem of constructing designs for regression problems. Our aim is to estimate the mean value of the response variable. The distribution of the independent variable is appropriately chosen from among the continuous designs so as to decrease the integrated mean square error (IMSE) of the fitted values. When we use the design, we f...

...original gene expression data $X_n = (x_1, \ldots, x_n)$. In other words, we alter the number of arrays from $n$ to $n'$ in the bootstrap replication. We will take $n'$ values with $n'/n = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4$ in the example shown later. We call $\sqrt{n/n'}$ the scale. The bootstrap algorithm with $n'$ arrays can be expressed as follows: Step 1: Generate...
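The resampling scheme in this excerpt, where the replicate size is changed from n to a different size for each ratio from 0.5 to 1.4, can be sketched as below. This is only an illustration under stated assumptions: the function name `multiscale_bps`, the toy data, and the example hypothesis ("mean is at most 0.5") are hypothetical, and the steps of the original algorithm are truncated in the excerpt, so they are not reproduced here.

```python
import math
import random
import statistics


def multiscale_bps(data, holds, ratios, n_boot=1000, seed=0):
    """Bootstrap probabilities at several replicate sizes.

    Returns one (n_prime, scale, bp) tuple per ratio, where
    n_prime = round(ratio * n) and scale = sqrt(n / n_prime).
    """
    rng = random.Random(seed)
    n = len(data)
    results = []
    for ratio in ratios:
        n_prime = max(1, round(ratio * n))
        hits = 0
        for _ in range(n_boot):
            # Resample n_prime observations (not n) with replacement.
            replicate = [data[rng.randrange(n)] for _ in range(n_prime)]
            if holds(replicate):
                hits += 1
        results.append((n_prime, math.sqrt(n / n_prime), hits / n_boot))
    return results


# Ratios taken from the excerpt; data and hypothesis are hypothetical.
ratios = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
data = [0.8, 1.1, -0.3, 0.5, 1.7, 0.2, 0.9, -0.1, 1.3, 0.6,
        0.4, 1.0, 0.7, -0.2, 1.5, 0.3, 0.9, 0.1, 1.2, 0.6]
results = multiscale_bps(data, lambda xs: statistics.mean(xs) <= 0.5, ratios)
```

The sketch stops at collecting the bootstrap probabilities per scale; how they are combined into a single p-value is outside what the excerpt states.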

An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis
testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection for obtaining the
confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. S...

Consider the balanced one-way layout for comparing k treatment effects μi, 1⩽i⩽k. Marcus (Biometrika 63 (1976) 177) considers a test procedure for testing the null hypothesis against the simple ordered alternative with at least one strict inequality based upon the range of the isotonic estimates of the treatment effects. This test statistic was ori...

CONSEL is a program to assess the confidence of the tree selection by giving the p-values for the trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multi-scale bootstrap technique. This p-value is less biased than the other conventional p-values such as the Bootstrap Pro...

We consider multiple comparisons of log-likelihoods to take account of the multiplicity of testing in the selection of nonnested models. A resampling version of the Gupta procedure for the selection problem is used to obtain a set of good models, which are not significantly worse than the maximum likelihood model; i.e., a confidence set of models. Ou...