# Yasuo Matsuyama | Waseda University (Sōdai) · Computer Science and Engineering

Yasuo Matsuyama

Dr. Engineering and Ph. D.

## About

- Publications: 101
- Reads: 9,276


- Citations: 1,000 (since 2017)

**Research interests:** statistical machine learning, blockchain, neural networks, information theory

**Additional affiliations:** April 1996 - present

## Publications

Publications (101)

This study interrelates three adjacent topics in data evaluation. The first is the establishment of a relationship between Bregman divergence and probabilistic alpha-divergence. In particular, we demonstrate that square-root-order probability normalization enables the unification of these two divergence families. This yields a new alpha-divergence,...
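The unification described above rests on the alpha-divergence family. As a concrete reference point, here is a sketch of one common (Amari-style) parameterization of the discrete alpha-divergence; normalization constants and sign conventions vary across papers, so this is illustrative rather than the abstract's exact definition.

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari-style alpha-divergence between two discrete distributions.

    Assumes the parameterization where alpha -> -1 gives KL(p||q),
    alpha -> 1 gives KL(q||p), and alpha = 0 is proportional to the
    squared Hellinger distance. Illustrative only.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, -1.0):
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 1.0):
        return float(np.sum(q * np.log(q / p)))
    c = 4.0 / (1.0 - alpha ** 2)
    return float(c * (1.0 - np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))))
```

The divergence vanishes iff the two distributions coincide, for every admissible alpha.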

We present a path starting from generalized information measures to blockchain applications. In the middle of these two endpoints, we derive the alpha-EM algorithm and the traditional log-EM algorithm simultaneously from a sole divergence. Thus, there are three subjects. The first part discusses the relationship between the Bregman divergence and...

In this paper, we present machine learning algorithms and systems for similar video retrieval. Here, the query is itself a video. For the similarity measurement, exemplars, or representative frames in each video, are extracted by unsupervised learning. For this learning, we chose the order-aware competitive learning. After obtaining a set of exem...

The estimation of generative structures for sequences is becoming increasingly important for preventing such data sources from becoming a flood of disorganized information. Obtaining Hidden Markov models (HMMs) has been a central method for structuring such data. However, users have been aware of the slow speed of this algorithm. In this study, we...
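The iteration cost that makes HMM estimation slow is dominated by the forward-backward recursions run at every EM step. For context, a minimal sketch of the scaled forward recursion for a discrete-output HMM (variable names are mine, not the paper's):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward recursion for a discrete HMM (illustrative sketch).

    obs : observation indices, shape (T,)
    pi  : initial state distribution, shape (N,)
    A   : transition matrix, A[i, j] = P(state j | state i), shape (N, N)
    B   : emission matrix, B[i, k] = P(symbol k | state i), shape (N, K)
    Returns log P(obs | model), the quantity each (log-)EM step re-evaluates.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()                      # scale to avoid underflow
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]    # propagate and emit
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```

Each EM iteration runs this (plus the backward pass) over the whole data set, which is why reducing the iteration count, as the alpha-HMM aims to, matters.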

Learning algorithms that harmonize standardized video similarity tools and an integrated system are presented. The learning algorithms extract exemplars reflecting time courses of video frames. There were five types of such clustering methods. Among them, this paper chooses a method called time-partition pairwise nearest-neighbor because of its red...

In this paper, we propose a method to utilize low-frequency brain signals for continuous authentication of users. During such monitoring, the users to be authenticated can work without interruption. This style of authentication is expected to complement traditional methods based on passwords, which can be easily forgotten or stolen. For brain signa...

New learning algorithms and systems for retrieving similar videos are presented. Each query is a video itself. For each video, a set of exemplars is machine-learned by new algorithms. Two methods were tried. The first and main one is the time-bound affinity propagation. The second is the harmonic competition which approximates the first. In the sim...

A new icon spotting method for designing a user-friendly GUI is described. Here, each icon can represent continuous and discrete vector data which are possibly high-dimensional. An important issue is icon-margin adjustment or uniforming while the relative positioning is maintained. For generating such GUI, multidimensional scaling, kernel principal...

A new approach to continuous authentication is presented. The method is based on a combination of statistical decision machines for brain signals. Functional Near InfraRed Spectroscopy (NIRS) is used to measure brain oxyhemoglobin changes for each subject to be authenticated. Such biosignal authentication is expected to be a viable complementary me...

Mapping tools applicable to big data of composite elements are designed based on a machine learning approach. The central method adopted is multidimensional scaling (MDS). The data set is mapped onto a continuous surface such as a sphere. To check the effectiveness of this method, preliminary experiments on the local optimality were...
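Since several of these works build on multidimensional scaling, a minimal sketch of classical (Torgerson) MDS may help fix ideas; the spherical-surface mapping described in the abstract would need a constrained variant beyond this flat-space version.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed points from a distance matrix.

    D : pairwise Euclidean distance matrix, shape (n, n)
    Returns coordinates of shape (n, dim). Flat-space sketch only.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]           # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

When D comes from genuine Euclidean points of dimension at most `dim`, the embedding reproduces the pairwise distances exactly (up to rotation and reflection).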

Multimodal signals emanated from human users are applied to operations of bipedal humanoids. Distinctive features of the designed system include recognition and conversion of sensibilities as patterns contained in the biosignals. The total recognition system is a combination of Bayesian networks, hidden Markov models, independent component analysis...

A class of rapid algorithms for independent component analysis (ICA) is presented. This method utilizes multi-step past information with respect to an existing fixed-point style for increasing the non-Gaussianity. This can be viewed as the addition of a variable-size momentum term. The use of past information comes from the idea of surrogate optimi...

New algorithms for joint learning of independent component analysis and graphical high-order correlation (GC-ICA: Graphically Correlated ICA) are presented. The presented method has a fixed-point style, as in the FastICA; however, it comprises independent but correlated subparts. Correlations by teacher signals are also allowed. In spite of such inc...

Fast estimation algorithms for Hidden Markov models (HMMs) for given data are presented. These algorithms start from the alpha-EM algorithm which includes the traditional log-EM as its proper subset. Since existing or traditional HMMs are the outcome of the log-EM, it had been expected that the alpha-HMM would exist. In this paper, it is shown that...

Fast estimation algorithms of Hidden Markov Models (HMMs), or alpha-HMMs, are presented. Such novel algorithms inherit speedup properties of the alpha-EM algorithm. Since the alpha-EM algorithm includes the traditional log-EM algorithm as its special case, the alpha-HMM also includes the traditional log-HMM as its special case. This generalization...

Human-humanoid symbiosis by using brain signals is presented. Humans issue two types of brain signals. One is non-invasive NIRS giving oxygenated hemoglobin concentration change and tissue oxygenation index. The other is a set of neural spike trains (measured on macaques for safety compliance). In addition to such brain signals, human motions are...

A fast learning algorithm for Hidden Markov Models is derived starting from convex divergence optimization. This method utilizes the alpha-logarithm as a surrogate function for the traditional logarithm to process the likelihood ratio. This enables the utilization of a stronger curvature than the logarithm. This paper's method includes the ordinary...

A faster algorithm of ICA is devised. This method utilizes multi-step past information with the existing fixed-point method. The use of past information comes from the idea of surrogate optimization. The speed and implemented software are checked for both simulated and real data. The presented ICA, named Rapid ICA, is applied to image-to-image retr...

Heterogeneous bio-signals including human motions, brain NIRS and neural spike trains are utilized for operating biped humanoids. The Bayesian network comprising Hidden Markov Models and Support Vector Machines is designed for the signal integration. By this method, the system complexity is reduced so that the total operation is within the scope o...

Methods to integrate multimodal beliefs by Bayesian Networks (BNs) comprising Hidden Markov Models (HMMs) and Support Vector
Machines (SVMs) are presented. The integrated system is applied to the operation of ambulating PCs (biped humanoids) across
the network. New features in this paper are twofold. First, the HMM/SVM-embedded BN for the multimod...

Joint recognition of bio-signals emanated from human(s) is discussed. The bio-signals in this paper include camera-captured gestures and brain signals of oxygenated hemoglobin change (ΔO2Hb). The recognition of the integrated data is applied to the operation of a biped humanoid. Hidden Markov Models (HMMs) and Support Vector Machines (SVMs) undertake the f...

Sensibility-aware image retrieval methods are presented and their performances are compared. Three systems are discussed in
this paper: PCA/ICA-based method called RIM (Retrieval-aware IMage format), JPEG, and JPEG2000. In each case, a query is an
image per se. Similar images are retrieved to this query. The RIM method is judged to be the best sett...

Protein folding classification is a meaningful step to improve analysis of the whole structures. We have designed committee
Support Vector Machines (committee SVMs) and their array (committee SVM array) for the prediction of the folding classes.
Learning and test data are amino acid sequences drawn from SCOP (Structure Classification Of Protein dat...

Image-to-image retrieval (I2I) accompanied by data compression is presented. Given a query image, the presented retrieval system computes PCA and/or ICA bases by extracting source information. On the data compression in the sense of rate-distortion, this method outperforms JPEG which uses DCT. Besides, PCA and ICA bases reflect each image's ed...
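The recurring ingredient in these retrieval papers is a set of learned image bases that serve both compression and similarity measurement. A toy version using PCA on non-overlapping patches of a single image (an illustrative sketch, not the RIM pipeline itself):

```python
import numpy as np

def pca_patch_basis(img, patch=8, k=16):
    """Learn a PCA basis from non-overlapping patches of one image.

    img : 2-D grayscale array whose sides are multiples of `patch`.
    Returns (mean, basis); basis rows are the top-k principal directions,
    playing the role of the learned image bases (illustrative only).
    """
    h, w = img.shape
    P = (img.reshape(h // patch, patch, w // patch, patch)
            .transpose(0, 2, 1, 3).reshape(-1, patch * patch))
    mean = P.mean(axis=0)
    U, s, Vt = np.linalg.svd(P - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruct(img, mean, basis, patch=8):
    """Project patches onto the basis and rebuild the image (lossy coding)."""
    h, w = img.shape
    P = (img.reshape(h // patch, patch, w // patch, patch)
            .transpose(0, 2, 1, 3).reshape(-1, patch * patch))
    coeff = (P - mean) @ basis.T              # compressed representation
    R = coeff @ basis + mean
    return (R.reshape(h // patch, w // patch, patch, patch)
             .transpose(0, 2, 1, 3).reshape(h, w))
```

The coefficient vectors (and the basis itself) can then double as similarity features for retrieval, which is the core idea the abstracts exploit.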

A method to combine a Bayesian Network (BN) and Hidden Markov Models (HMMs) is presented. This compound system is applied to robot operations. The addressed problem and presented methods are novel with the following features: (1) BN and HMMs make a total decision system by accepting evidences from HMMs to the BN. (2) The HMM-embedded BN is applied...

New methods for joint compression and image-to-image retrieval (I2I retrieval) are presented. The novelty exists in the usage of computationally learned image bases besides color distributions. The bases are obtained by the Principal Component Analysis and/or the Independent Component Analysis. On the image compression, PCA and ICA outperform the J...

A retrieval-aware image format (rim format) is developed for the usage in the similar-image retrieval. The format is based
on PCA and ICA which can compress source images with an equivalent or often better rate-distortion than JPEG. Besides the
data compression, the learned PCA/ICA bases are utilized in the similar-image retrieval since they reflec...

Network communication architectures for cooperative humanoids are designed and realized. The humanoids are operated to walk on two legs. The network is a LAN whose nodes are one master PC and other controller PCs. The master PC works as a central control machine which contains a blackboard, an image processor, and mediators for the controller PCs. T...

Similar-image retrieval systems are newly presented and examined. The systems use ICA bases (independent component analysis bases) or PCA bases (principal component analysis bases). These bases can contain source-image information; however, the indeterminacy of ordering and amplitude on the bases exists due to the PCA and ICA problem formulation...

Similar-image retrieval systems are presented and evaluated. The new systems directly use image bases via ICA (independent component analysis) and PCA (principal component analysis). These bases can extract source image's information which is viable to define similarity measures. But, the indeterminacy on amplitude and permutation exists. In this p...

A new method for E. coli DNA segment classification on promoters and non-promoters is presented. The algorithm is based on the independent component analysis (ICA). Since the DNA segments are composed of discrete symbols, this paper contains two major steps: (1) position-dependent transformation of DNA segments to real number sequences, and (2) app...
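Step (1), turning discrete DNA symbols into real sequences in a position-dependent way, admits many realizations. One plausible sketch uses per-position log-odds against a uniform background; the paper's exact transformation may well differ, so this is an assumption for illustration.

```python
import numpy as np

def encode_segments(segments):
    """Position-dependent encoding of equal-length DNA segments.

    Each symbol at position t is replaced by the log-odds of its
    smoothed frequency at t across the training set versus a uniform
    background (one plausible reading of step (1); illustrative only).
    """
    alphabet = "ACGT"
    segs = np.array([[alphabet.index(c) for c in s] for s in segments])
    n, T = segs.shape
    out = np.zeros((n, T))
    for t in range(T):
        counts = np.bincount(segs[:, t], minlength=4) + 1   # Laplace smoothing
        freq = counts / counts.sum()
        out[:, t] = np.log(freq[segs[:, t]] / 0.25)
    return out
```

The resulting real-valued matrix is the kind of input to which step (2), ICA, can then be applied.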

Independent component analysis (ICA) is applied to image coding. There, new design methods for ICA bases are presented. The new feature of this learning algorithm includes the weak guidance, or decreasing supervisory information. The weak guidance reduces the permutation indeterminacy which is unavoidable in usual ICA algorithms. In view of the ima...

A network environment that unifies the human movement, animation and humanoid is generated. Since the degrees of freedom are
different among these entities, raw human movements are recognized and labeled using the hidden Markov model. This is a class
of gesture recognition which extracts necessary information transmitted to the animation software a...

A new likelihood maximization algorithm called the α-EM algorithm (α-expectation-maximization algorithm) is presented. This algorithm outperforms the traditional or logarithmic EM algorithm in terms of convergence speed for an appropriate range of the design parameter α. The log-EM algorithm is a special case corresponding to α=-1. The main idea be...
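The α-logarithm underlying the α-EM family can be stated compactly. The sketch below follows the convention used in these abstracts, where α = −1 recovers the ordinary logarithm; other papers use different parameterizations.

```python
import numpy as np

def alpha_log(x, alpha):
    """Alpha-logarithm under the convention alpha = -1 -> ordinary log.

    For alpha != -1 this is (2 / (1 + alpha)) * (x**((1 + alpha) / 2) - 1),
    whose limit as alpha -> -1 is ln(x). Illustrative sketch.
    """
    if np.isclose(alpha, -1.0):
        return np.log(x)
    return (2.0 / (1.0 + alpha)) * (x ** ((1.0 + alpha) / 2.0) - 1.0)
```

Replacing the log of the likelihood ratio with this function is what introduces the tunable curvature that the abstract credits for the speedup.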

A new likelihood maximization algorithm called the α-EM algorithm (α-Expectation–Maximization algorithm) is presented. This algorithm outperforms the traditional or logarithmic EM algorithm in terms of convergence speed for an appropriate range of the design parameter α. The log-EM algorithm is a special case corresponding to α = −1. The main idea behind t...

A new class of learning algorithms for independent component analysis (ICA) is presented. Starting from theoretical discussions
on convex divergence, this information measure is minimized to derive new ICA algorithms. Since the convex divergence includes
logarithmic information measures as special cases, the presented method comprises faster algori...

Iterative optimization of convex divergence is discussed. The convex divergence is used as a measure of independence for ICA algorithms. An additional method to incorporate supervisory information to reduce the ICA's permutation indeterminacy is also given. Speed of the algorithm is examined using a set of simulated data and brain fMRI data...

This paper gives a method to control, or self-organize, an activation pattern of fMRI maps obtained by ICA (independent component analysis). The presented method uses an additional term to the convex divergence's gradient. The following merits are observed: (i) Prior knowledge can be effectively used so that obtained activation patterns properly re...

Likelihood optimization methods for learning algorithms are
generalized and faster algorithms are provided. The idea is to transfer
the optimization to a general class of convex divergences between two
probability density functions. The first part explains why such
optimization transfer is significant. The second part contains
derivation of the gen...

The convex divergence is used as a surrogate function for obtaining independence of random variables described by the joint probability density. If the kernel convex function is twice continuously differentiable, this case reveals a class of generalized logarithm. This class of logarithms gives generalizations of the score function and the Fisher in...

A class of independent component analysis (ICA) algorithms using a
minimization of the convex divergence, called the f-ICA, is presented.
This algorithm is a super class of the minimum mutual information ICA
and our own α-ICA. The following properties are obtained: 1) the
f-ICA can be implemented by both momentum and turbo methods, and their
combin...

A new class of statistical algorithms is presented and examined. The method is called the α-EM algorithm. This novel algorithm contains the traditional EM algorithm as a special case of α = –1. The choice of the design parameter “ α” affects the eigenvalues of Hessian matrices for likelihood maximization. This causes much faster convergence than th...

The α-logarithm extends the logarithm as the special case of
α=-1. Usage of α-related information measures based upon
this extended logarithm is expected to be effective to speedup of
convergence, i.e., on the improvement of learning aptitude. In this
paper, two typical cases are investigated. One is the α-EM
algorithm (α-expectation-maximization a...

The α-EM algorithm is a super-class of the traditional
expectation-maximization (EM) algorithm. This algorithm is derived by
computing the likelihood ratio of incomplete data through an extended
logarithm; namely, the α-logarithm. The case of α=-1
corresponds to the logarithm. The number α adjusts eigenvalues of
update matrices by reflecting the op...

The α-logarithm is an extension of the logarithm which contains the usual logarithm as the special case of α = −1. Usage of information measures based upon this extended logarithm is expected to be effective to speedup of convergence, i.e., to the improvement of learning aptitude. In this paper, speedup of the mutual-info-min ICA is investigated....

The α-EM algorithm is a proper extension of the traditional
log-EM algorithm. This new algorithm is based on the α-logarithm,
while the traditional one uses the logarithm. The case of α=-1
corresponds to the log-EM algorithm. Since the speed of the α-EM
algorithm was reported for learning problems, this paper shows that
closed-form E-steps can be o...

Multiple descent cost competitive learning is applied to data-compressed texture generation for 3D image processing and graphics. This learning method organizes itself by generating two types of feature maps: the grouping feature map and the weight vector feature map, which can both change regional shapes. This merit makes it possible for users to...

The α-EM (expectation maximization) algorithm is a super-class of the traditional log-EM algorithm. The case of α=-1 corresponds to the log-EM algorithm. For the stable region of α>-1, the α-EM algorithm outperforms the traditional method in terms of the learning speed measured by iterations and CPU time. Both the α-EM algorithm and the log-EM alg...

Starting from Renyi's α-divergence, a class of generalized
EM algorithms called the α-EM algorithms or the WEM algorithms are
derived. Merits of this generalization are found on speedup of learning,
i.e., acceleration of convergence. Discussions include novel
α-versions of logarithm, efficient scores, information matrices
and the Cramér-Rao bound....

A class of extended logarithms is used to derive α-weighted
EM (α-weighted expectation-maximization) algorithms. These
extended EM algorithms (WEMs, α-EMs) have been anticipated to
outperform the traditional (logarithmic) EM algorithm on speed. The
traditional approach falls into a special case of the new WEM. In this
paper, general theoretical dis...

A new class of Expectation and Maximization algorithm is presented and applied to probabilistic learning. This algorithm can be derived from the non-negativity of the α-divergence and Bayesian computation. The design parameter α specifies a prior probability weight for the learning. Accordingly, this algorithm is called the α-Weighted EM algorithm...
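For orientation, the α = −1 special case of this family is the ordinary log-EM algorithm. A minimal log-EM for a two-component 1-D Gaussian mixture, the kind of mixture-probability learning the abstract mentions (variable names are mine, not the paper's):

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """Standard (log-)EM for a two-component 1-D Gaussian mixture,
    i.e. the alpha = -1 special case of the alpha-EM family (sketch)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out init
    var = np.array([x.var(), x.var()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = w * pdf
        r /= r.sum(1, keepdims=True)
        # M-step: weighted maximum-likelihood re-estimates
        Nk = r.sum(0)
        w = Nk / len(x)
        mu = (r * x[:, None]).sum(0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / Nk
    return w, mu, var
```

The α-weighted variants change how the likelihood ratio enters the E-step, leaving this alternating structure intact.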

Multiple descent cost competition is a composition of learning
phases for minimizing a given measure of total performance, i.e., cost.
In the first phase of descent cost learning, elements of source data are
grouped. Simultaneously, a weight vector for minimal learning (a
winner) is found. Then, the winner and its partners are updated for
further...

The expectation and maximization algorithm (EM algorithm) is
generalized so that the learning proceeds according to adjustable
weights in terms of probability measures. The method presented, the
weighted EM algorithm (or the α-EM algorithm), includes the
existing EM algorithm, as a special case. It is further found that this
learning structure can...

The α-divergence is utilized to derive a generalized expectation and maximization algorithm (EM algorithm). This algorithm has a wide range of applications. In this paper, neural network learning for mixture probabilities is focused. The α-EM algorithm includes the existing EM algorithm as a special case since that corresponds to α = –1. The parameter...

In binocular stereo vision, since color and distance information can be obtained simultaneously by a passive method which will not affect the environment, it is expected to be commonly used in the vision of future mobile robots. However, it is necessary to solve difficult problems such as finding corresponding points in the right and left images. M...

Harmonic competition is a learning strategy based upon
winner-take-all or winner-take-quota with respect to a composite of
heterogeneous subcosts. This learning is unsupervised and organizes
itself. The subcosts may conflict with each other. Thus, the total
learning system realizes a self-organizing multiple criteria
optimization. The subcosts are...

Learning algorithms guided by costs with a variety of penalties are discussed. Both unsupervised and supervised cases are addressed. The penalties are added and/or multiplied to the basic error measure. Since these extra penalties include combination parameters with respect to the basic error, the total problem belongs to a class of multiple object...

Steps from DC (data compression) to AC (animation coding) are
discussed. This means that a digital movie is generated from a single
still image using data compression. Such processing is made possible by
the multiply optimized competitive learning (multiply descent cost
competitive learning). A key point is the usage of the optimized feature
map. T...

An integration of neural and ordinary computations toward multimedia processing is presented. The handled media is a combination of still images and animations. The neurocomputation here is the multiply descent cost competitive learning. This algorithm generates two types of feature maps. One of them: an optimized grouping pattern of pixels by self...

Learning and self-organization via competition and cooperation among parallel computational agents (artificial neurons) are presented. This algorithm is successfully applied to finding good approximate solutions for various Euclidean traveling salesperson problems. The position of each city is one by one fed into the learning mechanism. The solutio...

Modeling and approximation of functions by penalized competitive learning networks are described. The learning is based on winner-take-all or winner-take-quota. Cost functions are combinations of terms representing the data fitness and the qualification on the approximation. The sub-cost to confine the approximation is called competition handicap,...

By using competitive learning, which causes just one or a group of a small number of neurons to respond to a given input, self-organization of entire neural networks can be achieved. When this self-organization process is applied to various kinds of travelling salesman problems in a Euclidean space, a good approximation or the true solution is obta...
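The competitive self-organization for Euclidean TSPs described above is in the same family as ring-topology self-organizing maps. A compact sketch of such a ring (an illustration of the general idea with made-up schedules, not the author's exact algorithm):

```python
import numpy as np

def som_ring_tsp(cities, m=None, n_epochs=100, seed=0):
    """Self-organizing ring for the Euclidean TSP (illustrative sketch).

    cities : array of shape (n, 2). Returns a visiting order of the cities.
    Learning-rate and neighborhood schedules are ad hoc choices.
    """
    rng = np.random.default_rng(seed)
    n = len(cities)
    m = m or 3 * n                              # neurons on the ring
    W = cities.mean(0) + 0.1 * rng.standard_normal((m, 2))
    for epoch in range(n_epochs):
        lr = 0.8 * (1 - epoch / n_epochs) + 0.05
        sigma = max(m / 8 * (1 - epoch / n_epochs), 1.0)
        for c in cities[rng.permutation(n)]:
            j = np.argmin(((W - c) ** 2).sum(1))        # winner neuron
            d = np.abs(np.arange(m) - j)
            d = np.minimum(d, m - d)                    # ring distance
            h = np.exp(-(d / sigma) ** 2)               # cooperative update
            W += lr * h[:, None] * (c - W)
    order = np.argsort([np.argmin(((W - c) ** 2).sum(1)) for c in cities])
    return order
```

The winner-take-all step with a shrinking cooperative neighborhood is the competition/cooperation mechanism the abstract refers to; reading cities off along the converged ring yields the tour.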

Competitive learning in neural networks involving cooperation and
categorization is discussed. Extended vehicle routing problems in the
Euclidean space are also discussed. A fixed number of vehicles with a
shared depot make subtours around precategorized cities and collect
demands. The minimal tour length and even loaded demands are conflicting
req...

Summary form only given. It was shown that the feature map obtained by multiple descent cost competitive self-organization can be used for the transformation of images combined with the supervision of an outside intelligence. The example of the change of emotional expression of a face was considered. The task of the first module was to locate the p...

Multiple descent cost competitive learning simultaneously generates two types of feature maps by self-organization. One is a grouped pattern of atomic data elements; the other is a geometric structure on the set of neural weight vectors. In the case of images, the grouped pattern is a set of nonoverlapping quadrilaterals. Each quadrilateral is asso...

Generalized competitive learning algorithms are described. These algorithms comprise competition handicaps, cooperation and multiply descent cost property. Applications are made on signal processing and combinatorial optimizations. Parallel computation of the algorithms presented is discussed.

Novel general algorithms for multiple-descent cost-competitive
learning are presented. These algorithms self-organize neural networks
and possess the following features: optimal grouping of applied vector
inputs, product form of neurons, neural topologies, excitatory and
inhibitory connections, fair competitive bias, oblivion,
winner-take-quota rul...

The author discusses algorithms of competitive self-organization
and their application to a typical combinatorial problem, the traveling
salesman problem. The main feature of the proposed algorithm is the
sophisticated use of excitatory/inhibitory intralayer connections of
neurons combined with a judicious selection of neural network topology.
Such...

Summary form only given. The contribution of this study is fourfold: (i) The author's own variable region vector quantization, which is a neurocomputation paradigm of the nearest neighbor type, is presented. (ii) By considering this neurocomputation paradigm and others, the total system is implemented as an emulator. The system is made up of a hype...

Generalized algorithms for vector quantization are presented and their convergences are proved. The generalized vector quantization is called variable region vector quantization since the method allows adjusted variable dimensional vectors covering variable subregions of source data.
Algorithm I is the generalization of the LBG algorithm into the v...
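For reference, the fixed-dimension baseline that these variable-region algorithms generalize is the classical LBG/k-means codebook design, which alternates a nearest-neighbor partition with a centroid update. A sketch of that baseline (not of Algorithm I itself):

```python
import numpy as np

def lbg_codebook(X, k, n_iter=50, seed=0):
    """Classical fixed-dimension LBG/k-means codebook design (sketch).

    X : training vectors, shape (n, d). Returns a codebook of shape (k, d).
    """
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)                  # nearest-neighbor partition
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                C[j] = pts.mean(0)             # centroid condition
    return C
```

The variable-region extension additionally optimizes how source elements are grouped into the vectors themselves, rather than fixing the vector dimension in advance.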

Vector quantization with optimized grouping of elements is studied. The presented vector quantization allows optimal or suboptimal grouping of source data. Thus, the algorithms herein are called variable region vector quantization. The optimization yielding the data subgroups can also be interpreted as the connection weight adjustment. The presented...

Algorithms for vector quantization of variable region data are given. The design iteration is proved to converge. An important issue here is the optimization step of the region shape with respect to the vector quantization codebook. Thus, the presented design method is a nontrivial extension of ordinary vector quantizer design which contains the cl...

Joint time-spectral vector quantization to achieve much lower bit rates on the average than ordinary spectral vector quantizers is discussed. The principle and used quantities are closed within the theory of LPC so that theoretical clearness and the circumvention of unnecessary system complexity are promoted. The presented theory and method lead to...

Two speech compression systems based on codebooks of inverse filters produced by off-line linear predictive coding (LPC) and vector quantization (VQ) techniques are considered. The first system is a pitch excited vocoder that is a variation on a speech coding system based upon vector quantization. The encoder selects an LPC reverse filter from a fi...

The paper gives a unified treatment of the process distortion measures in information theory and signal processing. Distortion measures discussed in this paper are the ρ-distance, the Prokhorov distance and the spectral distortion measure, and their basic properties are presented.

Linear least-square predictor mismatch problems are discussed as applications of stochastic process distortion measures. Linear predictor coding is also discussed in conjunction with distortion measures. First, nondeterministic and purely nondeterministic processes are defined and purely nondeterministic processes are expressed by autoregressive mo...

This paper proposes two universal encoders of voice signals as the composite information source. One is a variant of the linear predictive encoder called either the model matching encoder or the rate distortion encoder, for the operating principle of which mathematical proof will be provided. The other is a waveform encoder with a parallel tree sea...

Two types of speech compression systems which use a set of precalculated inverse filters (the inverse filter codebook) are studied. The information theoretic ideas of universal coding and block quantization motivated the use of the inverse filter codebook. The first system finds the indices of the best matching inverse filter, the gain and the pit...