Yasuaki Ito
  • DE
  • Professor (Associate) at Hiroshima University

About

178
Publications
43,027
Reads
1,693
Citations
Current institution
Hiroshima University
Current position
  • Professor (Associate)
Additional affiliations
October 2004 - present
Hiroshima University
Position
  • Professor (Associate)

Publications (178)
Article
Full-text available
Quantum chemistry offers the formal machinery to derive molecular and physical properties arising from (sub)atomic interactions. However, as molecules of practical interest are largely polyatomic, contemporary approximation schemes such as the Hartree–Fock scheme are computationally expensive due to the large number of electron repulsion integrals...
Article
An Ising model is a mathematical model defined by an objective function comprising a quadratic formula of multiple spin variables, each taking a value of either $-1$ or $+1$. The task of determining a spin value assignment to these variables that minimizes the resulting value of an Ising model is a challenging optimization problem. Recently, quantum annealer...
Article
The Boys function, a mathematical integral function, plays a pivotal role and is frequently evaluated in ab initio molecular orbital computations. The main contribution of this paper is to accelerate the bulk evaluation of the Boys function through the effective utilization of GPUs. The proposed GPU implementation addresses GPU‐specific programming...
Preprint
Full-text available
An Ising model is defined by a quadratic objective function known as the Hamiltonian, composed of spin variables that can take values of either $-1$ or $+1$. The goal is to assign spin values to these variables in a way that minimizes the value of the Hamiltonian. Ising models are instrumental in tackling many combinatorial optimization problems, l...
Article
Quadratic unconstrained binary optimization (QUBO) is a combinatorial optimization to find an optimal binary solution vector that minimizes the energy value defined by a quadratic formula of binary variables in the vector. The main contribution of this article is to propose the bit duplication technique that can specify the number of duplicated bit...
Article
Full-text available
The Ising model is defined by an objective function using a quadratic formula of qubit variables. The problem of an Ising model aims to determine the qubit values of the variables that minimize the objective function, and many optimization problems can be reduced to this problem. In this paper, we focus on optimization problems related to permutati...
Preprint
Full-text available
The Ising model is defined by an objective function using a quadratic formula of qubit variables. The problem of an Ising model aims to determine the qubit values of the variables that minimize the objective function, and many optimization problems can be reduced to this problem. In this paper, we focus on optimization problems related to permutati...
Article
Stroke‐based rendering is a rendering method that mimics the actual painting technique by drawing stroke by stroke on a blank canvas image. In this paper, we propose a watercolor image generation method using stroke‐based rendering. The proposed method generates an image that is a good approximation of the input image as well as having the charac...
Article
Deflate coding is a very popular lossless data compression method used in zlib, gzip (GNU zip), and zip, which performs the LZSS compression algorithm with Huffman coding. Deflate encoding and decoding involve sequential operations and their parallel acceleration using a GPU is quite hard. The main purpose of this paper is to present GPU implementa...
Chapter
In recent years, scholarly databases have made many scientific papers available on the Internet. While these databases facilitate access to excellent papers, they also increase the possibility of encountering inferior papers. However, it is difficult to predict the quality of a paper just from a glance at the paper. In this paper, we propose a mach...
Article
A wide range of combinatorial optimization problems can be reduced to the Ising model, and equivalently the quadratic unconstrained binary optimization (QUBO) problem. Thus, in recent years, researchers have proposed to solve QUBO on FPGAs, GPUs, and special-purpose processors. The adaptive bulk search (ABS) is a previously-proposed framework for s...
Preprint
Full-text available
Quadratic Unconstrained Binary Optimization (QUBO) is a combinatorial optimization to find an optimal binary solution vector that minimizes the energy value defined by a quadratic formula of binary variables in the vector. As many NP-hard problems can be reduced to QUBO problems, considerable research has gone into developing QUBO solvers running o...
Chapter
Machine learning technology has made it possible to solve a variety of previously unfeasible problems. Accordingly, the size of network models has been increasing. Thus, research on model compression by network pruning has been conducted. Network pruning is usually performed on already-trained network models. However, it is often difficult to remov...
Article
An independent set of a graph is a subset of the nodes such that no two nodes in it are adjacent. The maximum independent set (MIS) problem is an optimization problem to find a largest independent set. The main contribution of this article is to introduce a generic iterative trial search algorithm that we call iMIS for finding approximate solutions...
Article
Quadratic unconstrained binary optimization (QUBO) is a combinatorial optimization problem. Since various NP-hard problems such as the traveling salesman problem can be formulated as a QUBO instance, QUBO is used with a wide range of applications. The main contribution of this article is to propose high-throughput FPGA implementations for the QUBO...
Article
Convolutional Neural Networks (CNNs) are one of the factors supporting the rapid development of artificial intelligence techniques. However, as the ability of the network increases, the size of the network becomes larger. Thus far, several works have tackled the reduction of the network size. In many cases, these approaches produce an un...
Article
The volume of digital information is growing at an extremely fast pace which, in turn, exacerbates the need for efficient mechanisms to find the presence of a pattern in an input text or a set of input strings. Combining the processing power of a Graphics Processing Unit (GPU) with matching algorithms seems a natural alternative to speed up the string-...
Article
The Floyd‐Warshall algorithm is a well‐known algorithm to compute the distances between all pairs of nodes of a graph. The Blocked Floyd‐Warshall algorithm, a variant of the Floyd‐Warshall algorithm, has been proposed to accelerate the Floyd‐Warshall algorithm by means of a graphics processing unit (GPU) architecture. The previously published GPU implementations fo...
Article
The main contribution of this paper is to show efficient implementations of the convolution-pooling in the GPU, in which the pooling follows the multiple convolution. Since the multiple convolution and the pooling operations are performed alternately in earlier stages of many Convolutional Neural Networks (CNNs), it is very important to accelerate...
Chapter
The main purpose of this paper is to present a very efficient GPU implementation to compute the trmv, the product of a triangular matrix and a vector. Usually, developers use cuBLAS, a linear algebra library optimized for each of various generations of GPUs, to compute the trmv. To attain better performance than cuBLAS, our GPU implementation of th...
Chapter
The main contribution of this paper is to show efficient implementations of the convolution-pooling in the GPU, in which the pooling follows the multiple convolution. Since the multiple convolution and the pooling operations are performed alternately in earlier stages of many Convolutional Neural Networks (CNNs), it is very important to accelerate...
Chapter
The main contribution of this work is to propose a stained glass image generation based on the Voronoi diagram. In this work, we use the Voronoi cells and edges of the Voronoi diagram as colored glasses and leads in the stained glass, respectively. To fit Voronoi cells to the original image, we use a local search technique. Using this technique, we...
Article
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a st...
Article
Tile art image generation is one of the non‐photorealistic rendering methods. The generated digital image resembles artistic representation given digital photos and illustrations. The first contribution of this paper is to propose a tile image generation based on the greedy approach. The greedy approach is based on the characteristic of the human v...
Article
This paper presents efficient FPGA implementations for the Bloom filter, in which a large set P of L‐byte patterns are registered beforehand. Our Bloom filter circuit performs the byte stream pattern test such that it receives an input byte stream t and outputs the bit stream in every clock cycle. Each bit of the output bit stream is 1 if an L‐byte...
Article
The bulk execution is to execute some computation for many different inputs in turn or at the same time. The main contribution of this paper is to propose a parallel processing technique for the bulk execution of the dynamic programming using the GPU (Graphics Processing Unit). Especially, we focus on the optimal polygon triangulation problem for a...
Article
Full-text available
Row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a one-dimensional array can be computed efficiently on the GPU. Hence, row-wise prefix-sums of a matrix can also be compute...
Chapter
The row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a 1-dimensional array can be computed efficiently on the GPU. Hence, the row-wise prefix-sums of a matrix can also be c...
Chapter
The optimal polygon triangulation problem for a convex polygon is an optimization problem to find a triangulation with minimum total weight. It is known that this problem can be solved using the dynamic programming technique in \(O(n^3)\) time. The main contribution of this paper is to present an efficient parallel implementation of this \(O(n^3)\)...
Article
The main contribution of this paper is to present an efficient GPU implementation of bulk computation of the CKY parsing for a context-free grammar, which determines if a context-free grammar derives each of a lot of input strings. The bulk computation is to execute the same algorithm for a lot of inputs in turn or at the same time. The CKY parsing...
Article
The main contribution of this paper is to present an efficient GPU implementation of bulk computation of eigenvalues for many small, non-symmetric, real matrices. This work is motivated by the necessity of such bulk computation in the design of control systems, which requires computing the eigenvalues of hundreds of thousands of non-symmetric real mat...
Article
Full-text available
There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call adaptive loss-...
Article
Full-text available
The main contribution of this paper is to present an implementation that performs the exhaustive search to verify the Collatz conjecture using a GPU. Consider the following operation on an arbitrary positive number: if the number is even, divide it by two, and if the number is odd, triple it and add one. The Collatz conjecture asserts that, startin...
Conference Paper
The main contribution of this paper is to present a new hardware architecture for accelerating LZW compression using an FPGA. In the proposed architecture, we efficiently use dual-port block RAMs embedded in the FPGA to implement a hash table that is used as a dictionary. Using independent two ports of the block RAM, reading and writing operations...
Conference Paper
There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Les...
Article
Full-text available
The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize it. We first present a work-optimal parallel LZW decompress...
Article
Algorithms requiring fast manipulation of multiple-length numbers are usually implemented in hardware. However, hardware implementation, using HDL (Hardware Description Language) for instance, is a laborious task and the quality of the solution relies heavily on the designer's expertise. The main contribution of this work is to present a flexible-len...
Article
In this paper, we present a GPU implementation of bulk multiple-length multiplications. The idea of our GPU implementation is to adopt a warp-synchronous programming technique. We assign each multiple-length multiplication to one warp that consists of 32 threads. In parallel processing using multiple threads, usually, it is costly to synchronize ex...
Article
The closeness of a match is an important measure with a number of practical applications, including computational biology, signal processing and text retrieval. The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. It is well known that the ASM can be solved by dy...
Article
Several important tasks, including matrix computation, signal processing, sorting, dynamic programming, encryption, and decryption, can be performed by oblivious sequential algorithms. A sequential algorithm is oblivious if an address accessed at each time does not depend on the input data. A bulk execution of a sequential algorithm is to execute i...
Article
Conway's Game of Life is the most well-known cellular automaton. The universe of the Game of Life is a 2-dimensional array of cells, each of which takes two possible states, alive or dead. The state of every cell is repeatedly updated according to those of eight neighbors. A cell will be alive if exactly three neighbors are alive, or if it is alive...
Article
Conway's Game of Life is the most well-known cellular automaton. The universe of the Game of Life is a 2-dimensional array of cells, each of which takes two possible states, alive or dead. The state of every cell is repeatedly updated according to those of eight neighbors. A cell will be alive if exactly three neighbors are alive, or if it is aliv...
Conference Paper
Vertex coloring is an assignment of colors to the vertices of an undirected graph such that no two vertices sharing the same edge have the same color. The vertex coloring problem is to find the minimum number of colors necessary to color a given graph, which is an NP-hard problem in combinatorial optimization. Ant Colony Optimization (ACO) is a well-know...
Conference Paper
Suppose that a sequence of sensing data with timestamps is transferred asynchronously. Some of the sensing data may be delayed by some period of time, so the sequence is not in proper increasing order of timestamps. A sequence of timestamps $t_0, t_1, \ldots, t_{n-1}$ is d-sorted if $t_i$...
Conference Paper
The task of finding strings having a partial match to a given pattern is of interest to a number of practical applications, including DNA sequencing and text searching. Owing to its importance, alternatives to accelerate the Approximate String Matching (ASM) have been widely investigated in the literature. The main contribution of this work is to p...
Conference Paper
The main contribution of this paper is to present a very efficient GPU implementation of bulk computation of eigenvalues for a large number of small non-symmetric real matrices. This work is motivated by the necessity of such bulk computation in the design of control systems, which requires computing the eigenvalues of hundreds of thousands of non-symmet...
Article
Digital halftoning is an important process to convert a grayscale image into a binary image with black and white pixels. Local exhaustive search-based halftoning is one of the halftoning methods that can generate high-quality binary images. However, considering the computing time, it is not realistic for most applications. As a first contribution,...
Article
Full-text available
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. A sequential algorithm is oblivious if the address accessed at each time unit is independent of the input. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run on a GPU very efficiently....
Article
The FDFM (Few DSP slices and Few block Memories) approach is an efficient approach which implements a processor core executing a particular algorithm using few DSP slices and few block RAMs in a single FPGA. Since a processor core based on the FDFM approach uses few hardware resources, hundreds of processor cores working in parallel can be implemen...
Conference Paper
Full-text available
The main contribution of this paper is to present an intermediate approach of software and hardware using FPGAs. More specifically, we present a processor based on FDFM (Few DSP slices and Few Memory blocks) approach that supports arithmetic operations with flexibly many bits, and implement it in the Xilinx Virtex-6 FPGA. Arithmetic instructions of...
Conference Paper
Full-text available
When large integers are processed on a computer, they are represented as multiple-length integers. Multiple-length multiplication is widely used in areas such as scientific computation and cryptography processing. However, the computation cost is very high since the CPU does not support multiple-length integers. In this paper, we present a GPU implementat...
Conference Paper
Full-text available
Conway's Game of Life is the most well-known cellular automaton. The universe of the Game of Life is a 2-dimensional array of cells, each of which takes two possible states, alive or dead. The state of every cell is repeatedly updated according to those of eight neighbors. A cell will be alive if exactly three neighbors are alive, or if it is a...
Conference Paper
Full-text available
LZW compression is a well-known patented lossless compression method used in the Unix file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings into codes. Since the code table is generated...
