ASCII Art Generation using the Local Exhaustive
Search on the GPU
Yuji Takeuchi, Daisuke Takafuji, Yasuaki Ito, Koji Nakano
Department of Information Engineering
Hiroshima University
Kagamiyama 1-4-1, Higashi-Hiroshima, 739-8527, JAPAN
Abstract—An ASCII art is a matrix of characters that re-
produces an original gray-scale image. It is commonly used to
represent pseudo gray-scale images in text based messages. Since
automatic generation of high quality ASCII art images is very
hard, they are usually produced by hand. The main contribution
of this paper is to propose a new technique to generate an ASCII
art that reproduces the original tone and the details of an input
gray-scale image. Our new technique is inspired by the local
exhaustive search to optimize binary images for printing based
on the characteristic of the human visual system. Although it
can generate high quality ASCII art images, a lot of computing
time is necessary for the local exhaustive search. Hence, we
have implemented our new technique in a GPU to accelerate
the computation. The experimental results show that the GPU
implementation can achieve a speedup factor of up to 57.1 over the
conventional CPU implementation.
Index Terms—ASCII art, local exhaustive search, human visual
system, GPU, parallel computing
I. INTRODUCTION
An ASCII art is a matrix of characters reproducing an
original image. ASCII arts are commonly used to show pseudo
gray-scale images on devices or in environments that can only
display characters. ASCII arts have a long history, and existed
before computers were developed. One of the most
famous examples of ASCII art represents the tail of a rat,
published in “Alice’s Adventures in Wonderland” [1]. As the Internet
became popular, ASCII arts have been used in various
situations, such as the contents of e-mails and bulletin boards
on the Web. The main purpose of using ASCII art is to make
printing easier, or to serve as an alternative to graphics in
situations where the communication of graphics is impossible.
ASCII arts can be roughly classified into two major cate-
gories: the tone-based ASCII art and the structure-based ASCII
art [2]. In the tone-based ASCII art, an original gray-scale
image is converted into a matrix of characters so that the
intensity level is reproduced (Fig. 1). Usually, the original
gray-scale image is partitioned into blocks of a character size,
and a character is assigned to each block such that the intensity
level is preserved. On the other hand, the structure-based
ASCII art is generated by converting an original gray-scale
image into a matrix of characters so that the shapes of the
original image are reproduced (Fig. 2). A character is assigned
to each block such that the shape of the block is preserved.
The main contribution of this paper is to propose a new
method for generating an ASCII art image which can maintain
Fig. 1. The tone-based ASCII art (an original gray-scale image and the tone-based ASCII art)
Fig. 2. The structure-based ASCII art (an original image and the structure-based ASCII art, from [2])
the smooth changes of intensity levels and the shapes in an
original gray-scale image. The resulting ASCII art by our
method is essentially the tone-based ASCII art, but it also has
a flavor of the structure-based ASCII art. Our new approach is
inspired by digital halftoning [3], [4] of gray-scale images into
binary images for printing. In particular, it uses a technique
of the local exhaustive search [5], [6] for digital halftoning,
which can generate a binary image that preserves the details
and the intensity levels of an original input gray-scale image.
It is known that the direct binary search [7] can generate high
quality binary images that reproduce the details and the tones
of original gray-scale images. Later, the direct binary search
is extended to the local exhaustive search [5], [6], which can
generate better binary images. Our new method for ASCII art
generation uses the local exhaustive search, and can reproduce
the details and the tones of original gray-scale images.
In a conventional method for generating a tone-based ASCII
art, a character is selected for each block of an original image
such that the average intensity level is preserved. In other
words, a character with the most similar intensity level of
the corresponding block in an original image is selected. For
example, a free software “Text artist” [8] uses this approach.
Though this method is very simple and can be implemented
easily, the details and the intensity level of an original image are
not reproduced well. In [9], the intensity level of an original image
is reproduced by adjusting the spacing of characters. However,
the details of the original image are not reproduced. In [10],
ASCII art generation for original binary images was shown.
This method works well for binary images, but cannot handle
gray-scale images.
Our new approach first initializes a matrix of characters
by the conventional tone-based ASCII art generation. After
that, characters are repeatedly replaced by the best character
among all available characters. To select the best character, a
matrix of characters is blurred using the Gaussian filter and
the pixel-wise difference of the blurred image and the original
image is computed as an error. The best character is selected so
that the total error is minimized. This replacement is repeated
until no more improvement is possible. The resulting matrix
of characters reproduces the original gray-scale image very
well, because the error of the blurred matrix of characters and
the original gray-scale image is small and the Gaussian filter
approximates the human visual system. However, compared
with the conventional approach, our approach requires an enormous
amount of computation to search for the best character image
among all characters.
The GPU (Graphics Processing Unit) is a specialized circuit
designed to accelerate computation for building and manipulating
images [11], [12], [13], [14]. The latest GPUs are designed
for general purpose computing and can perform computation
in applications traditionally handled by the CPU. Hence,
GPUs have recently attracted the attention of many application
developers [11], [15]. NVIDIA provides a parallel computing
architecture called CUDA (Compute Unified Device Architec-
ture) [16], the computing engine for NVIDIA GPUs. CUDA
gives developers access to the virtual instruction set and
memory of the parallel computational elements in NVIDIA
GPUs. In many cases, GPUs are more efficient than multicore
processors [12], since they have hundreds of processor cores
and very high memory bandwidth. To accelerate our new ap-
proach, we have parallelized the replacing process so that the
replacement is performed for multiple blocks in parallel. We
have implemented our method in a CUDA-enabled GPU and
evaluated the performance on NVIDIA GeForce GTX 680. For
ASCII art generation for the largest input image in our experiments
using 95 ASCII code characters, our GPU implementation
runs in 0.056s, while the Intel CPU implementation runs in
3.108s. Further, if we use 7310 JIS Kanji code characters,
our GPU implementation runs in only 1.123s, while the Intel
CPU implementation runs in 64.17s. Thus, the
GPU implementation can achieve a speedup factor of up to 57.1
over the conventional CPU implementation.
This paper is organized as follows. Section II explains a
conventional method for generating the tone-based ASCII art.
In Section III, we show the outline of our new method based on
the local exhaustive search for the tone-based ASCII art. We
then go on to show an algorithm and an implementation of our
method for generating the tone-based ASCII art using the local
exhaustive search in Section IV. In Section V, we show how
we have implemented our method in the GPU to accelerate
the computation. Section VI compares the resulting ASCII art
images of the conventional method and our method, and shows
the computing time. Section VII concludes our work.
II. A CONVENTIONAL METHOD FOR THE TONE-BASED
ASCII ART GENERATION
The main purpose of this section is to describe a con-
ventional method for the tone-based ASCII art generation.
The idea is to partition an original image into blocks of the
same size as characters. Each block is assigned a character
such that each character reproduces the intensity level of the
corresponding block.
Fig. 3. An example of the bitmap image of a character
Before showing the conventional algorithm, we review how
each character is displayed as a bitmap image. Figure 3 shows
an example of the bitmap image of a character. The bitmap
image is a binary image with pixels 0 (black) or 1 (white).
The bitmap image of Figure 3 is of size $16 \times 16$. It has 60
black pixels and 196 white pixels out of 256 pixels. Hence,
we can think that the intensity level of the character is
$196/256 \approx 0.77$. Let $c(i, j)$ ($0 \le i < w$, $0 \le j < h$) denote a pixel value
(0 or 1) at position $(i, j)$ of character $c$ of bitmap size $w \times h$.
We can compute the intensity level $I(c)$ of $c$ as follows:

$$I(c) = \frac{1}{wh} \sum_{i=0}^{w-1} \sum_{j=0}^{h-1} c(i, j).$$
Suppose that a gray-scale image $A = (a_{i,j})$ of size $N \times M$
is given, where $a_{i,j}$ denotes the intensity level at position $(i, j)$
taking a real value in the range $[0, 1]$.
The real value corresponds to the intensity level of each pixel,
and 0 and 1 correspond to black and white, respectively. Let
us partition the gray-scale image into blocks of size $w \times h$
each. Let $B_{p,q}$ ($0 \le p < N/w$, $0 \le q < M/h$) denote a block
with pixels $a_{i,j}$ ($pw \le i < (p+1)w$,
$qh \le j < (q+1)h$). It should be clear that the average intensity
$\bar{a}_{p,q}$ of each block is:

$$\bar{a}_{p,q} = \frac{1}{wh} \sum_{i=pw}^{(p+1)w-1} \sum_{j=qh}^{(q+1)h-1} a_{i,j}. \qquad (1)$$
Let $C$ be a set of available characters. The conventional algorithm
for the tone-based ASCII art image selects a character
for each block such that the intensity level of a character is
closest to the average intensity of the block. Let $X = (x_{p,q})$
be an ASCII art such that each $x_{p,q}$ is a character in $C$. We
determine each character so that:

$$x_{p,q} = \mathop{\mathrm{argmin}}_{c \in C} \left| I(c) - \bar{a}_{p,q} \right|.$$
However, the distribution of the intensity levels of a character
set may be biased in the sense that it does not have
characters with intensity levels close to 0 or 1. For example,
a usual character set has no character with 1 white pixel and
$wh - 1$ black pixels. Thus, the error $|I(x_{p,q}) - \bar{a}_{p,q}|$ can be too
large if $\bar{a}_{p,q}$ is close to 0 or 1. To resolve this problem, we
adjust the intensity levels of an original image $A$ as
follows. Let $I_{\max}$ and $I_{\min}$ be the highest and the lowest intensity
levels of all characters in $C$. More specifically, $I_{\max} = \max_{c \in C} I(c)$
and $I_{\min} = \min_{c \in C} I(c)$.
We adjust the intensity level of each pixel such that

$$a_{i,j} \leftarrow I_{\min} + (I_{\max} - I_{\min}) \cdot a_{i,j}. \qquad (2)$$

Clearly, the intensity level of each pixel takes a value in the
range $[I_{\min}, I_{\max}]$, and thus, the average intensity level of each
block is also in $[I_{\min}, I_{\max}]$.
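The conventional method above can be summarized by the following sketch. It is written in plain C++ (compilable with nvcc, like the later GPU sketches); the Character structure, the row-major image layout, and all function names are our own illustrative assumptions, not the authors' code.

#include <vector>
#include <cmath>
#include <cstddef>

// A character bitmap of size w x h with pixel values 0 (black) or 1 (white).
struct Character {
    int w, h;
    std::vector<int> bits;                 // row-major: bits[j * w + i]
    double intensity() const {             // I(c): average of the pixel values
        double s = 0.0;
        for (int v : bits) s += v;
        return s / (w * h);
    }
};

// Intensity adjustment of formula (2): map [0,1] linearly onto [Imin, Imax].
void adjustIntensity(std::vector<double>& image, double Imin, double Imax) {
    for (double& a : image) a = Imin + (Imax - Imin) * a;
}

// Conventional tone-based ASCII art: for each w x h block of the adjusted
// image, pick the character whose intensity level is closest to the block
// average of formula (1).
std::vector<int> conventionalAsciiArt(const std::vector<double>& image,  // N*M values in [0,1]
                                      int N, int M,
                                      const std::vector<Character>& charset,
                                      int w, int h) {
    const int cols = N / w, rows = M / h;
    std::vector<int> art(cols * rows);     // chosen character index per block
    for (int q = 0; q < rows; ++q)
        for (int p = 0; p < cols; ++p) {
            double avg = 0.0;              // average intensity of block B_{p,q}
            for (int j = 0; j < h; ++j)
                for (int i = 0; i < w; ++i)
                    avg += image[(q * h + j) * N + (p * w + i)];
            avg /= (w * h);
            int best = 0;
            double bestDiff = 2.0;         // larger than any possible difference
            for (std::size_t c = 0; c < charset.size(); ++c) {
                double d = std::fabs(charset[c].intensity() - avg);
                if (d < bestDiff) { bestDiff = d; best = (int)c; }
            }
            art[q * cols + p] = best;
        }
    return art;
}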
III. OUR ALGORITHM USING THE LOCAL EXHAUSTIVE
SEARCH
The main purpose of this section is to present a new algo-
rithm for generating an ASCII art using the local exhaustive
search.
We use a Gaussian filter that approximates the characteristic
of the human visual system. Let $G = (g_{i,j})$ denote a
Gaussian filter, i.e. a 2-dimensional symmetric matrix of size
$(2k+1) \times (2k+1)$, where each non-negative real number $g_{i,j}$
($-k \le i, j \le k$) is determined by a 2-dimensional Gaussian
distribution such that their sum is 1. In other words,

$$g_{i,j} = s \cdot e^{-\frac{i^2 + j^2}{2\sigma^2}} \qquad (3)$$

where $\sigma$ is a parameter of the Gaussian distribution and $s$ is
a fixed real number to satisfy $\sum_{i,j} g_{i,j} = 1$.
Suppose that an ASCII art $X = (x_{p,q})$ consists of $\frac{N}{w} \times \frac{M}{h}$
characters such that each $x_{p,q}$ is a character in $C$. We can
construct a binary image $B = (b_{i,j})$ of size $N \times M$ from $X$ as
follows:

$$b_{i,j} = x_{\lfloor i/w \rfloor, \lfloor j/h \rfloor}(i \bmod w,\ j \bmod h). \qquad (4)$$

In other words, $B$ is the resulting image obtained by rendering
the ASCII art $X$. We can obtain a blurred image $R = (r_{i,j})$
of $B$ using the Gaussian filter as follows:

$$r_{i,j} = \sum_{i'=-k}^{k} \sum_{j'=-k}^{k} g_{i',j'} \cdot b_{i+i', j+j'}.$$
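As a concrete illustration of formulas (3), (4), and the blur above, the following C++ sketch builds the Gaussian filter, renders an ASCII art into a binary image, and blurs it. The zero-padding at the image border and all names are our assumptions.

#include <vector>
#include <cmath>

// Gaussian filter of formula (3): a (2k+1) x (2k+1) matrix whose entries follow
// a 2-dimensional Gaussian with parameter sigma and sum to 1.
std::vector<double> gaussianFilter(int k, double sigma) {
    std::vector<double> g((2 * k + 1) * (2 * k + 1));
    double sum = 0.0;
    for (int j = -k; j <= k; ++j)
        for (int i = -k; i <= k; ++i) {
            double v = std::exp(-(i * i + j * j) / (2.0 * sigma * sigma));
            g[(j + k) * (2 * k + 1) + (i + k)] = v;
            sum += v;
        }
    for (double& v : g) v /= sum;          // normalize so that the sum is 1
    return g;
}

// Formula (4): render the ASCII art (one character index per block) into a
// binary image B of size N x M. bitmaps[c] is the row-major w x h bitmap of
// character c with values 0 or 1.
std::vector<double> renderAsciiArt(const std::vector<int>& art, int cols, int rows,
                                   const std::vector<std::vector<int>>& bitmaps,
                                   int w, int h) {
    const int N = cols * w, M = rows * h;
    std::vector<double> B(N * M);
    for (int j = 0; j < M; ++j)
        for (int i = 0; i < N; ++i) {
            const std::vector<int>& c = bitmaps[art[(j / h) * cols + (i / w)]];
            B[j * N + i] = c[(j % h) * w + (i % w)];
        }
    return B;
}

// Blurred image R of B obtained with the Gaussian filter; pixels outside the
// image are treated as 0 (black).
std::vector<double> blur(const std::vector<double>& B, int N, int M,
                         const std::vector<double>& g, int k) {
    std::vector<double> R(N * M, 0.0);
    for (int j = 0; j < M; ++j)
        for (int i = 0; i < N; ++i) {
            double r = 0.0;
            for (int jj = -k; jj <= k; ++jj)
                for (int ii = -k; ii <= k; ++ii) {
                    int x = i + ii, y = j + jj;
                    if (x < 0 || x >= N || y < 0 || y >= M) continue;
                    r += g[(jj + k) * (2 * k + 1) + (ii + k)] * B[y * N + x];
                }
            R[j * N + i] = r;
        }
    return R;
}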
We are now in a position to show our ASCII art generation.
The idea of our ASCII art generation is to find an ASCII art $X$
such that the blurred image $R$ is very similar to the original
image $A$. We define the error of $X$ with respect to $A$ as the
sum of differences of the intensity levels as follows:

$$E(X) = \sum_{i,j} \left| r_{i,j} - a_{i,j} \right|. \qquad (5)$$

The goal of our method is to find the best ASCII art $X$ so
that

$$E(X) = \min \{ E(X') \mid X' \text{ is an ASCII art using a character set } C \}.$$

Since it is a very hard problem to find the optimal ASCII
art $X$, we use the approximation technique by the local
exhaustive search. The outline of our algorithm that computes
an ASCII art $X$ of an original gray-scale image $A$ using a
character set $C$ is as follows:
[ASCII art generation by the local exhaustive search]
Step 1: Initialization
We generate an ASCII art using the conventional
algorithm for the tone-based ASCII art generation.
Step 2: The local exhaustive search
We pick an element $x_{p,q}$ in $X$ one by one from
the top-left corner to the bottom-right corner in the
raster scan order. We select a replacement character
$c$ for $x_{p,q}$ which minimizes the total error over all
characters in $C$, and replace $x_{p,q}$ by such $c$. This
replacement procedure in the raster scan order is
repeated until one round of the raster scan order search
from the top-left corner to the bottom-right corner
replaces no character, that is, the error is no longer
improved.
Step 3: Output
Compute a bitmap image $B$ of the ASCII art $X$ and
output it.
The reader should refer to Figure 4 illustrating the raster
scan order local exhaustive search in Step 2. Note that this
algorithm may not find the optimal ASCII art $X$. However,
it can find a good approximation of the optimal ASCII art.
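The three steps can be summarized by the following structural sketch of the search loop (plain C++); findBest and applyReplacement stand for the per-block operations detailed in Section IV and are our own names, not the authors'.

#include <vector>
#include <functional>

// Outline of the local exhaustive search (Steps 1-3). art holds one character
// index per block, produced by the conventional method in Step 1. findBest(p, q)
// returns the character that minimizes the total error for block B_{p,q};
// applyReplacement(p, q, c) updates the blurred image (or error matrix) after a
// replacement.
void localExhaustiveSearch(std::vector<int>& art, int cols, int rows,
                           const std::function<int(int, int)>& findBest,
                           const std::function<void(int, int, int)>& applyReplacement) {
    bool replaced = true;
    while (replaced) {                        // Step 2: repeat rounds of raster scans
        replaced = false;
        for (int q = 0; q < rows; ++q)        // raster scan order, from the top-left
            for (int p = 0; p < cols; ++p) {  // corner to the bottom-right corner
                int best = findBest(p, q);
                if (best != art[q * cols + p]) {
                    art[q * cols + p] = best;
                    applyReplacement(p, q, best);
                    replaced = true;
                }
            }
    }                                         // stop when a whole round replaces nothing
    // Step 3: render art into a bitmap image with formula (4) and output it.
}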
IV. IMPLEMENTATION OF ASCII ART GENERATION USING
THE LOCAL EXHAUSTIVE SEARCH
The main purpose of this section is to show how each step
of our new approach is implemented.
Fig. 4. Step 2: the raster scan order local exhaustive search

Again, let $w \times h$ be the size of characters in $C$. We can
partition all characters in $C$ into groups $C_0, C_1, \ldots, C_{wh}$
such that each $C_v$ has characters with $v$ white pixels and
$wh - v$ black pixels. Clearly, the intensity level of characters in
$C_v$ ($0 \le v \le wh$) is $\frac{v}{wh}$. We assume that, for each character $c$ in
$C$, the blurred image of the bitmap of $c$ is computed in
advance. The blurred image $R_c = (r_c(i, j))$ has $(w + 2k) \times (h + 2k)$ pixels
such that

$$r_c(i, j) = \sum_{i'=-k}^{k} \sum_{j'=-k}^{k} g_{i',j'} \cdot c(i + i', j + j') \qquad (-k \le i < w + k,\ -k \le j < h + k),$$

where $c(i, j) = 0$ outside the bitmap.
In Step 1, we first adjust the intensity level of every pixel $a_{i,j}$
in an original gray-scale image $A$ using formula (2).
After that, we compute the average intensity level $\bar{a}_{p,q}$ of
each block using formula (1). For each block $B_{p,q}$, we
pick a character in $C_v$ at random, where $v$ satisfies

$$\left| \frac{v}{wh} - \bar{a}_{p,q} \right| = \min_{0 \le v' \le wh,\ C_{v'} \neq \emptyset} \left| \frac{v'}{wh} - \bar{a}_{p,q} \right|. \qquad (6)$$

We can generate an ASCII art $X = (x_{p,q})$ by choosing the
picked character as the character of $x_{p,q}$. Also, from
$X$, we can generate a bitmap image $B$ by
formula (4).
In Step 2, we first compute the blurred image $R$ of
the bitmap image $B$ by computing formula (3). We
compute the error matrix $E = (e_{i,j})$ such that

$$e_{i,j} = \left| r_{i,j} - a_{i,j} \right|.$$

Clearly, the total error is the sum of the $e_{i,j}$ from formula (5). In
Step 2, we need to find a replacement character of $x_{p,q}$ that
minimizes the total error. Clearly, it is sufficient to compute
the total error of the affected region that includes the block $B_{p,q}$,
as illustrated in Figure 5. The affected region is a region
of the image such that the Gaussian filter for the bitmap
image of $x_{p,q}$ affects the pixel values of the blurred image.
More specifically, the affected region $AR(p, q)$ of $B_{p,q}$ is a set
of positions $(i, j)$ in the image such that

$$pw - k \le i < (p+1)w + k \quad \text{and} \quad qh - k \le j < (q+1)h + k.$$

Since the size of the Gaussian filter is $(2k+1) \times (2k+1)$,
that of the affected region is $(w + 2k) \times (h + 2k)$. To find
a replacement character, we compute $r_{i,j} - r_{x_{p,q}}(i - pw, j - qh)$ in the
pixels of the affected region. Note that, after this computation,
we can think that $x_{p,q}$ is a character with each pixel having
intensity level 0.

Fig. 5. The affected region of a block

After that, we compute the total error for
each character $c$ in $C$ by evaluating the following formula:

$$E_{p,q}(c) = \sum_{(i,j) \in AR(p,q)} \left| r_{i,j} - r_{x_{p,q}}(i - pw, j - qh) + r_c(i - pw, j - qh) - a_{i,j} \right|. \qquad (7)$$

We evaluate this formula for all characters in $C$, and replace
$x_{p,q}$ by the character with the minimum total error. In other words, we
execute the following operation:

$$x_{p,q} \leftarrow \mathop{\mathrm{argmin}}_{c \in C} E_{p,q}(c). \qquad (8)$$
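The following C++ sketch makes formulas (7) and (8) concrete: it removes the blurred contribution of the current character, then evaluates every candidate over the affected region and returns the one with the minimum total error. The data layout, the absolute-difference error, and all names are our assumptions rather than the authors' code.

#include <vector>
#include <cmath>
#include <cstddef>

// Find the replacement character for block B_{p,q} (formulas (7) and (8)).
//   R       : blurred image of the current rendering, N x M, row-major
//   A       : adjusted original image, N x M, row-major
//   blurred : blurred[c] is the precomputed blurred bitmap of character c,
//             (w + 2k) x (h + 2k) values covering the affected region
//   current : index of the character currently placed at B_{p,q}
int findBestCharacter(const std::vector<double>& R, const std::vector<double>& A,
                      int N, int M, int w, int h, int k,
                      const std::vector<std::vector<double>>& blurred,
                      int p, int q, int current) {
    const int aw = w + 2 * k, ah = h + 2 * k;     // affected-region size
    const int x0 = p * w - k, y0 = q * h - k;     // its top-left corner in the image

    // Remove the blurred contribution of the current character, so the block
    // can be treated as if it contained a character with all pixels 0.
    std::vector<double> base(aw * ah, 0.0);
    for (int j = 0; j < ah; ++j)
        for (int i = 0; i < aw; ++i) {
            int x = x0 + i, y = y0 + j;
            if (x < 0 || x >= N || y < 0 || y >= M) continue;   // clipped at the border
            base[j * aw + i] = R[y * N + x] - blurred[current][j * aw + i];
        }

    int best = current;
    double bestErr = 1e300;
    for (std::size_t c = 0; c < blurred.size(); ++c) {
        double err = 0.0;                         // formula (7): total error over the region
        for (int j = 0; j < ah; ++j)
            for (int i = 0; i < aw; ++i) {
                int x = x0 + i, y = y0 + j;
                if (x < 0 || x >= N || y < 0 || y >= M) continue;
                err += std::fabs(base[j * aw + i] + blurred[c][j * aw + i] - A[y * N + x]);
            }
        if (err < bestErr) { bestErr = err; best = (int)c; }    // formula (8): argmin
    }
    return best;
}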
To accelerate the local exhaustive search, we use two ideas:
(1) the replacement map, and (2) the partial search. We first explain
the idea of the replacement map. In Step 2, a round of the
raster scan order search is repeated. It is possible that a region
of an ASCII art is fixed in an earlier round, and no character
in the region is replaced until Step 2 terminates. Hence, it
makes sense to perform the local exhaustive search only where
characters might be replaced. For the purpose of determining if
characters might be replaced, we use a replacement map $m = (m_{p,q})$
of size $\frac{N}{w} \times \frac{M}{h}$. Before a round of the raster scan order
search, all values in $m$ are initialized to 0. We set $m_{p,q} = 1$
if the operation in formula (8) replaces character $x_{p,q}$, that is,
the right-hand side of formula (8) is not equal to $x_{p,q}$. Clearly,
at the end of the round, $m_{p,q} = 1$ if $x_{p,q}$ has been replaced
in this round. Further, the region in which a character
might be replaced in the next round consists of blocks $B_{p,q}$ such that
$m_{p,q}$ or one of its neighbors takes value 1. Figure 6 illustrates an
example of a replacement map and the affected region. In the
next round, it is sufficient to perform the operation in formula
(8) for this region.

Fig. 6. The replacement map in an affected region
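A minimal sketch of this bookkeeping, with hypothetical names of our own choosing:

#include <vector>
#include <algorithm>

// Replacement map: mark the blocks replaced in the current round, then visit a
// block in the next round only if it or one of its neighbors was replaced.
struct ReplacementMap {
    int cols, rows;
    std::vector<char> m;                    // m[q * cols + p] == 1 if B_{p,q} was replaced
    ReplacementMap(int c, int r) : cols(c), rows(r), m(c * r, 0) {}
    void clear() { std::fill(m.begin(), m.end(), 0); }
    void markReplaced(int p, int q) { m[q * cols + p] = 1; }
    bool mustVisitNextRound(int p, int q) const {
        for (int dq = -1; dq <= 1; ++dq)    // the block itself or any of its neighbors
            for (int dp = -1; dp <= 1; ++dp) {
                int pp = p + dp, qq = q + dq;
                if (pp < 0 || pp >= cols || qq < 0 || qq >= rows) continue;
                if (m[qq * cols + pp]) return true;
            }
        return false;
    }
};

In practice the previous round's map decides which blocks to visit while a fresh map records the current round.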
The second idea, the partial search, is used to reduce
the computation of the right-hand side of formula (8). The
intensity level of the right-hand side is close to that of $x_{p,q}$ with
high probability, because it should be rare that the intensity
level changes a lot by the local exhaustive search. Thus, it is
not necessary to find the minimum over all characters in $C$. It
is sufficient to evaluate the values of formula (7) for characters
$c$ in $C'$ such that $I(c)$ is close to $I(x_{p,q})$. More specifically,
we perform the following operation:

$$x_{p,q} \leftarrow \mathop{\mathrm{argmin}}_{c \in C'} E_{p,q}(c), \qquad (9)$$

where $C' = C_{v-\Delta} \cup C_{v-\Delta+1} \cup \cdots \cup C_{v+\Delta}$ for some appropriate
fixed positive integer $\Delta$, and $v$ is an integer such that $x_{p,q} \in C_v$.
Note that $x_{p,q} \in C'$, and thus $C'$ includes characters with
the intensity level close to $I(x_{p,q})$. In our experiments that we
will show later, we set $\Delta = 10$, and so $C'$ includes characters
with 21 intensity levels close to $I(x_{p,q})$.
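A short sketch of how the candidate set C' of formula (9) can be gathered from the intensity groups; the names and the grouping layout are our assumptions.

#include <vector>
#include <algorithm>

// Partial search: test only the characters whose number of white pixels v' is
// within delta of v, the number of white pixels of the character currently in
// the block. groups[v] lists the indices of the characters with exactly v
// white pixels (the groups C_0, ..., C_{wh}).
std::vector<int> partialSearchCandidates(const std::vector<std::vector<int>>& groups,
                                         int v, int delta /* = 10 in the experiments */) {
    std::vector<int> candidates;
    int lo = std::max(0, v - delta);
    int hi = std::min((int)groups.size() - 1, v + delta);
    for (int vv = lo; vv <= hi; ++vv)        // 2 * delta + 1 = 21 intensity levels
        candidates.insert(candidates.end(), groups[vv].begin(), groups[vv].end());
    return candidates;
}

The best-character search of formula (7) is then restricted to the returned indices.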
Step 3 just computes a bitmap image $B$ by formula
(4) from the ASCII art $X$. This can be done in an
obvious way.
V. GPU IMPLEMENTATION
The main purpose of this section is to show our GPU
implementation of the local exhaustive search for generating
an ASCII art.
We briefly explain CUDA architecture that we will use.
NVIDIA provides a parallel computing architecture called
CUDA on NVIDIA GPUs. CUDA uses two types of memories
in the NVIDIA GPUs: the global memory and the shared
memory [16]. The global memory is implemented as an off-
chip DRAM of the GPU, and has large capacity, say, 1.5-6
Gbytes, but its access latency is very long. The shared memory
is an extremely fast on-chip memory with lower capacity,
say, 16-48 Kbytes. Figure 7 illustrates the CUDA hardware
architecture.
The CUDA parallel programming model has a hierarchy of
thread groups called grid, block and thread. A single grid
is organized as multiple blocks, each of which has an equal
number of threads. The blocks are allocated to streaming
multiprocessors such that all threads in a block are executed by
the same streaming multiprocessor in parallel. All threads can
access the global memory. However, threads in a block can only
access the shared memory of the streaming multiprocessor
to which the block is allocated. Since blocks are assigned to
multiple streaming multiprocessors, threads in different blocks
cannot share data in the shared memories.

Fig. 7. CUDA hardware architecture
We are now in a position to explain how we implement three
steps of our ASCII art generation using the local exhaustive
search. We assume that the adjusted image of an original
image is stored in the global memory in advance, and the
implementation writes the resulting ASCII art image in the
global memory. Further, we assume that the bitmap images of
all characters in $C$ and the blurred image of every character
are also stored in the global memory.
To implement Step 1, CUDA blocks are invoked one for
each block of the image $A$. Let $CB_{p,q}$ ($0 \le p < N/w$,
$0 \le q < M/h$) denote a CUDA block assigned to a block $B_{p,q}$.
Each CUDA block $CB_{p,q}$ is responsible for computing the
error matrix of the corresponding block using
the shared memory. For this purpose, $CB_{p,q}$ copies the pixel
values of $A$ in the affected region into the shared memory.
After that, each CUDA block $CB_{p,q}$ computes the average
intensity level $\bar{a}_{p,q}$ by computing formula (1), and selects
a character in $C_v$ satisfying formula (6). Finally, the error
matrix of the corresponding block $B_{p,q}$ is computed
from the blurred image of the selected character and the pixel values of $A$ in the
affected region $AR(p, q)$. The error matrix of the resulting
block is copied to the global memory.
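The following CUDA kernel gives the flavor of this step under several simplifications of our own: one CUDA block per image block, a shared-memory reduction for the block average, a deterministic pick from the matching intensity group (the paper picks one at random), and accumulation of the blurred character into a global blurred image R with atomicAdd, from which the error matrix can be obtained in a second pass. Parameter names and the data layout are our assumptions.

#include <cuda_runtime.h>

// One CUDA block per image block B_{p,q}; blockDim.x is assumed to be a power
// of two (e.g., 256). Launch example:
//   dim3 grid(N / w, M / h);
//   initBlocksKernel<<<grid, 256, 256 * sizeof(float)>>>(A, blurredChar,
//                                                        groupPick, art, R,
//                                                        N, M, w, h, k, w * h);
__global__ void initBlocksKernel(const float* A,           // adjusted image, N x M
                                 const float* blurredChar,  // blurred bitmaps, (w+2k)*(h+2k) values per character
                                 const int*   groupPick,    // one character index for every group C_v
                                 int*   art,                // chosen character index per block
                                 float* R,                  // blurred image, N x M, initialized to 0
                                 int N, int M, int w, int h, int k, int wh) {
    const int p = blockIdx.x, q = blockIdx.y;
    const int cols = N / w;
    extern __shared__ float sdata[];

    // Block-wide sum of the pixels of B_{p,q} (formula (1)).
    float partial = 0.0f;
    for (int t = threadIdx.x; t < wh; t += blockDim.x) {
        int i = p * w + t % w, j = q * h + t / w;
        partial += A[j * N + i];
    }
    sdata[threadIdx.x] = partial;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {           // shared-memory reduction
        if (threadIdx.x < s) sdata[threadIdx.x] += sdata[threadIdx.x + s];
        __syncthreads();
    }

    __shared__ int chosen;
    if (threadIdx.x == 0) {
        float avg = sdata[0] / wh;
        int v = (int)(avg * wh + 0.5f);                       // nearest intensity group C_v
        chosen = groupPick[v];                                // assumed valid for every v
        art[q * cols + p] = chosen;
    }
    __syncthreads();

    // Accumulate the blurred contribution of the chosen character into R over
    // the affected region; adjacent regions overlap, hence the atomics.
    const int aw = w + 2 * k, ah = h + 2 * k;
    for (int t = threadIdx.x; t < aw * ah; t += blockDim.x) {
        int i = p * w - k + t % aw, j = q * h - k + t / aw;
        if (i < 0 || i >= N || j < 0 || j >= M) continue;
        atomicAdd(&R[j * N + i], blurredChar[chosen * aw * ah + t]);
    }
}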
In Step 2, the local exhaustive search to evaluate formula
(9) is performed in parallel using multiple CUDA blocks.
However, the local exhaustive search for adjacent blocks
cannot be executed in parallel, because the application of the
Gaussian filter to adjacent blocks affects each other. Thus, we
partition blocks into four groups such that
Group 1: even columns and even rows,
Group 2: odd columns and even rows,
Group 3: even columns and odd rows, and
Group 4: odd columns and odd rows.
The reader should refer to Figure 8 illustrating the groups.
We use $\frac{N}{2w} \times \frac{M}{2h}$ CUDA blocks, and perform the local exhaustive
search in all blocks of each group. Note that, if $2k \le w$ and $2k \le h$, then
the Gaussian filters of two blocks in a group never affect each
other, where the bitmap image of a character is $w \times h$ and
the size of the Gaussian filter is $(2k+1) \times (2k+1)$. In
other words, the affected regions illustrated in Figure 5 of a
particular group do not overlap each other. Actually, in our
experiment, the character size and the filter size are chosen so that this condition holds.

Fig. 8. Groups of blocks

Step 2 performs
the local exhaustive search for Group 1, Group 2, Group 3, and
Group 4, in turn. A CUDA block is invoked for each block of a
group. The CUDA block copies the error matrix corresponding
to the affected region from the global memory to the shared
memory. After that, each CUDA block evaluates the right-
hand side of formula (9) to find the replacement character.
Finally, the error matrix of the corresponding block $B_{p,q}$
is computed and the error matrix of the resulting block is
copied to the global memory in the same way as Step 1.
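A sketch of how this scheduling can look on the host and device sides, with all names and the kernel skeleton being our own illustration rather than the authors' code:

#include <cuda_runtime.h>

// One CUDA block per image block of the current group. The body only shows the
// mapping from the CUDA block index and the group to the image block (p, q);
// the actual search (copy the affected region to shared memory, evaluate
// formula (9), write the replacement back) is summarized in comments.
__global__ void lesKernel(int groupX, int groupY, int cols, int rows
                          /* , device pointers: adjusted image, error matrix,
                               blurred character bitmaps, art, ... */) {
    int p = 2 * blockIdx.x + groupX;       // column of the image block handled here
    int q = 2 * blockIdx.y + groupY;       // row of the image block handled here
    if (p >= cols || q >= rows) return;
    // 1. copy the error matrix of the affected region of B_{p,q} into shared memory,
    // 2. evaluate the right-hand side of formula (9) for the partial-search candidates,
    // 3. write the replacement character and the updated error matrix back to global memory.
}

// One round of Step 2: the four groups are processed one after another, so that
// blocks searched concurrently never share affected regions.
void runStep2Round(int cols, int rows /* , device pointers ... */) {
    for (int groupY = 0; groupY < 2; ++groupY)
        for (int groupX = 0; groupX < 2; ++groupX) {
            dim3 grid((cols + 1 - groupX) / 2, (rows + 1 - groupY) / 2);
            lesKernel<<<grid, 256>>>(groupX, groupY, cols, rows);
            cudaDeviceSynchronize();       // finish a group before starting the next
        }
}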
To implement Step 3, one CUDA block is used to generate
each block of the bitmap image $B$ by formula (4) from
the ASCII art $X$. This can be done in an obvious
way.
VI. EXPERIMENTAL RESULTS
We have used Lena gray-scale images (Figure 1) of three
sizes. We use a set of
7310 characters in the JIS Kanji code
and a set of 95 characters in the ASCII code.
A Gaussian filter approximating the human visual system
is used. Figure 9 shows the resulting ASCII art images using
JIS Kanji code characters and ASCII code characters. We have
executed the conventional method and our method using the
local exhaustive search. Clearly, the resulting ASCII art images
by our method can reproduce the details and the tones of the
original Lena image, and the quality is much better than that
of the conventional method. In particular, the edges of the images
are sharper than those of the conventional method.
We have evaluated the computing time for generating the
ASCII art images. We have used a PC with an Intel Xeon
X7460 running at 2.66GHz to evaluate the implementation by
sequential algorithms. We also used NVIDIA GeForce GTX
680 which has 1536 processing cores in 8 SMX units [17].
Table I shows the computing time for generating the ASCII
art images. Our method using the local exhaustive search takes
much more time than the conventional method. However, by
using the GPU, the computing time can be reduced by a factor
of 23.7-57.1. Our method takes 3.108s for the largest Lena image
using the ASCII code. The computing
time can be reduced to 56ms using the GPU. Even if the
JIS Kanji code is used, the computing time is 1.123s with the
GPU acceleration. This computing time is acceptable for most
applications, such as entertainment.
VII. CONCLUSIONS
The main contribution of this paper is to propose a new
technique to generate an ASCII art image that reproduces the
original tone and the details of input gray-scale images. We
have presented a new technique using the local exhaustive
search to optimize binary images for printing based on the
characteristic of the human visual system. The resulting ASCII
art images by our new method can reproduce the details
and the tones of original gray-scale images. To accelerate
ASCII art generation by our method, we have implemented
it in the GPU. The experimental results show that the GPU
implementation can achieve a speedup factor of up to 57.1 over
the conventional CPU implementation.
REFERENCES
[1] L. Carroll, Alice's Adventures in Wonderland. Macmillan, 1865.
[2] X. Xu, L. Zhang, and T.-T. Wong, “Structure-based ASCII art,” ACM
Transactions on Graphics (SIGGRAPH 2010 issue), vol. 29, no. 4, pp.
52:1–52:9, July 2010.
[3] D. L. Lau and G. R. Arce, Modern Digital Halftoning. Marcel Dekker,
2001.
[4] D. Knuth, “Digital halftones by dot diffusion,” ACM Trans. Graphics,
vol. 6-4, pp. 245–273, 1987.
[5] Y. Ito and K. Nakano, “FM screening by the local exhaustive search
with hardware acceleration,” International Journal on Foundations of
Computer Science, vol. 16, no. 1, pp. 89–104, Feb. 2005.
[6] ——, “A new FM screening method to generate cluster-dot binary
images using the local exhaustive search with FPGA acceleration,”
International Journal on Foundations of Computer Science, vol. 19,
no. 6, pp. 1373–1386, Dec. 2008.
[7] M. Analoui and J. Allebach, “Model-based halftoning by direct binary
search,” in Proc. SPIE/IS&T Symposium on Electronic Imaging Science
and Technology, vol. 1666, 1992, pp. 96–108.
[8] IROMSOFT. Text artist. [Online]. Available:
http://www.hm.h555.net/irom/
[9] Y. Furuta, J. Mitani, and Y. Fukui, “A method for generating ascii-art
images from a character sequence by adjusting the kerning,” IPSJ, Tech.
Rep., 2010.
[10] P. D. O’Grady and S. T. Rickard, “Automatic ascii art conversion of
binary images using non-negative constraints,” in Proc. of the Irish
Signal and Systems Conference, 2008, pp. 186–191.
[11] W. W. Hwu, GPU Computing Gems Emerald Edition. Morgan
Kaufmann, 2011.
[12] D. Man, K. Uda, H. Ueyama, Y. Ito, and K. Nakano, “Implementations
of a parallel algorithm for computing euclidean distance map in mul-
ticore processors and GPUs,” International Journal of Networking and
Computing, vol. 1, no. 2, pp. 260–276, July 2011.
[13] K. Ogawa, Y. Ito, and K. Nakano, “Efficient Canny edge detection
using a GPU,” in Proc. of International Conference on Networking and
Computing, Nov. 2010, pp. 279–280.
[14] A. Uchida, Y. Ito, and K. Nakano, “Fast and accurate template matching
using pixel rearrangement on the GPU,” in Proc. of International
Conference on Networking and Computing, Dec. 2011, pp. 153–159.
[15] ——, “An efficient GPU implementation of ant colony optimization for
the traveling salesman problem,” in Proc. of International Conference
on Networking and Computing, Dec. 2012, pp. 94–102.
[16] NVIDIA Corporation, “NVIDIA CUDA C programming guide version
5.0,” 2012.
[17] ——, “NVIDIA GeForce GTX680 GPU whitepaper,” 2012.
TABLE I
COMPUTING TIME (IN SECONDS) FOR GENERATING ASCII ART IMAGES

                                        JIS Kanji code              ASCII code
Image size                            --      --      --        --      --      --
Conventional     Intel CPU            --      --      --        --      --      --
method           NVIDIA GPU           --      --      --        --      --      --
                 Speed-up           23.7    41.0    51.2      24.9    43.6    54.6
Our method       Intel CPU         4.061   16.08   64.17        --      --      --
using the LES    NVIDIA GPU       0.1426  0.3312   1.123        --      --      --
                 Speed-up           32.6    48.5    57.1      44.0    51.1    55.2
(1) Conventional method for JIS Kanji code characters (2) Our method for JIS Kanji code characters
(3) Conventional method for ASCII code characters (4) Our method for ASCII code characters
Fig. 9. The resulting ASCII art images