General-Purpose Computation on GPUs in the Browser Using gpu.js
Fazli Sapuan and Matthew Saw
School of Computing
National University of Singapore
{a0096836,a0097556}@u.nus.edu
Eugene Cheah
Independent Researcher
eugene@picoded.com
Abstract

gpu.js is a client-side GPGPU library for the browser written entirely in JavaScript. Unlike some earlier implementations of client-side GPGPU, gpu.js does not require browser support through an explicit GPGPU API or the installation of a custom native runtime browser extension. This allows the library to run on all modern platforms, including mobile devices such as smartphones. It achieves this by using the already widely adopted WebGL graphics API in a manner for which it was not designed, by means of JavaScript-to-GLSL transpilation. The library abstracts away the unnecessary implementation details of awkwardly performing GPGPU on a graphics API and, at the same time, provides an API designed specifically for GPGPU.
Keywords: Web, JavaScript, WebGL, GPGPU, High Performance Computing
Despite extensive experimental work on client-side GPGPU[1][2], there is currently no general support for an explicit GPGPU API in the browser. Web standards for parallel computing such as WebCL were completed and ratified in 2014[3]. After a lukewarm response from industry players, interest in the WebCL standard appears to have died out. Samsung provided patches for an implementation of WebCL in the popular WebKit rendering engine (used at the time by Safari, Chrome and Opera), but eventually withdrew the feature request. Nokia provided a Firefox browser extension for a WebCL runtime written in C, but eventually also pulled its support.
Several reasons were cited for the lack of support. Firstly, there is disagreement over whether the WebCL specification is actually ready for production use. Secondly, neither browser developers nor industry players wish to undertake the burden of maintaining WebCL support in the browsers. Thirdly, the implementations provided by Samsung and Nokia did not support all platforms. Fourthly, the demand for client-side GPGPU is practically insignificant compared to other, more important features in development. And lastly, support for GPU computing is already planned in the form of ARB_compute_shader support, to be added in an unspecified future WebGL version based on the OpenGL ES 3.1 standard. Presently, WebGL 1.0[4] (released 2011) uses OpenGL ES 2.0 and the next iteration, WebGL 2.0[5] (released 2017), uses OpenGL ES 3.0.
In our opinion, this situation has created a sort of chicken-and-egg problem: browser developers do not want to support GPGPU because of the lack of interest, and web developers are not interested in GPGPU because of the lack of browser support. As a result of this deadlock in the client-side GPGPU space, some library writers have taken the initiative to write alternative client-side GPGPU implementations using the widely available WebGL graphics API.
We created gpu.js to provide a new way for programmers to write client-side GPGPU code entirely in JavaScript, without the use of a special general-purpose GPU language like those used on other platforms such as CUDA or OpenCL. The arcane quirks unrelated to computation are abstracted away and completely opaque to the user.
The user does not need to install any additional
software or worry about portability because gpu.js
provides a stable API that can work on any modern
computing platform with a browser. In particular,
gpu.js has specifically been tested to work on a wide
variety of graphics hardware, spanning across
different manufacturers and micro-architectures,
namely:
Nvidia GTX 1080
AMD HD 7950
Nvidia GT 650M (mobile)
Intel HD 2000 (mobile)
Intel HD 4000 (mobile)
Apple iPhone 7
Raspberry Pi 3 Model B
All in all, these qualities make gpu.js very easy to pick up, making it suitable as an introductory experience to GPGPU for students, developers and hobbyists.
Related Work
The approach of using a graphics API to perform
high-performance general-purpose computation is
not a novel one. It is the same strategy that was
historically employed[6] before the advent of GPU
computing platforms such as CUDA or OpenCL.
Due to the similarity between desktop graphics APIs such as OpenGL and the WebGL API, several early proofs of concept appeared shortly after WebGL became available[7]. However, there was still a clear need for a GPU abstraction layer, because using the WebGL API directly is too cumbersome and platform dependent. Thus, several libraries have emerged for this specific purpose.
Library         Date Released   Kernel Language
WebCLGL[8]      May 2013        WebCL subset
gpu.js          January 2016    JavaScript subset
WebMonkeys[9]   August 2016     GLSL 1.0 with extensions
turbo.js[10]    October 2016    GLSL 1.0

Table 1: A brief comparison between some client-side GPU abstraction libraries.
These libraries share many similarities. The underlying mechanisms are all the same: they all generate GLSL code, encode computational inputs and outputs into textures, and are generally easy to use due to the abstraction of the graphics API.
WebCLGL was one of the earliest GPU abstraction libraries to appear. As its name suggests, it was built on the assumption that WebCL would eventually reach general adoption. Thus, the library allows the user to write the compute kernel in a language that is a subset of the WebCL language. That way, the library could potentially transition from the WebGL backend to the native WebCL API without breaking compatibility for programs already written for the library. Unfortunately, WebCL adoption did not materialize, but the idea of a stable stopgap API became one of the key influences for gpu.js.
Learning from the experience of the WebCLGL library, gpu.js was created from a slightly different perspective. Instead of preempting a future API that does not exist yet, gpu.js starts from a more restrictive programming model that (1) is truly platform agnostic and (2) mirrors the WebGL API more closely. This way, we are confident that the library will be more likely to be able to transition to any future API that may appear.
More recently, libraries such as WebMonkeys and turbo.js have been developed independently. The most notable difference between these libraries and gpu.js is that they take a more simplified approach towards GLSL code generation. In some ways, this is an advantage over the gpu.js approach. As the compute kernels provided by the user are already written in GLSL, very little preparation is required before using the WebGL API. At the same time, it also gives the programmer powerful low-level access to the capabilities of the GPU, allowing performance to be fine-tuned to an even greater degree. However, there are trade-offs to consider when using GLSL as the source language. GLSL has a different syntax from JavaScript and has features specifically meant for graphics computation. GLSL is not a native language with first-class support in JavaScript; any GLSL code must be written either as a string literal or stored in external places such as the DOM or separate files. More importantly, the target language of these libraries is practically locked to GLSL, which means that they cannot run the compute kernels on the CPU as plain JavaScript, or migrate the backend to a different API as easily as gpu.js.
Also of note is weblas[11] (released November 2015), a specialized numerical computing library for the browser that also abstracts away the necessary WebGL API calls. Unlike the other GPU abstraction libraries, weblas does not allow the user to create their own compute kernels. Instead, it provides a fixed set of mathematical functions using already fine-tuned GLSL shader programs. In exchange for the lack of flexibility, weblas can achieve a higher degree of performance.
Matrix Multiplication Example
gpu.js uses a restricted SPMD (single program,
multiple data) computation model to achieve
parallelism. It is designed in this manner because of
the technical limitations of working with the WebGL
graphics API.
Every thread must run the same code and produce the result for a different part of a single output array. Additionally, while the program is able to read from multiple input arrays, the threads in the program are limited to writing to ephemeral thread-local variables and, ultimately, only a single entry in the output array. The position of this entry is given by a coordinate stored in a special variable unique to the thread. As there is no ability to write to variables shared between threads, no synchronization is required inside the thread context.
This computation model exactly mirrors the one
used to render an image; in computer graphics
rendering, individual shader threads on the GPU
only produce a single pixel in a large image.
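This execution model can be sketched in plain JavaScript (an illustrative CPU emulation, not the gpu.js internals; `runKernel2D` and `addKernel` are hypothetical names):

```javascript
// Emulate the SPMD model on the CPU: the same kernel function runs once
// per output coordinate and may only write its own entry of the output.
// Illustrative sketch only, not the actual gpu.js scheduler.
function runKernel2D(kernel, width, height, inputs) {
  const out = [];
  for (let y = 0; y < height; y++) {
    out.push([]);
    for (let x = 0; x < width; x++) {
      // `thread` plays the role of gpu.js's `this.thread`
      out[y][x] = kernel({ x, y }, ...inputs);
    }
  }
  return out;
}

// Example kernel: element-wise sum of two matrices
const addKernel = (thread, A, B) => A[thread.y][thread.x] + B[thread.y][thread.x];

const S = runKernel2D(addKernel, 2, 2, [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]);
// S is [[11, 22], [33, 44]]
```

Each invocation reads freely from the inputs but writes only the single entry addressed by its own coordinate, which is exactly the constraint the fragment shader pipeline imposes.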
Matrix multiplication is an example of a problem that is especially well suited for this particular computation model. Viewing the entries of the n×p output matrix resulting from a matrix multiplication operation, it is easy to see that the problem decomposes into np independent sub-problems that can be computed using the same parameterized function:

F(x, y) = (AB)_{xy} = \sum_{k=1}^{m} A_{xk} B_{ky},

where A is an n×m matrix, B is an m×p matrix, and AB is an n×p matrix.
We can use this problem as an introduction to the usage of gpu.js. The implementation is only a few lines.
The gpu.js runtime, GPU, manages the state of the
canvas, the WebGL context used to execute
vectorized functions and the necessary helper
functions to facilitate CPU-GPU communication.
The input to the program is defined as simple two-
dimensional JavaScript arrays in row-major layout.
There are no special classes or annotations required to prepare the matrices for computation in gpu.js. The only requirement is that the arrays must be properly defined in a linear, rectangular or cuboid shape for 1-dimensional, 2-dimensional and 3-dimensional inputs respectively. That is to say, in this particular case, every row specified must have the same number of entries.
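The rectangularity requirement can be checked with a few lines of plain JavaScript (a hypothetical helper for illustration, not part of the gpu.js API):

```javascript
// Check that a 2D input array is rectangular (every row has the same
// number of entries), as gpu.js requires. Hypothetical helper, not
// part of the gpu.js API.
function isRectangular(matrix) {
  if (!Array.isArray(matrix) || matrix.length === 0) return false;
  const width = matrix[0].length;
  return matrix.every(row => Array.isArray(row) && row.length === width);
}

console.log(isRectangular([[1, 2, 3], [4, 5, 6]])); // true
console.log(isRectangular([[1, 2, 3], [4, 5]]));    // false
```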
The JavaScript function provided to the higher-order createKernel function is the compute kernel for matrix multiplication:

// Initialize the gpu.js runtime
const gpu = new GPU();

// A and B are matrices defined as 2D JavaScript arrays
const A = [[1,2,3,…],[4,5,6,…],…];
const B = [[1,2,3,…],[4,5,6,…],…];

// Create a new JavaScript function using dimension
// information and a compute kernel as input
let mat_mult = gpu.createKernel(function(A, B) {
  let sum = 0;
  for (let i = 0; i < 512; i++) {
    sum += A[this.thread.y][i] * B[i][this.thread.x];
  }
  return sum;
}).setOutput([512, 512]);

// Perform matrix multiplication
let C = mat_mult(A, B);

This function evaluates the individual entries of the resultant matrix. It is important to note that this function must be stateless and must be able to run independently. This essential property allows the vectorization of the compute kernel, which is compiled into GLSL code compatible with the WebGL API and then further compiled into a GLSL shader program.
The createKernel method returns a new GPU-accelerated function that takes the same inputs as the compute kernel and returns an output in the shape of the specified dimensions. The output of this particular function is the resultant matrix of the matrix multiplication.
In order to actually perform the matrix multiplication, the accelerated function is called as per normal. At this point, the inputs to the function are marshaled into the GPU device, the GLSL shader program is executed, and the output is marshaled back into the JavaScript context. All these complex operations happen without the user needing to understand what is going on behind the scenes.
If the result of the computation is intended to be used in another accelerated function, you can use the outputToTexture option like so:

// Set the outputToTexture option flag to true
mat_mult.setOutputToTexture(true);

// Perform matrix multiplication as per normal
let C_texture = mat_mult(A, B);

// We can immediately re-use the output without the
// round trip penalty
let D_texture = mat_mult(C_texture, B);

// Retrieve the contents from the texture to use
// in the JavaScript context
let D = D_texture.toArray();

By doing so, data that does not have to be accessible in the CPU context can stay inside the GPU device, and gpu.js does not need to waste time transferring the data from the GPU to the CPU and back into the GPU again. This is especially useful to compensate for the lack of the ability to immediately write to shared variables inside the compute kernel, a common feature provided by other GPU frameworks to speed up computation. Shared variables simply have to be written as the main output of one compute kernel and passed to another compute kernel to complete the computation.
The disadvantage of this implementation is that the act of synchronization (which might slow down the program) is forced on gpu.js, whereas in other GPU frameworks it is left up to the user.
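The two-pass pattern can be sketched in plain JavaScript (an emulation of the idea with each pass written as an ordinary function; `partialSums` and `combine` are hypothetical names, and in gpu.js each pass would be a separate kernel with outputToTexture enabled):

```javascript
// Emulate the "shared memory via kernel output" pattern: a first pass
// produces partial results as its output array, and a second pass
// combines them. Illustrative sketch only, not gpu.js code.

// Pass 1: each "thread" i sums one fixed-size chunk of the input.
function partialSums(data, chunk) {
  const out = [];
  for (let i = 0; i < Math.ceil(data.length / chunk); i++) {
    let sum = 0;
    for (let j = i * chunk; j < Math.min((i + 1) * chunk, data.length); j++) {
      sum += data[j];
    }
    out.push(sum); // the only write: this thread's output entry
  }
  return out;
}

// Pass 2: combine the partial results into the final answer.
function combine(partials) {
  let total = 0;
  for (let i = 0; i < partials.length; i++) total += partials[i];
  return total;
}

const data = [1, 2, 3, 4, 5, 6, 7, 8];
const total = combine(partialSums(data, 4));
// total is 36
```

Keeping the intermediate array in a texture between the two passes is what makes this workaround affordable.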
Performance Analysis
In order to evaluate the performance of our newly
created accelerated matrix multiplication function,
we use the Benchmark.js framework to measure the
execution timings of the program over various input
sizes. CPU (Native) refers to the execution timings
when the matrix multiplication is performed directly
without the gpu.js runtime, whereas CPU (Runtime)
refers to the execution timings with the gpu.js
runtime.
Input size    CPU (Native)    CPU (Runtime)   GPU             Speedup (CPU (Native) / GPU)
128 x 128     0.002s ±0.6%    0.003s ±1.5%    0.009s ±5.3%    0.22
256 x 256     0.024s ±0.7%    0.027s ±1.2%    0.009s ±8.1%    2.67
512 x 512     0.200s ±0.7%    0.258s ±2.0%    0.015s ±6.8%    13.33
1024 x 1024   2.417s ±0.9%    3.935s ±2.5%    0.045s ±5.3%    53.71
2048 x 2048   36.782s ±3.6%   44.371s ±1.1%   0.239s ±6.2%    153.90

Table 2: Execution times of matrix multiplication with various input sizes on i7-7700K, GTX1080 (2017).
Figure 1: Compute kernel compilation process
Figure 2: Plot of execution time of matrix multiplication against size of input, comparing CPU (Native), CPU (Runtime) and GPU.
For small inputs, GPU performance with gpu.js is actually slower than running on the CPU. This can be explained by the significant overhead incurred in maintaining the WebGL context and the round trip time of the data between the CPU and GPU. However, for the larger inputs that gpu.js is designed to handle, gpu.js is multiple orders of magnitude faster.
It is also important to note that there is a significant overhead of about 10–35% when running on the CPU with the gpu.js runtime. This penalty is the trade-off for the convenience of not having to rewrite the program to avoid the runtime. Reading from the plot, the time complexities appear similar enough that comparisons between the order of growth of the execution times of the CPU (Runtime) and GPU modes remain valid when using the upper bounds.
In theory, the naive matrix multiplication implementation should have a time complexity of O(n^3), where n is the length of the longer side of one of the matrices. In comparison, the matrix multiplication on the GPU as implemented in the example should have a time complexity of O(n) in the ideal case, which is to say that there are enough GPU compute units to execute all threads of the kernel simultaneously (the GTX1080 has 2,560 “CUDA cores”), with each thread computing a single sum over n terms. For a sufficiently large n, matrix multiplication should therefore be faster on the GPU. This analysis appears to be consistent with the measured results.
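Under this model the speedup should grow roughly as n^2, i.e. about 4x for every doubling of n, and this can be sanity-checked against the measured speedups in Table 2 with a few lines of JavaScript:

```javascript
// Successive speedup ratios from Table 2: under an n^2 speedup model,
// each doubling of n should multiply the speedup by about 4.
const speedups = [0.22, 2.67, 13.33, 53.71, 153.90];
const ratios = [];
for (let i = 1; i < speedups.length; i++) {
  ratios.push(speedups[i] / speedups[i - 1]);
}
console.log(ratios.map(r => r.toFixed(2)));
// Roughly [12.14, 4.99, 4.03, 2.87]: near 4 in the middle of the range,
// with deviations at the extremes (fixed overheads dominate small
// inputs; other effects appear at the largest size).
```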
Even then, matrix multiplication with n > 256 is not very common outside the field of scientific computing. In fact, only a limited number of problems exhibit this unusually high level of computational intensity. This, coupled with the technical limitations imposed by the WebGL API, severely limits the applicability of gpu.js. For example, an array map operation on an array of length n would have a theoretical time complexity of O(1) on the GPU and O(n) on the CPU. Nonetheless, due to the relatively smaller time complexity advantage, the magnitude of n required would likely be too large to be practical even for this rather common GPGPU strategy.
In spite of this, we believe that gpu.js could still be a
useful tool in some niche applications. In particular,
the following types of GPGPU problems have been
identified to be suitable for gpu.js:
Problem                    CPU Time Complexity (Naive)   GPU Time Complexity (Ideal)
Ray Tracing                O(kn^2)                       O(k)
Gaussian Blur              O(n^2 k^2)                    O(k^2)
Game of Life Iteration     O(n^2)                        O(1)
Voronoi Diagram Plot       O(kn^2)                       O(k)
N-bodies                   O(n^2)                        O(n)
Convolutional Operations   O(n^2 k^2)                    O(k^2)

Table 3: Well-known GPGPU problems with sufficient time complexity advantage for GPUs. Some gpu.js users have already published working demos for these problems online.
Needless to say, we are ever optimistic that with new
advances in graphics and GPU computation
capabilities on the web, this situation can only
improve in the future.
WebGL CPU-GPU Data Transport
As we are using the WebGL API in a way it was not designed for, there are some major issues with transporting data between the CPU and the GPU. Firstly, as graphics computation does not require very high precision, unlike general computation, the maximum precision achievable in WebGL 1.0 is 32-bit floating point, which can accurately represent fewer numbers than the 64-bit floating point usually used in modern processors. Secondly, while uniforms can be used to transport numerical data in WebGL, they unfortunately have very tight space constraints: not enough for gpu.js to fully realize the performance benefit of executing on the GPU.
The workaround used by gpu.js is to use graphical textures as the container for data transport. These textures commonly have a dimension limit of 2048 x 2048 or more on modern systems. In order to maximize the amount of data that can be transported using textures, gpu.js will transparently reshape any odd-shaped input and output arrays into squares while in transport.
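The reshaping idea can be sketched as follows (a simplified illustration; the actual gpu.js layout logic may differ, and `textureShape` and `texelCoord` are hypothetical names):

```javascript
// Compute near-square texture dimensions able to hold n values, and
// map a linear index to (column, row) texel coordinates. Simplified
// sketch of the reshaping idea; the actual gpu.js layout may differ.
function textureShape(n) {
  const w = Math.ceil(Math.sqrt(n));
  const h = Math.ceil(n / w);
  return { w, h };
}

function texelCoord(i, w) {
  return { x: i % w, y: Math.floor(i / w) };
}

const shape = textureShape(1000);       // { w: 32, h: 32 } holds 1024 >= 1000 values
const coord = texelCoord(999, shape.w); // { x: 7, y: 31 }
```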
These textures are designed to store color information in up to 4 channels, with each channel having a dynamic range of [0.0, 1.0]. As such, gpu.js provides two different strategies to safely encode floating point numbers, which have a dynamic range of (−∞, ∞), in an accurate and stable manner.
Fixed Point Textures
As shaders in WebGL 1.0 are based on the old GLSL ES 1.0 standard, older implementations of WebGL can only support 32-bit colors encoded in a fixed point format. Each color channel is a byte (0–255) which represents one of 256 possible values linearly interpolated between [0.0, 1.0]. The major advantage of using such an old format is that it has ubiquitous support on all platforms.
Conveniently for us, the floating point numbers to be used in the shader program also have a bitwidth of 32 bits. Thus, we can encode our 32-bit floating point numbers by reinterpreting the bits as 4 separate bytes in the 4 channels that make up a full color. This can be done efficiently in JavaScript by manipulating typed array buffers.
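On the JavaScript side, the reinterpretation can be done along these lines (a sketch of the idea, not the exact gpu.js code):

```javascript
// Reinterpret a 32-bit float as 4 bytes (one per color channel) and
// back, using a DataView over a shared buffer. Sketch of the idea
// behind the fixed point transport, not the exact gpu.js code.
const buf = new ArrayBuffer(4);
const view = new DataView(buf);

function floatToBytes(x) {
  view.setFloat32(0, x, true); // little-endian: byte 3 is most significant
  return [view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3)];
}

function bytesToFloat(b) {
  b.forEach((byte, i) => view.setUint8(i, byte));
  return view.getFloat32(0, true);
}

const bytes = floatToBytes(3.5);
const restored = bytesToFloat(bytes); // 3.5, exactly round-tripped
```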
Inside the shader, the color is decoded by
reassembling the bytes into a single 32-bit floating
point number. GLSL 1.0 supports neither the
uintBitsToFloat API nor any of the bitwise operators
required to do this. Instead, gpu.js supplies its own
implementation of uintBitsToFloat written using
only arithmetic operations supported in GLSL 1.0.
To get the data out of the shader, the entire process is
simply performed in reverse order.
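The arithmetic-only reassembly can be sketched in JavaScript (the real gpu.js version is written in GLSL and handles more cases; this simplified sketch covers normalized numbers and zero only, and `bytesToFloatArithmetic` is a hypothetical name):

```javascript
// Decode 4 bytes (b0 least significant .. b3 most significant) into a
// 32-bit float using only arithmetic, mirroring the idea behind the
// GLSL uintBitsToFloat replacement. Simplified sketch: handles
// normalized numbers and zero, not denormals, infinities or NaN.
function bytesToFloatArithmetic(b0, b1, b2, b3) {
  const sign = Math.floor(b3 / 128);                       // top bit of b3
  const exponent = (b3 % 128) * 2 + Math.floor(b2 / 128);  // next 8 bits
  const mantissa = (b2 % 128) * 65536 + b1 * 256 + b0;     // low 23 bits
  if (exponent === 0 && mantissa === 0) return sign === 0 ? 0 : -0;
  return (1 - 2 * sign) * (1 + mantissa / 8388608) * Math.pow(2, exponent - 127);
}

console.log(bytesToFloatArithmetic(0, 0, 200, 192)); // -6.25
```

Note that the sign and the upper exponent bits are recovered from the most significant byte alone, which is why an error of 1 in that byte can scale the result by a factor of four, as described above.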
While this strategy works fine on most platforms, the special implementation of uintBitsToFloat is very susceptible to flaws in the graphics platform. The WebGL specification allows implementations to perform arithmetic rounding that does not conform to the IEEE 754 standard[12] commonly used in general computing. Our custom implementation is aware of this limitation and tries to work with such loose precision guarantees by rearranging operations to minimize the possible damage and by using integer division wherever possible. Although rare, the highest magnitude of error possible comes from decoding the most significant byte of the 32-bit floating point number, which contains the sign and the upper 7 bits of the exponent. If this byte is off by just 1, the decoded number could potentially be four times or a quarter as big as the original number.
Floating Point Textures
Newer implementations of WebGL implement an optional OES_texture_float extension which allows 32-bit floating point numbers to be used in the individual channels of a texture. In this mode, four 32-bit floating point numbers are packed into a single color. This means that, unlike fixed point mode, the shader can produce four outputs per execution. As a result, fewer shader threads need to be created for the program, lowering the overhead of extra threads and potentially increasing the performance of the program.
Rather annoyingly however, OES_texture_float does not specify whether the readPixels API is allowed to read in floating point format. Some browsers do not allow such an operation; other browsers work as expected in our favor.
Unlike testing for available WebGL extensions, there is currently no official way to test for the availability of this functionality. Instead, the gpu.js runtime runs its own test by performing readPixels on a dummy framebuffer as part of the initialization process. If the operation fails with a runtime exception or produces an unexpected result, the fixed point strategy is used instead to transport data out of the shader program. Such a program will not enjoy the same performance benefit as running entirely in floating point texture mode.
The table below shows the performance differences
between fixed point and floating point modes.
Transport mode                             Execution Time   Speedup (Best vs. this)
Full fixed point mode                      0.048s ±6.3%     1.07
Floating point input, fixed point output   0.046s ±5.3%     1.02
Full floating point mode                   0.045s ±5.3%     1.00

Table 4: Execution times for matrix multiplication of 1024 x 1024 matrices with various transport modes on i7-7700K, GTX1080 (2017).
It appears that, in practice, the differences in performance are not very significant and are within each other's margins of error. One theory that would explain this could be that the number of processing elements inside the GPU of this particular platform is very high and can tolerate the higher thread count overhead. As such, the difference might be more observable on a platform with fewer processing elements available.
Further Work
More features and improvements are currently being developed for the library. Of particular interest is the JavaScript to GLSL transpiler, which currently only supports a very restricted subset of JavaScript. More types could be supported in the transpiler by implementing type inference (JavaScript does not support explicit typing). In addition, the compilation errors could be made more user friendly, so that the library is less frustrating to use when a compilation failure occurs.
In terms of performance, the runtime could be optimized to minimize the high overhead experienced in the CPU and GPU modes. In the produced shader programs, SIMD operations could be employed automatically by the compiler in order to speed up computations in the kernel. To work around the limits of texture data transport, the runtime could automatically and transparently split up the input and the computations into smaller chunks in separate textures.
Furthermore, gpu.js could also support future APIs that are better suited for GPGPU operations. In the meantime, we hope that gpu.js is useful as a stable stopgap API until such future APIs become available.
Acknowledgment
We would like to thank Dr. Low Kok-Lim, Assoc. Prof. Hugh Anderson, as well as all contributors on GitHub, for their significant contributions to the project.
References
[1] J. Wang, N. Rubin, and S. Yalamanchili, "ParallelJS: An Execution Framework for JavaScript on Heterogeneous Systems," in Proceedings of Workshop on General Purpose Processing Using GPUs, New York, NY, USA, 2014, pp. 72:72–72:80.
[2] M. Bourgoin and E. Chailloux, "High Performance Client-Side Web Programming with SPOC and Js_of_ocaml," hgpu.org, Sep. 2014.
[3] Khronos WebCL Working Group, "WebCL Specification," 2014. [Online]. Available: https://www.khronos.org/registry/webcl/specs/latest/1.0/
[4] Khronos WebGL Working Group, "WebGL Specification," 2011. [Online]. Available: https://www.khronos.org/registry/webgl/specs/latest/1.0/
[5] Khronos WebGL 2 Working Group, "WebGL 2 Specification," 2017. [Online]. Available: https://www.khronos.org/registry/webgl/specs/latest/2.0/
[6] M. Harris, "Chapter 31: Mapping Computational Concepts to GPUs," in GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, First Edition, M. Pharr and R. Fernando, Eds. Upper Saddle River, NJ: Addison-Wesley Professional, 2005.
[7] M. Chouza, "GPGPU with WebGL: solving Laplace's equation," Spin Foam, 21-Feb-2011. [Online]. Available: https://mchouza.wordpress.com/2011/02/21/gpgpu-with-webgl-solving-laplaces-equation/
[8] R. Gonzalez, webclgl: Javascript Library for general purpose computing on GPU. 2013. [Online]. Available: https://github.com/stormcolor/webclgl
[9] V. Maia, WebMonkeys: Massively parallel GPU programming on JavaScript, simple and clean. 2016. [Online]. Available: https://github.com/MaiaVictor/WebMonkeys
[10] minxomat, turbo.js: perform massive parallel computations in your browser with GPGPU. 2016. [Online]. Available: https://github.com/turbo/js
[11] W. Flinn, weblas: GPU Powered BLAS for Browsers. 2016. [Online]. Available: https://github.com/waylonflinn/weblas
[12] IEEE Computer Society, "IEEE Standard for Floating-Point Arithmetic," 2008.
... GPU.js (Sapuan et al., 2018) is a library that parses array processing functions written in JavaScript, converts them into GLSL code for WebGL, and makes them executable on the GPU. Since all array processing needs to be written in JavaScript, it is not possible to utilize code assets implemented using NumPy. ...
Preprint
Full-text available
To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU, which utilize GPUs on the client side of web applications, have become available. On the other hand, Pyodide, a Python runtime that operates on web browsers, allows web applications to be written in Python, but it can only utilize the CPU, leaving room for acceleration. Our proposed new library, WgPy, provides array computation capabilities on the GPU with a NumPy-compatible interface in the web browser. This library not only implements array operations such as matrix multiplication on WebGL and WebGPU, but also allows the users to write custom kernels that can run on GPUs with minimal syntax knowledge, allowing you to run a variety of algorithms with minimal overhead. WgPy also implements a special thread synchronization mechanism, which bridges asynchronous semantics of JavaScript with Python's synchronous semantics, allows code written for CuPy, the NumPy-compatible array library for CUDA, to run directly in a web browser. In experiments involving training a CNN model, it achieved processing at 95 times the speed compared to CPU execution.
... The last analyzed method is the usage of GPGPU (General-purpose computing on graphics processing units). It involves tricky usage of the WebGL's graphics pipeline not to generate graphics, but using its advantages, executing algorithms that can be massively parallelized [17]. The caveats of this method are slow data transfer between CPU and GPU, and lack of shared memory which is the characteristic of the pipeline. ...
Chapter
Full-text available
In this paper, we present an analysis of popular acceleration methods in JavaScript execution environments including Chrome, Firefox, Node, and Deno. We focus evenly on adopting the same codebase to take advantage of every method, benchmarking our solutions and caveats of building libraries compatible with multiple environments. To compare performance, we use a simplified standard Hough transform algorithm. As reference points of our benchmarks, we use a sequential version of the algorithm written in both JavaScript and C++. Our study shows that Chrome is the fastest JS environment in every benchmark and Firefox is the slowest in which we identified optimization problems. WebGL appears as the fastest acceleration method. Without parallel execution native C++ addon in Node is the most performant. This analysis will help to find the most efficient way to speed up execution making JavaScript a more robust environment for CPU-intensive computations.KeywordsJavaScriptAccelerationSHTStandard Hough transformNodeBrowserDenoWebGLWebpackWASMSIMDWorkers
... Thus, any decision-maker needs to consider various quality attributes and explore possible quality trade-offs between them (i.e., one QA is improved, whereas another deteriorates) (Bass et al., 2003). By seeking for explicit trade-off analysis studies in our dataset, we have identified only one (Naughton et al., 2018) study that identifies trade-offs and only two ( (Abdullin et al., 2017) and (Sapuan et al., 2018)) that identify cut-off points (i.e., the same practice can have both positive and negative impact, based on some parameters). In response to RQ2.3, we present two views of empirical validation methods. ...
Preprint
Full-text available
Background: The development of scientific software applications is far from trivial, due to the constant increase in the necessary complexity of these applications, their increasing size, and their need for intensive maintenance and reuse. Aim: To this end, developers of scientific software (who usually lack a formal computer science background) need to use appropriate software engineering (SE) practices. This paper describes the results of a systematic mapping study on the use of SE for scientific application development and their impact on software quality. Method: To achieve this goal we have performed a systematic mapping study on 359 papers. We first describe a catalogue of SE practices used in scientific software development. Then, we discuss the quality attributes of interest that drive the application of these practices, as well as tentative side-effects of applying the practices on qualities. Results: The main findings indicate that scientific software developers are focusing on practices that improve implementation productivity, such as code reuse, use of third-party libraries, and the application of "good" programming techniques. In addition, apart from the finding that performance is a key-driver for many of these applications, scientific software developers also find maintainability and productivity to be important. Conclusions: The results of the study are compared to existing literature, are interpreted under a software engineering prism, and various implications for researchers and practitioners are provided. One of the key findings of the study, which is considered as important for driving future research endeavors is the lack of evidence on the trade-offs that need to be made when applying a software practice, i.e., negative (indirect) effects on other quality attributes.
Article
Transpilers refer to a special type of compilation that takes source code and translates it into target source code. This type of technique has been used for different types of implementations in scientific studies. A review of the research areas related to the use of transpilers allows an understanding of the direction of this branch of knowledge. The objective was to carry out an exhaustive and extended mapping of the usage and implementation of transpilers in research studies in the last 10 years. A systematic mapping review was carried out to answer the five research questions proposed. The PSALSAR method is used as a guide to the steps needed for the review. In total, from 1181 articles collected, 683 primary studies were selected, reviewed, and analyzed. Proposals from the industry were also analyzed. A new method for automatic data tabulation has been proposed for the mapping objective, using a relational database and SQL language. It was identified that the most common uses of transpilers are related to performance optimizations, parallel programming, embedded systems, compilers, testing, AI, graphics, and software development. In conclusion, it was possible to determine the extent and identification of research sub-areas and their impact on the usage of transpilers. Future research could consider the usage of transpilers in transactional software, migration strategies for legacy systems, AI, math, multiplatform games and apps, automatic source code generation, and networking.
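The source-to-source translation described above is the same mechanism gpu.js relies on (JavaScript to GLSL). As a toy illustration of the idea, the sketch below rewrites a restricted single-expression JavaScript arrow function into GLSL-flavoured source via string matching; the function name `transpileToGlsl` and the translation rules are invented for illustration, and a real transpiler would parse the source into an AST rather than use a regular expression.

```javascript
// Toy source-to-source translation: a restricted JavaScript arrow function
// is rewritten into a GLSL-style function. Illustrative only.
function transpileToGlsl(jsFn) {
  // Function.prototype.toString returns the function's source text.
  const src = jsFn.toString();
  // Match a single-expression arrow function such as "(a, b) => a + b".
  const match = src.match(/^\(([^)]*)\)\s*=>\s*(.+)$/);
  if (!match) throw new Error('expected a single-expression arrow function');
  // Give every parameter a GLSL float type.
  const params = match[1].split(',').map(p => `float ${p.trim()}`).join(', ');
  return `float kernel(${params}) { return ${match[2]}; }`;
}

const glsl = transpileToGlsl((a, b) => a + b);
console.log(glsl); // float kernel(float a, float b) { return a + b; }
```

Even this toy version shows the core property of a transpiler: both the input and the output are human-readable source code, with the translation preserving the expression's semantics while changing its surface language.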
Conference Paper
JavaScript has been recognized as one of the most widely used script languages. Optimizations of JavaScript engines on mainstream web browsers enable efficient execution of JavaScript programs on CPUs. However, running JavaScript applications on emerging heterogeneous architectures that feature massively parallel hardware such as GPUs has not been well studied. This paper proposes a framework for flexible mapping of JavaScript onto heterogeneous systems that have both CPUs and GPUs. The framework includes a frontend compiler, a construct library and a runtime system. JavaScript programs written with high-level constructs are compiled to GPU binary code and scheduled to GPUs by the runtime. Experiments show that the proposed framework achieves up to 26.8x speedup executing JavaScript applications on parallel GPUs over a mainstream web browser that runs on CPUs.
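The "high-level constructs" such a framework compiles to GPU code are typically data-parallel operations whose callbacks have no cross-element dependencies. The sketch below shows what one such construct might look like; the name `parallelMap` and its sequential CPU fallback are assumptions for illustration, not the paper's actual API.

```javascript
// Sketch of a data-parallel construct suitable for GPU offload: each output
// element depends only on one input element and its index, so a compiler
// could translate the callback to kernel code and launch one GPU thread per
// element. This reference implementation simply runs sequentially on the CPU.
function parallelMap(fn, input) {
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = fn(input[i], i);
  }
  return out;
}

const squares = parallelMap(x => x * x, Float32Array.from([1, 2, 3, 4]));
console.log(Array.from(squares)); // [1, 4, 9, 16]
```

Restricting the callback this way is what makes the compilation tractable: the runtime is free to execute elements in any order, or all at once on massively parallel hardware.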
Article
Recently, graphics processors have emerged as a powerful computational platform. A variety of encouraging results, mostly from researchers using GPUs to accelerate scientific computing and visualization applications, have shown that significant speedups can be achieved by applying GPUs to data-parallel computational problems. However, attaining these speedups requires knowledge of GPU programming and architecture. The preceding chapters have described the architecture of modern GPUs and the trends that govern their performance and design. Continuing from the concepts introduced in those chapters, in this chapter we present intuitive mappings of standard computational concepts onto the special-purpose features of GPUs. After presenting the basics, we introduce a simple GPU programming framework and demonstrate the use of the framework in a short sample program.