All content in this area was uploaded by Christos Kotselidis on Dec 20, 2018
VEE’17, Xi’an, China C. Kotselidis et al.
[Figure: system stack in three layers.]
Applications: KFusion implementations in Java7, Java8, C++, and OpenMP (derived from SLAMBench).
Runtime layer: Maxine VM, OpenJDK, Native (C++/OpenMP), T1X, C1X/Graal, Client, Memory Manager (GC) (shown twice), OpenCL Heterogeneous Accelerator, MAST FPGA Accelerator Framework. Additional labels: thin, thick.
Hardware: ARMv7, x86-64, GPUs, FPGAs.
Heterogeneous Managed Runtime Systems: A Computer Vision Case Study VEE’17, Xi’an, China
[Figure: bar chart, axis 0 to 100. Series: Hotspot-C2-1.8.0.25, Hotspot-Graal-21075 (Original), Maxine-Graal-20290 (Original), Maxine-Graal-20381 (Current). Bar labels in extraction order: 57.08, 75.98, 81.63, 36.36, 86.30, 68.96, 90.49, 35.13, 26.13, 99.7, 44.8, 72.64, 37.53, 50.34.]
[Figure: bar chart of benchmark results per JVM, axis 0 to 80. Values are assigned to series in legend order.]
Benchmark    MaxineVM-ARMv7    OpenJDK_1.7.0_40-Client    OpenJDK_1.7.0_40-Server
geomean      12                27                         40
startup      20                20                         20
compiler     8                 18                         34
compress     29                57                         65
crypto       17                44                         76
derby        5                 13                         24
mpegaudio    14                31                         50
scimark      8                 22                         31
sunflow      25                38                         49
xml          6                 28                         47
[Figure: task-graph workflow. Panels: Serial Methods, Task Graph, Optimized Graph, Runtime. Components: OpenCL/Java API, Graph Optimizer.]

OpenCL/Java API:
    preprocessingGraph = new TaskGraph()
        .streamIn(depthImageInput)
        .add(ImagingOps::mm2metersKernel,
             scaledDepthImage,
             depthImageInput, scalingFactor)
        .add(ImagingOps::bilateralFilter,
             pyramidDepths[0],
             scaledDepthImage,
             gaussian, eDelta, radius)
        .mapAllTo(deviceMapping);

- Users create Task Graphs with our OpenCL API.
- The compiler expands graphs to include data movement.
- The graph is optimized to remove redundant data transfers.
- The runtime schedules tasks on devices.

Runtime components: Code Cache, Memory, Task Queue, Devices.
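The expansion and optimization steps described in the figure can be illustrated with a minimal sketch. It is not the paper's implementation or API: the class TaskGraphSketch, the Task type, the expand and optimize methods, the hostNeeds parameter, and the COPY_IN/RUN/COPY_OUT operation strings are all hypothetical names introduced here. The sketch assumes that a buffer already resident on the device need not be copied in again, and that only buffers the host explicitly asks for are copied back. Scalar arguments such as scalingFactor, eDelta, and radius are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not the paper's API or implementation.
final class TaskGraphSketch {

    // A task names the buffers it reads and the buffers it writes.
    static final class Task {
        final String name;
        final List<String> reads;
        final List<String> writes;

        Task(String name, List<String> reads, List<String> writes) {
            this.name = name;
            this.reads = reads;
            this.writes = writes;
        }
    }

    // Naive expansion: copy every input in and every output out around each task.
    static List<String> expand(List<Task> tasks) {
        List<String> ops = new ArrayList<>();
        for (Task t : tasks) {
            for (String r : t.reads) {
                ops.add("COPY_IN " + r);
            }
            ops.add("RUN " + t.name);
            for (String w : t.writes) {
                ops.add("COPY_OUT " + w);
            }
        }
        return ops;
    }

    // Optimized expansion (assumed policy): skip copy-ins of buffers that are
    // already on the device, and copy back only what the host asks for.
    static List<String> optimize(List<Task> tasks, Set<String> hostNeeds) {
        List<String> ops = new ArrayList<>();
        Set<String> onDevice = new LinkedHashSet<>();
        Set<String> written = new LinkedHashSet<>();
        for (Task t : tasks) {
            for (String r : t.reads) {
                // Set.add returns false when the buffer is already resident.
                if (onDevice.add(r)) {
                    ops.add("COPY_IN " + r);
                }
            }
            ops.add("RUN " + t.name);
            onDevice.addAll(t.writes);
            written.addAll(t.writes);
        }
        for (String w : written) {
            if (hostNeeds.contains(w)) {
                ops.add("COPY_OUT " + w);
            }
        }
        return ops;
    }

    // The two preprocessing tasks from the figure, reduced to buffer names.
    static List<Task> example() {
        return Arrays.asList(
            new Task("mm2meters",
                     Arrays.asList("depthImageInput"),
                     Arrays.asList("scaledDepthImage")),
            new Task("bilateralFilter",
                     Arrays.asList("scaledDepthImage", "gaussian"),
                     Arrays.asList("pyramidDepths0")));
    }

    public static void main(String[] args) {
        List<Task> tasks = example();
        Set<String> hostNeeds = new LinkedHashSet<>(Arrays.asList("pyramidDepths0"));
        System.out.println("naive:     " + expand(tasks));
        System.out.println("optimized: " + optimize(tasks, hostNeeds));
    }
}
```

Under these assumptions the naive expansion of the two tasks yields seven operations, while the optimized one yields five: the intermediate scaledDepthImage stays on the device, so its copy-out and the following copy-in disappear.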
[Figure: frames per second (y-axis, 0 to 30) against frame number (x-axis, 0 to 1000). Legend: C++: 2.72 FPS; Java: 0.81 FPS; Java/OpenCL: 33.13 FPS.]
[Figure: speedup over Java (log10 scale) per pipeline stage: Acq., Pre., Tra., Int., Ray., Rend., Total. Series: C++, Java/OpenCL.]
[Figure labels: preprocessing, mm meters, malloc.]