OpenCLIPP: OpenCL Integrated
Performance Primitives library for computer
vision applications
Moulay Akhloufi (akhloufi@gel.ulaval.ca); Antoine W. Campagna
In recent years, interest in GPGPU computing (General-Purpose computation on Graphics
Processing Units) has increased. This field aims to use the processing power of the
GPU (Graphics Processing Unit) to accelerate general processing such as mathematics,
3D visualization, image processing, etc.
In past years, CUDA (Compute Unified Device Architecture), a parallel computing
platform and programming model invented by NVIDIA, was the main driver of this interest
and the most used architecture for GPGPU computing. With the recent advent of the Open
Computing Language (OpenCL), more and more work is being conducted using this new
platform. OpenCL is an open standard maintained by the non-profit technology
consortium Khronos Group. It has been adopted by multiple companies, including
NVIDIA (the inventor of CUDA).
With this growing interest, the availability of a set of performance primitives for
general-purpose applications can help accelerate the work of the research and industrial
communities. Intel, for example, develops Intel Integrated Performance Primitives (Intel
IPP), a multi-threaded software library of functions for multimedia and data processing
applications. On the other hand, NVIDIA offers the NVIDIA Performance Primitives library
(NPP), a collection of GPU-accelerated image, video, and signal processing functions
that deliver faster performance than comparable CPU-only implementations.
In this work, we present the architecture and development of an open source OpenCL
integrated performance primitives library called OpenCLIPP. This library aims to provide a
free and open source set of OpenCL functions with a simple interface similar to Intel IPP
and NVIDIA NPP. The first release mainly includes image processing and computer
vision algorithms: convolution filters, thresholding, blobs, etc. The developed functions
are introduced, and benchmarks against equivalent Intel IPP and NVIDIA NPP functions are
presented. This library will be made available to the open source community.
M. Akhloufi, A. Campagna, "OpenCLIPP: OpenCL Integrated Performance Primitives library for computer vision
applications", Proc. SPIE Electronic Imaging, Intelligent Robots and Computer Vision XXXI: Algorithms and
Techniques, 9025-31, San Francisco, CA, USA, February 2014
Conclusion
OpenCLIPP can provide a significant performance improvement to image processing
applications, regardless of the platform used (AMD or NVIDIA, Windows or Linux).
The performance gain is substantial, even compared to the most optimized CPU libraries,
when processing large images (>10 MPixels) on high-end GPUs.
GPU processing is not a good choice for small images (<1 MPixel) because of the fixed overhead of each GPU operation.
This library was made open source so that interested programmers can use it for free in their
applications and contribute to improving it: http://openclipp.wix.com/openclipp
Introduction
Computer vision is increasingly used in today's applications.
With ever higher resolutions and more demanding algorithms, applications are often limited by the processing power of CPUs.
An alternative is to use GPUs.
We present OpenCLIPP, a new library based on OpenCL that performs high-speed image processing on GPUs.
The library is open source, LGPL licensed and free for commercial use. You can download it from the GitHub website:
http://openclipp.wix.com/openclipp
What is OpenCL?
OpenCL is a framework that allows using the computing resources present in specialized
computing devices like GPUs.
How it works:
1. A program is written in a language similar to C
2. The program is compiled for the computing device used
3. The compiled program runs in parallel over all the computing resources of the device
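The three steps listed for OpenCL can be pictured with a small CPU-only sketch in C++. It is not OpenCL code and does not touch a GPU: the names `invertKernel` and `runOverImage` are hypothetical, and the pixel inversion is only a placeholder operation. The point is the shape of the model, where one small function is written once and then invoked for every pixel position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "kernel": computes one output pixel from its (x, y) position.
// In OpenCL C the position would come from get_global_id(0) and get_global_id(1).
inline void invertKernel(int x, int y, int width,
                         const std::uint8_t* src, std::uint8_t* dst) {
    const std::size_t index = static_cast<std::size_t>(y) * width + x;
    dst[index] = static_cast<std::uint8_t>(255 - src[index]);
}

// Hypothetical launcher: invokes the kernel once per pixel. This loop is
// sequential; a device would execute the invocations in parallel, which is
// valid here because no invocation depends on the result of another.
inline void runOverImage(int width, int height,
                         const std::uint8_t* src, std::uint8_t* dst) {
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            invertKernel(x, y, width, src, dst);
        }
    }
}
```

Because each invocation reads and writes only its own pixel, the order of the calls does not matter, which is what lets a device spread them over all of its computing resources.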
How to use in C
The library provides an interface in C, allowing many programming languages to use its capabilities.
// Variables
// SourceImageData and ResultImageData are host buffers holding the pixels (not shown)
ocipContext Context = NULL;
ocipImage SourceImage, ResultImage;
SImage ImageInfo = {...}; // Fill with size, type, channels of image
// Initialize OpenCL
ocipInitialize(&Context, NULL, CL_DEVICE_TYPE_ALL);
ocipSetCLFilesPath("/path/to/cl files/");
// Create images in OpenCL device
ocipCreateImage(&SourceImage, ImageInfo, SourceImageData, CL_MEM_READ_ONLY);
ocipCreateImage(&ResultImage, ImageInfo, ResultImageData, CL_MEM_WRITE_ONLY);
// Prepare the filters - compiles the OpenCL C program
// Optional (would otherwise be done upon the first filter call)
ocipPrepareFilters(SourceImage);
// Apply filter (asynchronous)
ocipSobel(SourceImage, ResultImage);
// Transfer image to host (synchronous)
ocipReadImage(ResultImage);
Library interface
There are two popular existing libraries of image processing primitives:
• Intel IPP, optimized for CPUs
• NVIDIA NPP, which provides an interface similar to Intel IPP but runs on NVIDIA CUDA GPUs
OpenCLIPP provides a C interface inspired by the interfaces of these libraries, but simplified.
OpenCLIPP also provides a C++ interface.
The library supports images with:
• signed or unsigned integers of 8, 16 or 32 bits, or 32-bit floating point
• 1, 2, 3 or 4 channels
• almost any size (the maximum image size depends on the hardware)
Performance results
The library comes with a test and benchmarking program.
The results below were obtained on a PC with the following specifications:
• Intel Core i7-3770, 8 GB RAM
• NVIDIA GeForce GTX 680
• Windows 7 64-bit
Each primitive was run 30 times; the average of all runs is given.
Image transfer and program compilation times are not included in the results.
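The measurement procedure described here (30 runs, average reported) can be sketched as a small timing helper. This is a minimal illustration, not the benchmarking program shipped with the library; the name `averageMilliseconds` is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Calls `operation` `runs` times and returns the mean duration of one call
// in milliseconds. Setup that should not be timed (image transfer, program
// compilation) is expected to happen before this function is called.
inline double averageMilliseconds(const std::function<void()>& operation, int runs = 30) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    double totalMs = 0.0;
    for (int i = 0; i < runs; ++i) {
        const Clock::time_point start = Clock::now();
        operation();
        const Clock::time_point stop = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return totalMs / runs;
}
```

When the timed operation is an asynchronous GPU call, `operation` would need to block until the device has finished; otherwise only the time to enqueue the work is measured.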
Here we see the performance advantage of GPUs, with OpenCLIPP performing up to 8 times faster than IPP when
calculating the absolute difference between two images. OpenCLIPP also beats NPP by a small margin.
Here are the same results on a logarithmic scale, to better show the performance on small images.
We can see that GPU operations have a fixed overhead.
The overhead is 0.01 ms for NPP and higher, at 0.03 ms, for OpenCLIPP.
OpenCV OCL has an even higher overhead, at 0.11 ms.
The CPU has no such overhead, so IPP beats the GPU libraries on small images.
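For reference, the operation benchmarked here can be written as a few lines of plain C++. This is a CPU sketch of what AbsDiff computes on 8-bit data, not the OpenCLIPP, IPP or NPP implementation; the function name `absDiffU8` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel absolute difference of two 8-bit images stored as flat arrays.
inline std::vector<std::uint8_t> absDiffU8(const std::vector<std::uint8_t>& a,
                                           const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // One read from each source and one write per pixel.
        result[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    return result;
}
```

With so little arithmetic per pixel, the run time is dominated by memory traffic, and on small images by the fixed cost of starting the operation.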
AbsDiff is a very simple algorithm. Below, we show a more complex algorithm, the
TopHat morphological operation, which performs many memory accesses for each pixel.
Here OpenCLIPP has a 2X lead over IPP and a slight lead over NPP.
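To show why TopHat touches memory so much more than AbsDiff, here is a CPU sketch of the standard white top-hat: the source image minus its opening, where the opening is an erosion followed by a dilation. The 3x3 square neighbourhood, the border handling (neighbours outside the image are skipped) and the names `morph3x3` and `topHat3x3` are assumptions made for this illustration; the poster does not state which structuring element was benchmarked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// 3x3 erosion (useMax = false) or dilation (useMax = true). Neighbours outside
// the image are skipped. Each output pixel reads up to nine input pixels.
inline Image morph3x3(const Image& src, int width, int height, bool useMax) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    Image dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t value = src[static_cast<std::size_t>(y) * width + x];
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const std::uint8_t n = src[static_cast<std::size_t>(ny) * width + nx];
                    value = useMax ? std::max(value, n) : std::min(value, n);
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = value;
        }
    }
    return dst;
}

// White top-hat: the source minus its opening (erosion followed by dilation).
inline Image topHat3x3(const Image& src, int width, int height) {
    const Image eroded = morph3x3(src, width, height, false);
    const Image opened = morph3x3(eroded, width, height, true);
    Image result(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // The opening never exceeds the source, so the subtraction cannot underflow.
        result[i] = static_cast<std::uint8_t>(src[i] - opened[i]);
    }
    return result;
}
```

In this sketch every output pixel costs two neighbourhood passes plus a subtraction, compared with two reads and one write for AbsDiff.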
And here is a statistical reduction, with results presented as processing bandwidth in GB/s.
Why OpenCL?
There are currently two major frameworks for GPU computing: OpenCL and CUDA.
CUDA has its advantages, but it works only on NVIDIA devices, while OpenCL works
on all major high-performance devices.
In our experiments, we found OpenCL to be as fast as CUDA on NVIDIA hardware.
OpenCL may also become prevalent on mobile devices (where GPUs are increasingly
powerful), which would widen the range of OpenCL applications.
How to use in C++
The library itself is implemented in C++, and C++ programs can use the C++ interface directly.
using namespace OpenCLIPP;
SImage ImageInfo = {...}; // Fill with size, type, channels of image
// Initialize OpenCL
COpenCL CL;
CL.SetClFilesPath("/path/to/cl files/");
Filters filters(CL);
// Create images in OpenCL device
ColorImage SourceImage(CL, ImageInfo, SourceData);
ColorImage ResultImage(CL, ImageInfo, ResultData);
// Prepare the filters - compiles the OpenCL C program
// Optional (would otherwise be done upon the first filter call)
filters.PrepareFor(SourceImage);
// Apply filter (asynchronous)
filters.Sobel(SourceImage, ResultImage);
// Transfer image to host (synchronous)
ResultImage.Read(true);
[Figure: diagram with the labels RAM, CPU, VRAM and GPU]
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
[Figure: AbsDiff U8 - log scale. Time in ms (lower is better) versus image size, logarithmic axis from 0.0001 to 10 ms. Series: CPU (IPP), OpenCLIPP, NPP, OpenCV OCL.]
[Figure: AbsDiff U8. Time in ms (lower is better), axis from 0 to 8 ms, versus image size (512x512, 1024x1024, 2048x2048, HXGA, 4096x4096, HSXGA, HUXGA, WHUXGA). Series: CPU (IPP), OpenCLIPP, NPP, OpenCV OCL.]
[Figure: TopHat U8. Time in ms (lower is better), axis from 0 to 25 ms, versus image size (512x512, 1024x1024, 2048x2048, HXGA, 4096x4096, HSXGA, HUXGA, WHUXGA). Series: CPU (IPP), OpenCLIPP, NPP, OpenCV OCL.]
[Figure: Processing bandwidth for Mean Reduction - F32. Processing bandwidth in GB/s (higher is better), axis from 0 to 140 GB/s, versus image size. Series: CPU (IPP), OpenCLIPP, NPP.]
Here we see a good 40 GB/s for the CPU when the image fits in the cache and 15 GB/s for images too big for the
cache.
The performance of OpenCLIPP increases with the size of the image, reaching 135 GB/s, which is 9X faster than
IPP and 50% faster than NPP. OpenCV OCL failed to calculate the mean in its current version.
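Processing bandwidth figures like these are the number of bytes read divided by the run time. The sketch below shows that conversion; the helper names are hypothetical, and taking 1 GB as 10^9 bytes is an assumption, since the poster does not say whether decimal or binary gigabytes were used.

```cpp
#include <cassert>
#include <cstddef>

// Bytes read when reducing a width x height image with the given pixel size.
inline double imageBytes(std::size_t width, std::size_t height, std::size_t bytesPerPixel) {
    return static_cast<double>(width) * static_cast<double>(height)
         * static_cast<double>(bytesPerPixel);
}

// Processing bandwidth in GB/s (assuming 1 GB = 1e9 bytes) for a run that
// took `milliseconds`.
inline double bandwidthGBs(double bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    return bytes / (milliseconds * 1e-3) / 1e9;
}
```

As an illustration only, a 4096x4096 single-channel F32 image holds 67,108,864 bytes, so a hypothetical reduction that took 0.5 ms would correspond to roughly 134 GB/s. That timing is not a measured value from the benchmark.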
Supported primitives in version 1
Arithmetic: Add, AddSquare, Sub, AbsDiff, Mul, Div, Min, Max,
            AddC, SubC, AbsDiffC, MulC, DivC, RevDivC, MinC, MaxC,
            Abs, Exp, Log, Sqr, Sqrt, Sin, Cos
Logic: And, Or, Xor, AndC, OrC, XorC, Not
LUT: LUT, Linear LUT, Scale LUT
Morphology: Erode, Dilate, Open, Close, Gradient, TopHat, BlackHat
Transform: MirrorX, MirrorY, Flip, Transpose, Resize, SetAll
Conversions: Convert, Scale, Copy, ToGray, SelectChannel, ToColor
Thresholding: ThresholdGT, ThresholdLT, ThresholdGTLT, Compare
Filters: Gauss, Sharpen, Smooth, Median, Sobel, Prewitt, Scharr, HiPass, Laplace
Reductions: Min, Max, MinAbs, MaxAbs, Sum, Mean, MeanSqr
More functions: Histogram, Integral scan, Blob labeling and FFT (soon)
Image size abbreviations: HXGA = 4096x3072, HSXGA = 5120x4096, HUXGA = 6400x4800, WHUXGA = 7680x4800