Tool Description: Array programming in Pascal
Paul Cockshott, Ciaran Mcreesh, Susanne
University of Glasgow, School of Computing Science
University of Tripoli
A review of previous array Pascals leads on to a description of the
Glasgow Pascal compiler. The compiler is an ISO-Pascal superset
with semantic extensions to translate data parallel statements to
run on multiple SIMD cores. An appendix gives demonstrations
of the tool.
Keywords Pascal, SIMD, Vector Processor, GPU
1. Previous array Pascals
Pascal was one of the first imperative programming languages to be
provided with array extensions. The first Array Pascal, Actus, was
roughly contemporary with the comparable Distributed Array
Processor Fortran[15, 25].
Turner's Vector Pascal, another array extension of the language,
was strongly influenced by APL. It was similar in its array features
to ZPL or Single Assignment C. These were all developed to address
the challenge of the supercomputers that were coming into use at
the time. Later Vector Pascal implementations were developed at
Saarland University and the University of Glasgow. Pascal XSC, an
extension for scientific data processing, provided extensions for
vectors, matrices and interval arithmetic but was not a general
array language.
In Actus the syntax of array declarations indicated which
dimensions of the array were to be evaluated in parallel.
In a declaration the : rather than the .. is used to indicate that the
dimension is to be evaluated in parallel. Actus provided both parallel
assignments using index sets and parallel compound statements.
The implicit assumption behind this design decision appears to
have been that there would be distributed processors each with their
own memory banks, so that the compiler would spread the array
over the banks using the : index form as a clue. This idea has
not been used in subsequent Vector Pascal dialects which have been
designed for machines with a uniﬁed memory.
PLDI 2015.
2015 ACM 978-1-4503-3584-3/15/06 $15.00.
2. Glasgow Vector Pascal
In what follows 'Vector Pascal' will refer to the Glasgow Vector
Pascal compiler. The implementation initially targeted modern
SIMD processors, for which it used vectorisation techniques
similar to those in the contemporary Intel C compiler. With the
advent of multi-core machines and GPUs, subsequent Vector Pascal
releases have also supported automatic multi-core parallelism.
Vector Pascal uses implicit parallelism, obviating the need for
an explicit parallel statement. Conventional loops will, in fact, be
vectorised if there are no data dependencies, but the spirit of the
language is to use APL-style array expressions. Thus one can write
a single assignment between whole arrays to operate on all
corresponding elements of the three arrays involved. This is
semantically equivalent to the same assignment with each array
explicitly subscripted by an index vector.
The index vector is implicitly declared with sufficient elements
to index the array on the left of the assignment, with a scope covering
the right of the assignment statement. Index vectors are usually elided
provided that corresponding positions in arrays are intended, but they
can be used explicitly to perform operations such as circular shifts.
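The role of the implicit index vector can be illustrated outside Vector Pascal. The following is a minimal sketch in Python, not Vector Pascal syntax; the helper `assign_with_index` and the arrays `a`, `b` and `c` are hypothetical names chosen for illustration. It evaluates the right hand side once per position of the destination, first with the index used identically on every array (the case where it could be elided) and then with the index used explicitly to form a circular shift.

```python
def assign_with_index(n, rhs):
    """Evaluate rhs(i) for every position i of an n-element destination."""
    return [rhs(i) for i in range(n)]

b = [10, 20, 30, 40]
c = [1, 2, 3, 4]
n = len(b)

# Elementwise sum: every array is subscripted by the same index,
# so in an array language the index could be left out entirely.
a = assign_with_index(n, lambda i: b[i] + c[i])

# Circular shift left by one place: the index appears explicitly
# inside the subscript expression.
shifted = assign_with_index(n, lambda i: b[(i + 1) % n])

print(a)        # [11, 22, 33, 44]
print(shifted)  # [20, 30, 40, 10]
```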
Let us assume that we want to compile a program containing such a
statement to execute on a 6-core Xeon using the AVX instruction-set
and 32-bit addressing. The target is specified on the compiler command
line, and the compiler then transforms the code into parallel form.
The statement has been broken down into two forms of parallelism:
an outer loop that runs on different cores doing every 6th row and an
inner loop that operates 8 words at a time. The loops are then placed
in a nested procedure. The threads on the different cores have
access to the variables of the enclosing scope by virtue of Pascal
being a block-structured language, but have local copies of the index
vector. Access to the enclosing scope by the other cores is ensured by
sending a static link, held in a register, when posting the job.
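The two levels of parallelism described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the code the compiler emits: `CORES`, `LANES`, `worker` and `parallel_add` are hypothetical names, a thread pool stands in for the per-core jobs, and a slice assignment of up to 8 elements stands in for one vector instruction. Each worker keeps its own loop indices while sharing the arrays.

```python
from concurrent.futures import ThreadPoolExecutor

CORES = 6   # outer parallelism: one worker per core
LANES = 8   # inner parallelism: words handled per vector step

def worker(start_row, a, b, c):
    """Handle rows start_row, start_row + CORES, ... of a := b + c."""
    rows, cols = len(a), len(a[0])
    for r in range(start_row, rows, CORES):       # every 6th row
        for j in range(0, cols, LANES):           # 8 words at a time
            a[r][j:j + LANES] = [
                x + y for x, y in zip(b[r][j:j + LANES], c[r][j:j + LANES])
            ]

def parallel_add(b, c):
    """Add two matrices, sharing the destination between all workers."""
    rows, cols = len(b), len(b[0])
    a = [[0] * cols for _ in range(rows)]
    with ThreadPoolExecutor(max_workers=CORES) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(lambda start: worker(start, a, b, c), range(CORES)))
    return a

b = [[i + j for j in range(16)] for i in range(13)]
c = [[i * j for j in range(16)] for i in range(13)]
a = parallel_add(b, c)
```

Because the workers write to disjoint rows, no locking is needed on the destination array.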
Vector Pascal does not support parallel statements, but does
allow parallel expressions:
2.2 Map and reduce
Any dyadic operator can be used as a reduction operator using the
\ prefix, so \* computes the product of a vector and \+ its sum.
Function applications map over arrays. The following example
uses map and reduce. The function returns its scalar argument added
to the product of the elements of its vector argument. It is then
mapped over an array.
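Reduction and mapping can be sketched in a few lines. The Python below is an illustration only, not Vector Pascal syntax; `reduce_with`, `f`, `v` and `scalars` are hypothetical names, and the choice to map `f` over an array of scalars is an assumption made for the example.

```python
from functools import reduce
import operator

def reduce_with(op, vector):
    """Fold a dyadic operator over a vector, as a reduction does."""
    return reduce(op, vector)

def f(s, v):
    """Hypothetical example: scalar s plus the product of the elements of v."""
    return s + reduce_with(operator.mul, v)

v = [1.0, 2.0, 3.0, 4.0]
total = reduce_with(operator.add, v)     # plays the role of \+
product = reduce_with(operator.mul, v)   # plays the role of \*

# Mapping: apply f to every element of an array of scalars.
scalars = [0.0, 1.0, 2.0]
mapped = [f(s, v) for s in scalars]

print(total, product, mapped)  # 10.0 24.0 [24.0, 25.0, 26.0]
```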
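If you have a matrix, transposing the matrix amounts to swapping
the order of the row and column indices. Thus for a matrix a,
assigning a transposed expression to a will swap the indices of the
right hand side. If the right hand side is a matrix, this is
equivalent to assigning to each element a[i, j], ∀i, j ∈ a, the right
hand side element at [j, i]. If the right hand side were a vector,
the equivalent assignment is likewise taken ∀i, j ∈ a.
Generalisation of transpose is provided by an operator which permutes
the indices. An assignment to a three dimensional array p is then
equivalent to an assignment ∀i, j, k ∈ p with the indices of the right
hand side permuted.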
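Transposition and index permutation can be written out as explicit index swaps. The Python sketch below is illustrative; `transpose` and `permute_indices` are hypothetical names, and the particular rotation of three indices is an assumption chosen for the example rather than the permutation used in the original listing.

```python
def transpose(b):
    """Return a with a[i][j] = b[j][i] for all i, j in a."""
    rows, cols = len(b), len(b[0])
    return [[b[j][i] for j in range(rows)] for i in range(cols)]

def permute_indices(q):
    """Hypothetical rotation of three indices: p[i][j][k] = q[j][k][i]."""
    d0, d1, d2 = len(q), len(q[0]), len(q[0][0])
    # q is subscripted [j][k][i], so j < d0, k < d1 and i < d2.
    return [[[q[j][k][i] for k in range(d1)]
             for j in range(d0)]
            for i in range(d2)]

b = [[1, 2, 3],
     [4, 5, 6]]
a = transpose(b)                     # [[1, 4], [2, 5], [3, 6]]

q = [[[100 * j + 10 * k + i for i in range(4)]
      for k in range(3)]
     for j in range(2)]
p = permute_indices(q)               # shape 4 x 2 x 3
```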
The . operator between arrays performs the scalar product: the scalar
product of two vectors is equivalent to the sum of the products of
their corresponding elements. To the extent that + and * are
overloaded, so is the scalar product. Thus when the operands are
vectors of sets this evaluates as pairwise set intersection reduced by
set union.
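The overloading of the scalar product can be made concrete with a generic definition. The following Python sketch is illustrative, with hypothetical names `dot`, `add` and `mul`; it assumes that set union plays the role of + and set intersection the role of *, as in Pascal set arithmetic.

```python
from functools import reduce
import operator

def dot(x, y, add=operator.add, mul=operator.mul):
    """Scalar product: pairwise mul of corresponding elements, reduced by add."""
    return reduce(add, (mul(p, q) for p, q in zip(x, y)))

# Numeric vectors: 1*4 + 2*5 + 3*6
numeric = dot([1, 2, 3], [4, 5, 6])

# Vectors of sets: pairwise intersection, reduced by union.
x = [{1, 2, 3}, {4, 5}]
y = [{2, 3, 9}, {5, 6}]
set_result = dot(x, y, add=operator.or_, mul=operator.and_)

print(numeric)     # 32
print(set_result)  # {2, 3, 5}
```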
Table 1. Compliance with ISO standard tests.
Compiler                  Failed   % Success
Free Pascal 2.6.2         34       80
Turbo Pascal 7            26       87
Vector Pascal Pentium     0        100
Vector Pascal Xeon Phi    4        97.6
The Pascal standard supports sets over cardinal types. Vector
Pascal extends this to any ordered type.
For cardinal element types sets are implemented as bitmaps and set
expressions are vectorised using SIMD. Many Pascal implementations
limit the maximum number of elements in a set; Vector Pascal raises
this limit. For non-cardinal element types the sets
are implemented as balanced trees and no vectorisation is used.
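The bitmap representation explains why set expressions vectorise well: one bitwise operation on a machine word handles many potential members at once. The Python sketch below is a simplified model, not the compiler's implementation; `to_bitmap` and `from_bitmap` are hypothetical helpers, and an arbitrary precision integer stands in for a row of SIMD registers.

```python
def to_bitmap(elements):
    """Encode a set of small non-negative integers as a bitmap."""
    bits = 0
    for e in elements:
        bits |= 1 << e
    return bits

def from_bitmap(bits):
    """Decode a bitmap back into a Python set."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}

a = to_bitmap({1, 5, 200})
b = to_bitmap({5, 64, 200})

union = from_bitmap(a | b)          # set union is bitwise OR
intersection = from_bitmap(a & b)   # set intersection is bitwise AND
difference = from_bitmap(a & ~b)    # set difference is AND with complement

print(union, intersection, difference)
```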
Dynamically sized arrays are supported as described by the Extended
Pascal standard. After allocation with ten elements, a dynamic array
variable is a pointer to a vector of 10 reals.
Sub-array expressions return a dynamic array.
Literate programming is supported by the compiler. A flag on the
compiler command line causes a LaTeX file to be created. Specially
delimited comments are either treated as LaTeX source or rendered as
marginal notes. Pascal code is reformatted to typographically
distinguish reserved words and variable names. Formulae are rendered
in mathematical notation.
Source code can be in UTF8 Unicode, and variable names can
be in Roman, Greek, Cyrillic or CJK characters. Chinese equivalent
reserved words are supported.
The compiler is written in Java and is released from SourceForge under
the GPL. It relies on an external toolchain for linking and targets a
range of contemporary and recent instruction-sets: Pentium, Opteron,
SSE, SSE2, AVX, Playstation 2 (MIPS), Playstation 3 (Cell), Nvidia and
the Intel Knights Ferry. The relevant assemblers must, of course,
be installed. In addition, unsupported architectures can be targeted
by an option which translates Vector Pascal to C and uses a C compiler
to generate the binary. For the Cell and Nvidia implementations,
the compiler generates code for an abstract SIMD machine that is
implemented either in C on the vector processors or in CUDA on
the GPU.
Performance achieved on Intel AVX and SSE architectures is
comparable to that of C with vector intrinsics and threading.
However, on GPUs it does not perform as well as CUDA, although
Vector Pascal source code tends to be more compact than C or CUDA
for the same task.
Compliance with the ISO language standard is above that of
some other leading Pascal compilers, see Table 1. The ISO-Pascal
conformance test suite comprises 218 programmes designed to test
each feature of the language standard. From the ISO test set a
subset was excluded that tests obsolete file i/o features, as all three
compilers follow the Turbo Pascal syntax for file operations. We
ran the test suite using the host Vector Pascal compiler and in cross
compiler mode for the Xeon Phi. A programme was counted as a
pass if it compiled and printed the correct result. A fail was recorded
if compilation did not succeed or the programme, on execution,
failed to deliver the correct result.
4. Future work
We have a number of ongoing student projects both to extend the Pascal
system and to add new front ends to it.
We are extending parallel reduction operations in Pascal to allow
arbitrary dyadic functions, as opposed to operators, to be used.
We are building a front end for the Haggis language, used for
teaching in Scottish schools, that uses the code generator sub-systems
of the Pascal compiler.
We have a prototype Vector C front end for the compiler. This
supports similar parallelisation mechanisms to Vector Pascal
using a Matlab-style array syntax. The command name of the prototype
stands for Glasgow C Compiler. This prototype is not
yet fully conformant with the C standard.
Footnote (tests excluded from the ISO test set): Tests 1, 3, 5, 19, 54, 67..76, 78, 90..92, 111..115, 118, 121, 131, 141, 160, 197, 198, 202, 203, 212, 213.
References
[1] Aart J. C. Bik, Milind Girkar, Paul M. Grey, and Xinmin Tian. Automatic intra-register vectorization for the Intel architecture. Int. J. Parallel Program., 30(2):65–98, 2002.
[2] Bradford L Chamberlain, Sung-Eun Choi, C Lewis, Calvin Lin, Lawrence Snyder, and W Derrick Weathersby. ZPL: A machine independent programming language for parallel computers. Software Engineering, IEEE Transactions on, 26(3):197–211, 2000.
[3] P Cockshott, Y Gdura, and Paul Keir. Array languages and the n-body problem. Concurrency and Computation: Practice and Experience.
[4] Paul Cockshott. Vector Pascal reference manual. SIGPLAN Not.
[5] Paul Cockshott and Greg Michaelson. Orthogonal parallel processing in Vector Pascal. Computer Languages, Systems & Structures, 32(1):2–41.
[6] William Paul Cockshott, Susanne Oehler, and Tian Xu. Developing a compiler for the XeonPhi (TR-2014-341). University of Glasgow.
[7] W.P. Cockshott and A. Koliousis. The SCC and the SICSA multi-core challenge. In 4th MARC Symposium, December 2011.
[8] Peter Cooper. Porting the Vector Pascal Compiler to the Playstation 2. Master's thesis, University of Glasgow Dept of Computing Science.
[9] AK Ewing, H Richardson, AD Simpson, and R Kulkarni. Writing Data Parallel Programs with High Performance Fortran. Edinburgh Parallel Computing Centre, 1998.
[10] A. Formella, A. Obe, WJ Paul, T. Rauber, and D. Schmidt. The SPARK 2.0 system: a special purpose vector processor with a VectorPASCAL compiler. In System Sciences, 1992. Proceedings of the Twenty-Fifth Hawaii International Conference on, volume 1, pages 547–558. IEEE.
[11] Youssef Omran Gdura. A new parallelisation technique for heterogeneous CPUs. PhD thesis, University of Glasgow, 2012.
[12] C. Grelck and S.-B. Scholz. SAC: From High-level Programming with Arrays to Efficient Parallel Execution. Parallel Processing Letters.
[13] R Hammer, M Neaga, and D Ratz. Pascal XSC. New Concepts for Scientific Computation and Numerical Data Processing, pages 15–44.
[14] Tony Hetherington. An introduction to the Extended Pascal language. ACM SIGPLAN Notices, 28(11):42–51, 1993.
[15] DAP ICL. Fortran language reference manual. ICL Technical Publication TP6918, 1979.
[16] Intel Corporation. Intel Xeon Phi Product Family: Product Brief, April.
[17] ISO. Pascal ISO 7185, 1990.
[18] K. Iverson. A programming language. Wiley, New York, 1966.
[19] Iain Jackson. Opteron Support for Vector Pascal. Final year thesis, Dept Computing Science, University of Glasgow, 2004.
[20] Kathleen Jensen, Niklaus Wirth, Andrew B Mickel, and James F Miner. Pascal: user manual and report, volume 3. Springer-Verlag New York.
[21] Christoph W Kessler, Wolfgang J Paul, and Thomas Rauber. Scheduling vector straight line code on vector processors. In Code Generation: Concepts, Tools, Techniques, pages 73–91. Springer, 1992.
[22] Donald Ervin Knuth. Literate programming. The Computer Journal.
[23] Calvin Lin and Lawrence Snyder. ZPL: An array sublanguage. In Languages and Compilers for Parallel Computing, pages 96–114.
[24] R. H. Perrott. A Language for Array and Vector Processors. ACM Trans. Program. Lang. Syst., 1(2):177–195, October 1979.
[25] R. H. Perrott and A. Zarea-Aliabadi. Supercomputer languages. ACM Comput. Surv., 18(1):5–22, 1986.
[26] S.-B. Scholz. Single Assignment C: Efficient Support for High-Level Array Operations in a Functional Setting. Journal of Functional Programming, 13(6):1005–
[27] L Snyder. A Programmer's Guide to ZPL. MIT Press, Cambridge.
[28] T Turner. Vector Pascal a Computer Programming Language for the Array Processor. PhD thesis, Iowa State University, USA.
Here is a scaled up version of the programme described earlier.
It performs 2*800*1024*100 = 163,840,000, roughly 164 million,
arithmetic operations. We can compile it for the default Pentium code
model and produce a LaTeX listing file thus:
We first run it on an AMD A6.
We can now compile it for the AVX instruction-set.
This vectorises the code so that it runs much faster.
It can be further accelerated by multicore compilation. Note that it
is not worth using more than 2 cores on this model of CPU, as there
are only 2 vector floating point units shared between the 4 cores, and
on programmes as small as this gains from parallelism are
not guaranteed. We get the following code for the inner loop:
Now let us look at the listings.
Alternatively, we can run LaTeX on the file and get a pretty-printed
version, in which assignment is rendered as ← and multiplication
as ×.
Next let us compare the performance of Vector Pascal with C
when blurring a 1024x1024 pixel colour image. The same separable
convolution algorithm is used in both cases. The addition of the C
file on the compiler command line instructs it to link the Pascal and
C in a single binary.
Pascal outperforms C in this example because it uses saturated
SIMD arithmetic on pixels.
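The separable convolution and the role of saturation can be sketched as follows. This Python is an illustration of the technique only, not the benchmark code: the names `saturate`, `convolve_rows`, `transpose` and `blur` are hypothetical, pixels are assumed to be unsigned 8-bit values in the range 0..255, and edge pixels are assumed to be replicated at the borders. Saturation clamps out-of-range results to the nearest representable pixel value rather than letting them wrap around, which is what SIMD saturated arithmetic provides in a single instruction.

```python
def saturate(value):
    """Clamp to the assumed 8-bit pixel range instead of wrapping around."""
    return max(0, min(255, int(round(value))))

def convolve_rows(image, kernel):
    """Apply a 1-D kernel along each row, replicating edge pixels."""
    half = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            saturate(sum(k * row[min(max(x + d - half, 0), n - 1)]
                         for d, k in enumerate(kernel)))
            for x in range(n)
        ])
    return out

def transpose(image):
    """Swap rows and columns so the same row pass can run vertically."""
    return [list(col) for col in zip(*image)]

def blur(image, kernel):
    """Separable blur: a horizontal pass followed by a vertical pass."""
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))

kernel = [0.25, 0.5, 0.25]
flat = [[100] * 4 for _ in range(4)]
spike = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]

blurred_flat = blur(flat, kernel)
blurred_spike = blur(spike, kernel)
overdriven = blur(flat, [1.0, 1.0, 1.0])  # gain of 3 per pass, so results saturate
```

Two passes with a 1-D kernel of width w cost about 2w multiplications per pixel, against w*w for the equivalent 2-D kernel.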
Finally, as a bit of fun, we use the matrix product of numbers and
strings to print a Roman numeral:
Thanks to the many Glasgow University students whose term
projects contributed to the compiler, and to CloPeMa, a collaborative
project funded by the EU FP7-ICT, 288553.