Raspberry Pi 4B Stress Tests Including High Performance Linpack

Author:
  • Roy Longbottom

Abstract

This report contains details and results of the programs used for stress testing the Raspberry Pi 4B. They cover multi core CPU integer and floating point tests, with data covering caches and RAM, Input/Output exercisers for the main, USB and network connected drives, and graphics activity via OpenGL. The programs used, or new test functions, are ones that are known to have caused errors, system failures or overheating issues on earlier Raspberry Pi systems and PCs. In this case, no error conditions were indicated, but there were severe overheating problems that gave rise to slow performance when stressing multiple cores. On the positive side, it is shown that these problems can be avoided by adequate cooling, and this need not be expensive.
Raspberry Pi 4B Stress Tests Including High Performance Linpack
Roy Longbottom
Contents
Introduction
General Integer Test
Floating Point Tests
Environment Monitors
OpenGL Test
Livermore Loops Test
Input/Output Test
High Performance Linpack
Test Results
Unstressed Tests
Single/Multi Core CPU Tests
OpenGL Test No Cooling
Integer Stress Tests
SP Floating Point Stress Tests
DP Floating Point Stress Tests
High Performance Linpack Tests
Livermore Loops/OpenGL Tests
Loops/OpenGL Dual Monitors
Input/Output Stress Tests
Summary
These stress tests are a continuation of activity covered in Raspberry-Pi-4-Benchmarks.pdf at ResearchGate.
This report contains details and results of the programs used for stress testing Raspberry Pi 4B. They cover multi core CPU
integer and floating point tests with data covering caches and RAM, Input/Output exercisers for the main, USB and
network connected drives and graphics activity via OpenGL. The programs used, or new test functions, are ones that are
known to have caused errors, system failures or overheating issues on earlier Raspberry Pi systems and PCs. Those
reported here represent a small sample of the tests carried out.
When running the stress tests, the environment was monitored using system utilities and a program that measures CPU
MHz, voltage and temperature. The specially written stress tests display and log ongoing average measurements of
performance over sections of the testing period, more useful than a single report at the end.
An important observation is that the processor runs at full speed, 1500 MHz, until the temperature reaches 80°C, when
throttling starts, firstly at 1000 MHz, then 750, 600 and lower. The first tests are intended to show that, using a single
core, the CPU can run at full speed with temperatures less than 75°C (room temperature up to 25°C). This applied to a one
hour OpenGL session and a five minute single core integer stress test; in comparison, using four cores, throttling started
after less than a minute.
4 Core Integer Tests - These were run for 15 minutes without any cooling, with a copper heatsink, using an official Power
over Ethernet (PoE) HAT with fan, and using an inexpensive case/fan. The first started throttling after a minute, reaching
85°C, finally running at 56% of initial speed. The second was slightly better at 63%. Both fan arrangements ran at full
speed, at up to 61°C and 66°C respectively.
4 Core Floating Point Tests - These included runs with no cooling and using the case/fan enclosure, covering L1 cache
and L2 cache based data, with the same thermal behaviour as the integer tests. The single precision version ran
continuously at more than 20 GFLOPS with the fan, reducing slowly to 10.9 GFLOPS with no cooling. The double precision
version obtained up to 10.7 GFLOPS, down to 5.2 GFLOPS.
High Performance Linpack - This double precision benchmark was included as it led to wrong results and all sorts of
other failures on the original Pi 3. The version using ATLAS, as the alternative Basic Linear Algebra Subprograms (BLAS)
library, was built on the Pi 4 and run with increasing data array sizes, up to 20000 x 20000 words, occupying over 3 GB. All
ran successfully with and without cooling. At the largest size, the fan cooled setup obtained 10.8 GFLOPS at up to 71°C, in
just over 8 minutes. The run without cooling completed in over 14 minutes, down to 6.2 GFLOPS, reaching 87°C.
Livermore Loops and OpenGL Benchmarks - Three copies of the former were run along with the most demanding test
function from the latter at 1920 x 1080 pixels. The CPU program executes 24 different double precision calculations using
three different data sizes for a minimum of nearly 15 minutes. With the case/fan combination all ran at near full speed at
an average of 60°C. There was an overall reduction in performance of up to 40% with no cooling, with temperature up to
85°C. The latter was repeated using dual monitors, effectively at 3840 x 1080 pixels, with all programs running somewhat
slower.
Input/Output Stress Tests - The main I/O program writes four 164 MB files comprising numerous different hexadecimal
data patterns, reads them on a random selection basis, for a specified time, then repetitively reads each 64 KB data block
numerous times. Three copies were run for nearly 15 minutes, accessing the main drive, a USB 3 stick and a remote PC via
a 1 Gbps LAN, at the same time, along with a copy of the 4 threaded integer testing program, with no cooling attachment.
There was the usual throttling at temperatures up to 86°C, with the CPU test starting at 58% of the possible maximum,
reducing to 44%. The LAN test appeared to run continuously at over 32 MB/second, the main drive at 85% of maximum
expectation and the USB 3 drive slower, at down to 64%. No data comparison failures were detected in handling all that data.
Introduction
My original Raspberry Pi Stress Tests were reported in September 2017 for up to the Pi 3B, then September 2018 with the Pi 3B+
and later March 2019 for the Pi 3B and 3B+, including High Performance Linpack errors.
There were two versions, one using single precision floating point multiply and add calculations and the other with integer
add and subtract instructions. The tests comprised running multiple copies of the programs, in different terminal windows,
along with another program that measures CPU MHz, voltage and temperature. The latter tests were carried out, following
earlier reports that the Linpack High Performance Benchmark could produce the wrong numeric results, or cause a system
crash, using the older Raspberry Pi 3B. The stress tests could reproduce the same sort of failures, using the Pi 3B but, as
with HPL, not running on the Pi 3B+.
Analysing results of the earlier tests was complicated when Operating Systems did not assign resources evenly between
programs, and the approach would become more restrictive with CPUs containing more than four cores.
On producing benchmarks to run on Android based systems, the multiple program approach to stress testing was not really
applicable. So, I produced Android app versions as single programs using multiple threads. I have now converted these to
run on Raspberry Pi systems. This March 2018 report provides details of the latest Android tests. These include MP
behaviour and performance of more advanced ARM processors, up to an 8 core Cortex-A73.
The new Raspberry Pi versions are initially available as 32 bit programs, comprising tests using single precision floating
point, double precision floating point and integer calculations. For these programs, each thread uses dedicated segments of
memory based data. There is also a slightly modified MHz/volts/temperature application.
The older OpenGL program is also considered here. All these have run time options to select a particular test function and
running time. The CPU tests can be run in a benchmarking mode, the default without any command line parameters, to
indicate which function is most appropriate; the data size can also be specified at run time.
These stress tests are a continuation of activity covered in Raspberry-Pi-4-Benchmarks.pdf at ResearchGate, with
programs and source codes in Raspberry-Pi-4-Benchmarks.tar.gz, that also includes those used for this exercise.
Integer Stress Test - MP-IntStress
The integer program test loop comprises 32 add or subtract instructions, operating on hexadecimal data patterns, with
sequences of 8 subtracts then 8 adds to restore the original pattern. The benchmarking mode uses 1, 2, 4, 8, 16 and 32
threads, with data sizes 16 KB, 160 KB and 16 MB. Below is the log file from running the 32 bit benchmark on a Raspberry Pi
4B via Raspbian Buster. Disassembly shows that the test loop, in fact, used 68 instructions, most of the additional ones
being load register type. That equates to 68/32 = 2.125 instructions per 4 byte word. At the maximum of 1489M words per
second, using a single core, the resultant execution speed was 3164 MIPS, with nearly four times more using all cores.
MP-Integer-Test 32 Bit v1.0 Fri Jun 21 15:39:57 2019
Benchmark 1, 2, 4, 8, 16 and 32 Threads
MB/second
KB KB MB Same All
Secs Thrds 16 160 16 Sumcheck Tests
4.9 1 5956 5754 3977 00000000 Yes
3.6 2 11861 11429 3763 FFFFFFFF Yes
3.1 4 22998 21799 3464 5A5A5A5A Yes
3.1 8 22695 21128 3490 AAAAAAAA Yes
3.1 16 22835 23491 3485 CCCCCCCC Yes
3.0 32 22593 23485 3591 0F0F0F0F Yes
Stress Testing Mode
The following shows the run time command and available parameters.
./MP_IntStress Threads tt, Minutes mm, KB kk, Log ll
tt = 1, 2, 4, 8, 16, 32
mm = greater than 0
kk = between 12 and 15624
ll = number added to log file name between 0 and 99
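
As an illustration of the technique, the following is a minimal sketch in C (an assumption about the general approach, not the author's source) of the style of test loop described above: repeated sequences of 8 subtracts then 8 adds that restore the original hexadecimal pattern, so a final scan of the buffer can confirm correct operation. The real programs spread the data across multiple threads and are arranged so the compiler cannot optimise the arithmetic away.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t words = 16 * 1024 / 4;        /* 16 KB of 4 byte words */
    const unsigned int pattern = 0x5A5A5A5Au;  /* one of the logged patterns */
    unsigned int *x = malloc(words * sizeof *x);
    if (!x) return 1;
    for (size_t i = 0; i < words; i++) x[i] = pattern;

    for (int pass = 0; pass < 10000; pass++)
        for (size_t i = 0; i < words; i++) {
            unsigned int v = x[i];
            /* 32 operations: two rounds of 8 subtracts then 8 adds */
            v -= 1; v -= 2; v -= 3; v -= 4; v -= 5; v -= 6; v -= 7; v -= 8;
            v += 8; v += 7; v += 6; v += 5; v += 4; v += 3; v += 2; v += 1;
            v -= 1; v -= 2; v -= 3; v -= 4; v -= 5; v -= 6; v -= 7; v -= 8;
            v += 8; v += 7; v += 6; v += 5; v += 4; v += 3; v += 2; v += 1;
            x[i] = v;                          /* pattern restored */
        }

    for (size_t i = 0; i < words; i++)         /* sumcheck scan */
        if (x[i] != pattern) { printf("Error at word %zu\n", i); return 1; }
    printf("Sumcheck %08X, all tests passed\n", pattern);
    free(x);
    return 0;
}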
Floating Point Stress Tests - MP-FPUStress, MP-FPUStressDP
The floating point programs use functions containing 2, 8 or 32 multiply and add operations, to exploit the availability of
instructions that can fuse them together for increased performance. The benchmark uses data sizes of 12.8 KB, 128 KB
and 12.8 MB, with calculations via 1, 2, 4 and 8 threads. Each word is initialised with the same value of 0.99999, which the
calculations slowly reduce, the final value being multiplied by 100000 for a sumcheck. Each word is then checked to confirm
that all results are identical.
Results are provided below, showing that sumchecks vary by data size and operations per word, due to variations in the
number of calculations, but are constant when the thread count is different through executing the same calculations.
Disassembly shows that Double Precision (DP) compilation produced fused multiply and add instructions, such as vfma.f64
d16, d25, operating on one DP word per register. Information available indicates that this is the best possible performance
option, producing two operation results per clock cycle, or 3.0 GFLOPS per core in this case. Single Precision (SP) code was
vfma.f32 q8, q2, q13, with 4 words in quad registers, where eight results per cycle might be expected, or 12 GFLOPS per
core. Actual maximum SP speeds look as though they could be about half of that.
MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:23:49 2019
Benchmark 1, 2, 4 and 8 Threads
MFLOPS Numeric Results
Ops/ KB KB MB KB KB MB
Secs Thrd Word 12.8 128 12.8 12.8 128 12.8
1.6 T1 2 2134 2607 656 40392 76406 99700
2.9 T2 2 5048 5156 621 40392 76406 99700
4.0 T4 2 7536 9939 681 40392 76406 99700
5.2 T8 2 7934 9839 639 40392 76406 99700
7.2 T1 8 5535 5420 2569 54756 85091 99820
8.7 T2 8 10757 10732 2454 54756 85091 99820
10.1 T4 8 18108 20703 2444 54756 85091 99820
11.5 T8 8 19236 20286 2245 54756 85091 99820
17.4 T1 32 5309 5270 5262 35296 66020 99519
20.4 T2 32 10551 10528 9753 35296 66020 99519
22.4 T4 32 20120 20886 11064 35296 66020 99519
24.5 T8 32 19415 20464 9929 35296 66020 99519
MP-Threaded-MFLOPS 32 Bit v1.0 Sun May 26 21:26:37 2019
Double Precision Benchmark 1, 2, 4 and 8 Threads
MFLOPS Numeric Results
Ops/ KB KB MB KB KB MB
Secs Thrd Word 12.8 128 12.8 12.8 128 12.8
3.4 T1 2 921 998 326 40395 76384 99700
6.1 T2 2 1968 1995 308 40395 76384 99700
8.4 T4 2 3465 3925 342 40395 76384 99700
10.9 T8 2 3646 3702 301 40395 76384 99700
15.1 T1 8 2377 2446 1283 54805 85108 99820
18.1 T2 8 4916 4860 1326 54805 85108 99820
20.5 T4 8 9202 9510 1391 54805 85108 99820
23.1 T8 8 9090 9006 1298 54805 85108 99820
34.5 T1 32 2695 2725 2707 35159 66065 99521
40.3 T2 32 5416 5441 5121 35159 66065 99521
44.1 T4 32 10666 10831 5275 35159 66065 99521
48.3 T8 32 10427 10602 4832 35159 66065 99521
Stress Testing Mode
The following shows the run time command and available parameters.
./MP_FPUStress Threads tt, Minutes mm, KB kk, Ops oo, Log ll
or MP_FPUStressDP
tt = 1, 2, 4, 8, 16, 32, 64
mm = greater than 0
kk = between 12 and 15624
ll = number added to log file name between 0 and 99
oo = 2, 8 or 32 operations per word
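
For illustration, below is a minimal single threaded sketch in C (an assumption about the general approach, not the author's source) of a 2 operations per word pass, with the 0.99999 initialisation, the multiply by 100000 sumcheck and the check that all words hold identical results. With suitable optimisation, the compiler can fuse the multiply and add into the vfma instructions discussed above.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t words = 12800 / sizeof(float); /* 12.8 KB of SP words */
    const float a = 0.999999f, b = 0.0000001f;  /* illustrative constants */
    float *x = malloc(words * sizeof *x);
    if (!x) return 1;
    for (size_t i = 0; i < words; i++) x[i] = 0.99999f;

    for (int pass = 0; pass < 100000; pass++)
        for (size_t i = 0; i < words; i++)
            x[i] = x[i] * a + b;                /* 2 operations per word */

    printf("Sumcheck %6.0f\n", (double)(x[0] * 100000.0f));
    for (size_t i = 1; i < words; i++)          /* all results identical? */
        if (x[i] != x[0]) { printf("Error at word %zu\n", i); return 1; }
    free(x);
    return 0;
}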
Environment Monitors - RPiHeatMHzVolts2, vmstat, sar
A new version of RPiHeatMHzVolts2 was produced to incorporate the temperature of the Power Management Integrated
Circuit (PMIC). The following shows the run time command and available parameters for the program, and an example of
logged output. Note that the details are instantaneous samples. This is fine for temperature measurements, which change
relatively slowly, but when the CPU temperature reaches a critical level, 80°C in this case, MHz throttling comes into play,
and this can step down and up quite rapidly. My CPU stress test programs repetitively report average performance over a
number of seconds, carrying out the same calculations, providing a better indication of the amount of throttling.
./RPiHeatMHzVolts2 Passes pp, Seconds ss, Log ll
pp = number of passes at ss intervals
ss = sampling intervals
ll = number added to log file name between 0 and 99
Temperature and CPU MHz Measurement
Start at Sun Jun 30 14:53:16 2019
Using 11 samples at 30 second intervals
Seconds
0.0 ARM MHz=1500, core volt=0.8912V, CPU temp=60.0'C, pmic temp=54.3'C
30.0 ARM MHz=1500, core volt=0.8859V, CPU temp=74.0'C, pmic temp=62.8'C
60.7 ARM MHz=1500, core volt=0.8859V, CPU temp=78.0'C, pmic temp=68.4'C
91.3 ARM MHz=1500, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
122.0 ARM MHz=1500, core volt=0.8859V, CPU temp=81.0'C, pmic temp=70.3'C
152.8 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
183.5 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=70.3'C
214.4 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
245.1 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
276.0 ARM MHz=1000, core volt=0.8859V, CPU temp=82.0'C, pmic temp=72.2'C
306.9 ARM MHz=1000, core volt=0.8859V, CPU temp=81.0'C, pmic temp=71.2'C
337.6 ARM MHz=1500, core volt=0.8859V, CPU temp=71.0'C, pmic temp=65.6'C
End at Sun Jun 30 14:58:54 2019
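
A minimal sketch in C of how such sampling can be done, assuming the standard Linux sysfs interfaces that Raspberry Pi OS exposes; the real RPiHeatMHzVolts2 also reports core voltage and PMIC temperature, typically obtained via vcgencmd.

#include <stdio.h>
#include <unistd.h>

static long read_value(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;
    if (f) { fscanf(f, "%ld", &v); fclose(f); }
    return v;
}

int main(void)
{
    for (int pass = 0; pass < 11; pass++) {   /* 11 samples, 30 s apart */
        long mdegc = read_value("/sys/class/thermal/thermal_zone0/temp");
        long khz = read_value(
            "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
        printf("ARM MHz=%ld, CPU temp=%.1f'C\n", khz / 1000, mdegc / 1000.0);
        if (pass < 10) sleep(30);
    }
    return 0;
}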
vmstat - This is used when running stress tests, to indicate system utilisation and to confirm speeds measured by the
tests. The main columns used are free memory, I/O bytes in and out, and user plus system CPU utilisation, where 25% is
the equivalent of 100% of one core.
pi@raspberrypi:~ $ vmstat 10 6 - for 6 measurements at 10 second intervals
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 3654628 20884 196956 0 0 26 1 136 223 3 1 96 0 0
1 0 0 3613900 20888 213328 0 0 0 6 1249 2143 13 2 85 0 0
1 0 0 3612044 20904 214660 0 0 0 12 991 1650 24 3 73 0 0
1 0 0 3609776 20904 216944 0 0 0 3 935 1556 25 2 73 0 0
1 0 0 3604040 20912 222448 0 0 0 12 1025 1653 25 3 73 0 0
1 0 0 3602588 20920 224852 0 0 0 6 946 1548 25 2 73 0 0
sar -n DEV - This utility can be used to measure network traffic after installing Sysstat.
sar -n DEV 30 25 > sar.txt - for 25 measurements over 30 second periods
IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Example Write
wlan0 1190.20 2527.47 65.99 3744.17 0.00 0.00 2.93 0.00
Example Read
wlan0 2340.90 1059.03 3378.11 98.89 0.00 0.00 1.60 0.00
OpenGL Stress Tests - videogl32
The OpenGL benchmark can also be run as a stress test. As a benchmark, it has six tests, the first four portraying moving
up and down a tunnel including various independently moving objects, with and without texturing. The last two tests
represent a real application for designing kitchens, the first in wireframe format, drawn with 23,000 straight lines, the
second having colours and textures applied to the surfaces.
The program has options to specify window sizes and to avoid excessive logging for use in a script file, as in the example
below. Starting with export vblank_mode=0 turns off VSYNC, identifying where speeds greater than 60 FPS are possible.
Following is a script file and sample Pi 4 log. Default running time is 5 seconds for each test, on a full screen where no
sizes are specified. The time can be changed by adding a parameter such as Seconds 20 to the commands.
export vblank_mode=0
./videogl32 Width 320, Height 240, NoEnd
./videogl32 Width 640, Height 480, NoHeading, NoEnd
./videogl32 Width 1024, Height 768, NoHeading, NoEnd
./videogl32 NoHeading
###################################################################
GLUT OpenGL Benchmark 32 Bit Version 1, Thu May 2 19:01:05 2019
Running Time Approximately 5 Seconds Each Test
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
320 240 766.7 371.4 230.6 130.2 32.5 22.7
640 480 427.3 276.5 206.0 121.8 31.7 22.2
1024 768 193.1 178.8 150.5 110.4 31.9 21.5
1920 1080 81.4 79.4 74.6 68.3 30.8 20.0
Stress Tests
It is more appropriate to run stress tests from a script file that also includes the export command. The run command
needs a minutes parameter and an optional test number, the default being Test 4 (for example, ./videogl32 Minutes 60
would run Test 4 for an hour on a full screen).
Below is an indication of CPU utilisation during the six tests, followed by results of a short stress test, where average
speed over each 30 seconds is reported.
GLUT OpenGL Benchmark 32 Bit Version 1, Mon Jul 1 16:10:02 2019
Running Time Approximately 5 Seconds Each Test
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
1920 1080 57.3 56.2 53.4 49.9 30.7 19.9
1 core CPU % 20 28 40 68 104 100
###################################################################
Run Commands - export vblank_mode=0
./videogl32 Test 4, Mins 1, Log 7
OpenGL Reliability Test 32 Bit Version 1, Wed Jul 3 17:28:02 2019
Display 1920 x 1080 All Objects, With Textures, Test for 1 minutes
Test 4 All Objects, With Textures, 30 seconds, 47 FPS
Test 4 All Objects, With Textures, 30 seconds, 46 FPS
End at Wed Jul 3 17:29:03 2019
Livermore Loops Stress Test - liverloopsPiA7R
The Livermore Loops benchmark was converted to act as a stress test, following wrong numeric results being produced on
an overclocked PC using a Pentium Pro CPU. The Loops comprise 24 double precision floating point kernels, with
performance measured in Millions of Floating Point Operations Per Second or MFLOPS. The kernel tests are repeated three
times, with different data sizes. Specifying the running time of each loop converts the benchmark into a stress test,
whereby numeric results of calculations are checked for correctness after each of the numerous passes, with errors being
logged along with performance details. Detailed results are displayed continuously as the tests are running. There is too
much detail for logging, so, as shown below, the start times of each section are reported.
Below is an example command to run each test for approximately 12 seconds and save results in LoopsLog1.txt. Total time
should be around 24 x 3 x 12 = 864 seconds, or longer with CPU MHz throttling. This is followed by an example of results
for a short run.
Run command - ./liverloopsPiA7R Seconds 12 Log 1
#####################################################
Livermore Loops Benchmark vfpv4 32 Bit via C/C++ Wed Jul 3 15:11:50 2019
Reliability test 2 seconds each loop x 24 x 3
Part 1 of 3 start at Wed Jul 3 15:11:50 2019
Part 2 of 3 start at Wed Jul 3 15:12:38 2019
Part 3 of 3 start at Wed Jul 3 15:13:27 2019
Numeric results were as expected
MFLOPS for 24 loops
745.8 955.8 988.7 942.6 209.0 769.8 1194.1 1792.5 1254.6 447.9 213.2 186.3
150.7 349.9 778.3 623.3 734.2 1035.4 322.9 350.0 435.8 352.9 746.1 187.3
Overall Ratings
Maximum Average Geomean Harmean Minimum
1793.5 641.2 520.2 412.7 140.3
End of test Wed Jul 3 15:14:16 2019
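
The checking principle can be illustrated with a minimal C sketch (using kernel 1, the hydro fragment, as an example; this is not the converted benchmark itself): run the same kernel repeatedly, then compare a checksum of the output against that from the first pass, logging any difference as an error.

#include <stdio.h>

#define N 1001

int main(void)
{
    static double x[N], y[N], z[N + 11];
    double q = 0.5, r = 0.25, t = 0.125, first = 0.0;

    for (int k = 0; k < N + 11; k++) z[k] = 0.001 * (k + 1);
    for (int k = 0; k < N; k++) y[k] = 0.0001 * (k + 1);

    for (int pass = 0; pass < 1000; pass++) {
        for (int k = 0; k < N; k++)             /* LFK kernel 1 */
            x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
        double sum = 0.0;                       /* checksum of results */
        for (int k = 0; k < N; k++) sum += x[k];
        if (pass == 0) first = sum;
        else if (sum != first)
            printf("Error: pass %d sum %.15e differs\n", pass, sum);
    }
    printf("Numeric results were as expected\n");
    return 0;
}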
Input/Output Stress Test - burnindrive2
This is essentially the same as my program used during hundreds of UK Government and University computer acceptance
trials during the 1970s and 1980s, with some significant achievements. Burnindrive writes four files, using 164 blocks of 64
KB, repeated 16 times (164.0 MB), with each block containing a unique data pattern. The files are then read for two
minutes, in a semi-random sequence, with data and file ID checked for correct values. Then each block (unique pattern)
is read numerous times, over one second, again with checking for correct values. Total time is normally about 5 minutes for
all tests, with default parameters. The data patterns are shown below, followed by run time parameters, then examples of
results provided (see later detailed results).
Patterns
No. Hex No. Hex No. Hex No. Hex No. Hex No. Hex No. Hex
1 0 25 800000 49 3 73 FF 97 FFFFDFFF 121 FFFFEAAA 145 FFFFF0F0
2 1 26 1000000 50 33 74 FF00FF 98 FFFFBFFF 122 FFFFAAAA 146 FFF0F0F0
3 2 27 2000000 51 333 75 1FF 99 FFFF7FFF 123 FFFEAAAA 147 F0F0F0F0
4 4 28 4000000 52 3333 76 3FF 100 FFFEFFFF 124 FFFAAAAA 148 FFFFFFE0
5 8 29 8000000 53 33333 77 7FF 101 FFFDFFFF 125 FFEAAAAA 149 FFFF83E0
6 10 30 10000000 54 333333 78 FFF 102 FFFBFFFF 126 FFAAAAAA 150 FE0F83E0
7 20 31 20000000 55 3333333 79 1FFF 103 FFF7FFFF 127 FEAAAAAA 151 FFFFFFC0
8 40 32 40000000 56 33333333 80 3FFF 104 FFEFFFFF 128 FAAAAAAA 152 FFFC0FC0
9 80 33 1 57 7 81 7FFF 105 FFDFFFFF 129 EAAAAAAA 153 FFFFFF80
10 100 34 5 58 1C7 82 FFFF 106 FFBFFFFF 130 AAAAAAAA 154 FFE03F80
11 200 35 15 59 71C7 83 FFFFFFFF 107 FF7FFFFF 131 FFFFFFFC 155 FFFFFF00
12 400 36 55 60 1C71C7 84 FFFFFFFE 108 FEFFFFFF 132 FFFFFFCC 156 FF00FF00
13 800 37 155 61 71C71C7 85 FFFFFFFD 109 FDFFFFFF 133 FFFFFCCC 157 FFFFFE00
14 1000 38 555 62 F 86 FFFFFFFB 110 FBFFFFFF 134 FFFFCCCC 158 FFFFFC00
15 2000 39 1555 63 F0F 87 FFFFFFF7 111 F7FFFFFF 135 FFFCCCCC 159 FFFFF800
16 4000 40 5555 64 F0F0F 88 FFFFFFEF 112 EFFFFFFF 136 FFCCCCCC 160 FFFFF000
17 8000 41 15555 65 F0F0F0F 89 FFFFFFDF 113 DFFFFFFF 137 FCCCCCCC 161 FFFFE000
18 10000 42 55555 66 1F 90 FFFFFFBF 114 BFFFFFFF 138 CCCCCCCC 162 FFFFC000
19 20000 43 155555 67 7C1F 91 FFFFFF7F 115 FFFFFFFE 139 FFFFFFF8 163 FFFF8000
20 40000 44 555555 68 1F07C1F 92 FFFFFEFF 116 FFFFFFFA 140 FFFFFE38 164 FFFF0000
21 80000 45 1555555 69 3F 93 FFFFFDFF 117 FFFFFFEA 141 FFFF8E38
22 100000 46 5555555 70 3F03F 94 FFFFFBFF 118 FFFFFFAA 142 FFE38E38
23 200000 47 15555555 71 7F 95 FFFFF7FF 119 FFFFFEAA 143 F8E38E38
24 400000 48 55555555 72 1FC07F 96 FFFFEFFF 120 FFFFFAAA 144 FFFFFFF0
Sequences - First 16
No. File No. File No. File No. File
1 0 1 2 3 5 0 2 1 3 9 0 3 1 2 13 0 1 2 3
2 1 2 3 0 6 1 3 2 0 10 1 0 3 2 14 1 2 3 0
3 2 3 0 1 7 2 0 1 3 11 2 1 0 3 15 2 3 0 1
4 3 0 2 1 8 3 1 2 0 12 3 2 1 0 16 3 0 2 1
###########################################################################
Run Time Parameters - Upper or Lower Case
Default
R or Repeats Data size, multiplier of 10.25 MB, more or less 16
P or Patterns Number of patterns for smaller files < 164 164
M or Minutes Large file reading time 2
L or Log Log file name extension 0 to 99 0
S or Seconds Time to read each block, last section 1
F or FilePath For other than SD card or SD card directory
C or CacheData Omit O_DIRECT on opening files to allow caching No
O or OutputPatterns Log patterns and file sequences used as above No
D or DontRunReadTests Or only run write tests No
Format ./burnindrive2 Repeats 16, Minutes 2, Log 0, Seconds 1
or ./burnindrive2 R 16, M 2, L 0, S 1
###########################################################################
Examples of Results
File 1 164.00 MB written in 12.79 seconds
File 2 164.00 MB written in 11.93 seconds
Read passes 1 x 4 Files x 164.00 MB in 0.31 minutes
Read passes 2 x 4 Files x 164.00 MB in 0.63 minutes
Passes in 1 second(s) for each of 164 blocks of 64KB:
580 580 580 580 580 580 580 580 580 580 580
580 580 580 580 580 580 580 580 580 580 580
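
A minimal sketch in C of the underlying write and verify principle, under stated assumptions (single block, one pattern, Linux O_DIRECT to bypass caching, as the CacheData parameter above implies; burnindrive2 itself handles four files, 164 patterns and timed semi-random reading):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK (64 * 1024)

int main(void)
{
    unsigned int *out, *in;
    /* O_DIRECT requires aligned buffers and transfer sizes */
    if (posix_memalign((void **)&out, 4096, BLOCK) ||
        posix_memalign((void **)&in, 4096, BLOCK)) return 1;
    for (size_t i = 0; i < BLOCK / 4; i++) out[i] = 0x55555555u; /* pattern 48 */

    int fd = open("testfile.bin", O_CREAT | O_RDWR | O_DIRECT, 0644);
    if (fd < 0 || write(fd, out, BLOCK) != BLOCK) { perror("write"); return 1; }
    lseek(fd, 0, SEEK_SET);
    if (read(fd, in, BLOCK) != BLOCK) { perror("read"); return 1; }
    close(fd);

    if (memcmp(out, in, BLOCK) != 0) printf("Data comparison failure\n");
    else printf("64 KB block verified OK\n");
    remove("testfile.bin");
    return 0;
}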
High Performance Linpack Benchmark - xhpl
It was found that a precompiled version of High Performance Linpack (HPL) could produce wrong and inconsistent numeric
results, and also system crashes, using the original Pi 3. For more information see this PDF file at ResearchGate. That
report includes behaviour of another version, compiled to use ATLAS as an alternative Basic Linear Algebra Subprograms
(BLAS) library. This took 14 hours to build, and was slower than the earlier one, but still produced the same failures. As
indicated in the report, my stress tests could be arranged to produce similar problems. There were no sumcheck failures or
system crashes using the Pi 3B+.
The original precompiled version would not run on the Pi 4, but I rebuilt ATLAS on the new system, this time taking 8 hours.
An example of the output for a quick test is shown below:
HPLinpack 2.2 -- High-Performance Linpack benchmark -- February 24, 2016
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 1000
NB : 128
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 1000 128 2 2 0.17 4.048e+00
HPL_pdgesv() start time Sun May 26 08:44:56 2019
HPL_pdgesv() end time Sun May 26 08:44:56 2019
--------------------------------------------------------------------------------
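
The parameters reported above are set in the HPL.dat control file. As an illustration, the leading lines of a stock format HPL.dat for this N = 1000 configuration would look like the following (a sketch, not necessarily the exact file used); later lines select the PFACT, BCAST, SWAP and other options shown in the log:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold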
Unstressed Tests
It is quite easy to produce programs that run at high speeds on all cores of a modern computer, be it a PC, tablet, phone
or a small board system like the Raspberry Pi. These programs are likely to lead to increased CPU temperatures. Given
insufficient cooling arrangements, the systems are likely to continuously reduce CPU MHz (throttling) in order to continue
operation, and eventually power down. Before examining the results of stress testing, it is useful to consider what can be
run without throttling occurring, in this case, on a Raspberry Pi 4, without any cooling.
Single Core and Multi-Core CPU Tests
Below are various results from running five minute MP-Integer-Tests on a Raspberry Pi 4B, out of the case, with no cooling
attachment. As indicated earlier, ongoing speed measurements by a benchmark provide a better understanding of
behaviour than samples of CPU MHz, which can vary rapidly.
Starting and ending recorded temperatures are shown, along with the time at which 80°C was reached, if at all, when
throttling will start. The first column is for a run using a single thread, where CPU MHz, and effectively measured speeds,
were constant over the whole period. The second column provides details when using four threads, with data in L1 caches.
The next two made use of data in L2 cache, starting throttling after one minute, worse than the L1 results, but starting at
a higher temperature. The last column provides results when data was in RAM, running at full speed for over four and a
half minutes.
MB/second
Cache/RAM L2 L1 L2 L2 RAM
KB 512 64 640 1536 15624
Threads 1 4 4 8 4
Start °C 62 60 62 64 61
10 5718 23631 22628 20177 3445
20 5717 23603 22634 18329 3443
30 5640 23416 22670 18756 3405
40 5735 23613 22045 17737 3440
50 5740 23618 22636 18456 3444
60 5652 23244 22069 19059 3410
70 5707 23483 19864 17648 3437
80 5736 23360 18639 16017 3445
90 5683 21552 17986 16654 3447
100 5695 20867 17383 14864 3395
110 5719 20218 16475 14805 3437
120 5672 19017 16207 15128 3443
130 5727 18871 15165 13328 3401
140 5735 18888 14773 12638 3437
150 5732 18460 14979 12780 3443
160 5677 17799 14780 13086 3440
170 5719 17976 14313 13221 3404
180 5711 18005 14391 12618 3443
190 5650 17745 14018 12185 3440
200 5738 17312 14120 13267 3397
210 5709 17241 14062 11916 3442
220 5678 17124 14004 11866 3441
230 5719 17392 13467 12018 3397
240 5720 16990 13728 11825 3440
250 5651 17289 13372 12011 3434
260 5714 17135 13683 11596 3442
270 5717 16891 13584 11481 3398
280 5657 16505 13055 11781 3442
290 5725 17049 13396 11550 3445
300 5713 16578 12957 11666 3402
Max 5740 23631 22670 20177 3447
Min 5640 16505 12957 11481 3395
% 98 70 57 57 98
Max °C 72 82 84 85 80
Time 80°C N/A 90 60 60 280
OpenGL Test No Cooling
Earlier, I connected the Pi 4 system to BBC iPlayer, via WiFi, and displayed programmes for more than two hours on a full
screen 1920x1080 display (not a hot day). With CPU utilisation around 100% of one core, maximum temperature was 70°C,
with CPU at 1500 MHz all the time.
For this exercise, I ran the OpenGL Textured Kitchen test for an hour, with a full screen display (hotter day than above).
Following is a summary of recorded results by the program, the environmental monitor and vmstat. The program ran at 22
FPS over the whole period, with CPU at a constant 1500 MHz, recording slightly more than 100% utilisation of one core,
with maximum temperature reaching 73°C.
------ Monitors ------ --------- vmstat --------- videogl32
Seconds MHz Volts CPU °C PMIC °C free User System Idle FPS
0 1500 0.8894 61 54 3589900 0 0 100
120 1500 0.8841 69 59 3523336 25 2 73 22
240 1500 0.8841 71 62 3520464 25 2 73 22
360 1500 0.8841 71 63 3522848 25 2 73 22
480 1500 0.8841 73 63 3522292 25 2 73 22
600 1500 0.8841 72 63 3522284 25 2 73 22
720 1500 0.8841 72 63 3521780 24 2 74 22
840 1500 0.8841 73 63 3520640 25 2 73 22
960 1500 0.8841 72 63 3520884 25 2 73 22
1080 1500 0.8841 72 63 3520140 25 2 73 22
1200 1500 0.8841 73 63 3519864 24 2 73 22
1320 1500 0.8841 73 63 3519892 25 2 73 22
1440 1500 0.8841 73 63 3519892 25 2 73 22
1560 1500 0.8841 73 63 3518880 25 2 73 22
1680 1500 0.8841 72 63 3519264 25 2 73 22
1800 1500 0.8841 73 63 3517976 25 2 73 22
1920 1500 0.8841 73 63 3518616 25 2 73 22
2040 1500 0.8841 72 63 3517984 25 2 73 22
2160 1500 0.8841 72 63 3518604 24 2 73 22
2280 1500 0.8841 73 63 3518496 25 2 73 22
2400 1500 0.8841 73 63 3518868 25 2 73 22
2520 1500 0.8841 72 63 3518488 25 2 73 22
2640 1500 0.8841 73 63 3518212 25 2 73 22
2760 1500 0.8841 73 63 3520008 25 2 73 22
2880 1500 0.8841 73 63 3519756 25 2 73 22
3000 1500 0.8841 73 63 3516752 25 3 72 22
3120 1500 0.8841 73 63 3518132 25 2 73 22
3240 1500 0.8841 73 63 3518132 25 2 73 22
3360 1500 0.8841 73 63 3517620 24 2 73 22
3480 1500 0.8841 73 63 3517428 25 2 73 22
3600 1500 0.8841 73 63 3517656 25 2 73 22
Integer Stress Tests - MP-IntStress
The following are results of 15 minute stress tests, using 1280 KB data and 8 threads. The data is greater than the L2
cache but, as only four threads executed at a time, the active data remained in cache. The test then ran at full speed,
with additional swapping of cached data.
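Based on the documented parameters, the run command for these tests would be of the form (log number illustrative):
./MP_IntStress Threads 8, Minutes 15, KB 1280, Log 1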
Four tests were carried out: with no added cooling on a bare board, fitted with a copper heatsink, then with the official,
and relatively expensive, Power over Ethernet (PoE) HAT with fan and, finally, using an inexpensive case with a fitted fan
(GeeekPi Acrylic Case).
The changing CPU MHz measurements show that throttling is occurring but, with coarse sampling, they do not reflect real
performance, unlike the MB/second details.
With no cooling, throttling started after a minute, reaching 85°C to 86°C, slowly reducing performance to almost half
speed. The copper heatsink produced a small improvement. During the two tests where fans were used, the processor ran
continuously at 1500 MHz and throughput effectively at a constant MB/second. The POE fan appeared to be slightly more
efficient.
No Cooling Copper Heatsink Official POE Hat Case With Fan
Seconds MB/sec MHz °C MB/sec MHz °C MB/sec MHz °C MB/sec MHz °C
0 1500 60 1500 60 1500 47 1500 41
20 21651 1500 73 21381 1500 71 21770 1500 56 22018 1500 54
40 21892 1500 79 20517 1500 74 21767 1500 57 21979 1500 56
60 20919 1500 81 21407 1500 77 22234 1500 57 22076 1500 58
80 17174 1000 81 21153 1500 79 22035 1500 58 22248 1500 60
100 15643 1000 81 20960 1500 81 21920 1500 59 22153 1500 61
120 15163 1000 82 18967 1500 82 22184 1500 60 22239 1500 63
140 14756 1000 81 16828 1000 81 21941 1500 60 22037 1500 64
160 14491 1000 83 15892 1500 83 21863 1500 60 22231 1500 65
180 14492 1000 83 16157 1000 82 21753 1500 60 22130 1500 64
200 14283 1000 84 15039 1000 82 21921 1500 60 22050 1500 65
220 14386 1000 83 15438 1000 82 21656 1500 60 22210 1500 66
240 14101 1000 83 14905 1000 82 21908 1500 60 22132 1500 65
260 13574 1000 84 14597 1000 83 21983 1500 60 22298 1500 65
280 13763 1000 83 14703 1000 83 21701 1500 60 22031 1500 66
300 13179 1000 84 14519 1000 82 21857 1500 60 22285 1500 65
320 13566 1000 84 14204 1000 84 21791 1500 60 22009 1500 65
340 13368 750 84 14139 750 83 21468 1500 60 22101 1500 65
360 13530 1000 84 14249 1000 84 22162 1500 60 22166 1500 65
380 13190 1000 85 14457 1000 82 21819 1500 61 22163 1500 66
400 13215 1000 84 14395 1000 83 21800 1500 60 22243 1500 65
420 13021 750 85 14365 1000 83 22083 1500 61 22115 1500 64
440 13127 1000 84 14214 1000 83 21780 1500 60 22172 1500 64
460 12933 1000 85 14152 1000 83 21902 1500 60 22138 1500 64
480 12658 1000 85 14090 1000 84 21964 1500 60 22220 1500 64
500 12981 750 83 14199 1000 84 22026 1500 61 22061 1500 65
520 12699 1000 85 14005 1000 83 21661 1500 61 22027 1500 64
540 12622 1000 84 13987 1000 84 21684 1500 60 22281 1500 65
560 12761 1000 84 14222 1000 84 22071 1500 59 22097 1500 64
580 13408 1000 84 13845 1000 84 21728 1500 58 22225 1500 64
600 13878 1000 85 13945 1000 84 21981 1500 59 22091 1500 62
620 13893 1000 83 13877 1000 84 21704 1500 58 22203 1500 62
640 13717 1000 86 13844 1000 84 21935 1500 58 22133 1500 62
660 13321 1000 85 13774 1000 83 21816 1500 61 22075 1500 62
680 13154 1000 85 13500 1000 83 21827 1500 61 22229 1500 63
700 12663 1000 85 13926 1000 83 21995 1500 60 22007 1500 63
720 12504 1000 85 13722 1000 83 22004 1500 60 22279 1500 64
740 12501 750 85 13778 750 84 21954 1500 60 22020 1500 65
760 12227 1000 85 13564 1000 83 21848 1500 60 22270 1500 65
780 12199 750 85 13755 1000 82 21840 1500 61 22129 1500 65
800 12505 1000 85 13451 1500 82 22137 1500 59 22175 1500 64
820 12268 750 85 13587 1000 83 21876 1500 60 22210 1500 64
840 12322 1500 85 13610 1000 82 21685 1500 61 22041 1500 65
860 12312 1500 85 14411 1500 82 22077 1500 61 22192 1500 65
880 12306 1500 85 14380 1500 83 21842 1500 61 22109 1500 65
900 12305 1500 85 14345 1500 83 21883 1500 61 22199 1500 65
Max 21892 86 21407 84 22234 61 22298 66
Min 12199 750 13451 750 21468 1500 21979 1500
%Min/Max 56 63 97 99
Single Precision Floating Point Stress Tests - MP-FPUStress
The table below covers the first 10 minutes of tests on the three cooling configurations, plus a final sample at 15 minutes.
This time, the rather meaningless variations in recorded CPU MHz are not included. Again the tests used 1280 KB data
(320K words) and 8 threads, with 8 floating point operations per word. Maximum temperatures and associated performance
degradations were similar to those during the integer tests.
The graphs in the original report provide a more meaningful indication of the effects of the adequate cooling needed for
this kind of CPU utilisation (confirmed during running by vmstat as 100% of four cores).
No Cooling Copper HS Case+Fan
Seconds °C GFLOPS °C GFLOPS °C GFLOPS
0 61 59 40
20 76 19.2 73 19.6 55 20.7
40 81 19.0 78 19.4 61 20.3
60 82 17.8 80 19.6 62 20.2
80 83 15.5 82 17.2 64 20.7
100 84 15.0 82 15.6 65 20.2
120 83 14.0 82 14.5 66 20.3
140 84 13.3 81 13.9 65 20.3
160 84 13.3 83 13.9 66 20.7
180 86 12.9 83 13.5 67 20.3
200 85 13.0 83 13.6 67 20.3
220 84 12.8 84 13.4 66 20.4
240 84 12.6 83 13.3 67 20.6
260 83 12.6 84 13.3 67 20.3
280 85 12.2 84 13.3 67 20.4
300 84 12.1 83 13.0 67 20.3
320 85 12.0 84 13.0 67 20.8
340 84 11.6 85 12.8 67 20.3
360 85 11.6 84 13.0 67 20.2
380 85 11.3 83 12.7 67 20.7
400 85 11.6 84 12.8 67 20.5
420 84 11.6 84 12.5 68 20.2
440 85 11.5 84 12.7 67 20.4
460 84 11.5 85 12.6 67 20.4
480 85 11.5 84 12.3 66 20.2
500 84 11.1 85 12.4 67 20.3
520 85 11.3 83 12.4 67 20.2
540 84 11.4 85 12.4 68 20.5
560 84 11.3 84 12.3 67 20.2
580 85 11.3 83 12.3 67 20.4
600 85 11.3 84 12.3 67 20.2
900 85 10.9 84 12.2 67 20.3
Max 19.2 19.6 20.8
Min 10.9 12.2 20.3
%Min/Max 57 62 98
Double Precision Floating Point Stress Tests - MP-FPUStressDP
Four sets of results are below, again excluding those CPU MHz figures, but including PMIC temperatures. They are without
and with the case/fan, using 8 threads, one with 1280 KB data size at 8 operations per word, and the other 128 KB with 32
operations per word.
The second one runs at a higher speed and lower temperature, using data in L1 caches, compared with the other via L2
cache. Maximum temperature and performance degradation of the latter were similar to the earlier examples.
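Based on the documented parameters, the two configurations would be run with commands of the form (log numbers
illustrative):
./MP_FPUStressDP Threads 8, Minutes 15, KB 1280, Ops 8, Log 1
./MP_FPUStressDP Threads 8, Minutes 15, KB 128, Ops 32, Log 2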
1280 KB, 8 Threads, 8 Ops/Word 128 KB, 8 Threads, 32 Ops/Word
No Fan CPU PMIC Fan CPU PMIC No Fan CPU PMIC Fan CPU PMIC
Second GFLOPS °C °C GFLOPS °C °C GFLOPS °C °C GFLOPS °C °C
0 48 42.0 45 42.0 54 47.7 39 35.4
20 9.3 64 55.2 9.1 61 55.2 10.7 70 57.1 10.7 39 35.4
40 9.2 73 62.8 9.0 65 59.0 10.6 73 61.8 10.7 53 43.9
60 9.2 79 68.4 9.1 67 61.8 10.7 75 64.6 10.6 56 48.6
80 8.8 80 70.3 9.3 66 62.8 10.7 78 67.5 10.6 57 50.5
100 7.8 81 70.3 9.1 67 62.8 10.7 80 69.4 10.7 58 51.4
120 7.2 82 70.3 9.2 67 62.8 10.1 82 70.3 10.7 59 53.3
140 6.8 82 70.3 9.3 67 62.8 9.5 81 70.3 10.7 59 53.3
160 6.5 82 70.3 9.1 68 62.8 9.1 80 70.3 10.6 59 53.3
180 6.3 82 70.3 9.1 68 62.8 8.7 82 70.3 10.7 60 53.3
200 6.1 81 70.3 9.3 68 64.6 8.5 81 70.3 10.7 59 54.3
220 6.2 82 70.3 9.1 69 62.8 8.5 82 70.3 10.7 59 54.3
240 6.2 83 72.2 9.1 68 62.8 8.3 81 70.3 10.6 60 54.3
260 6.1 83 72.2 9.3 68 62.8 8.3 81 70.3 10.7 59 54.3
280 6.1 84 72.2 9.1 67 64.6 8.0 83 70.3 10.7 61 54.3
300 6.1 83 70.3 9.1 68 64.6 8.0 81 70.3 10.6 60 54.3
320 6.0 84 72.2 9.1 68 64.6 7.9 82 70.3 10.7 61 54.3
340 5.9 85 72.2 9.2 68 64.6 7.6 82 71.2 10.8 61 53.3
360 5.8 85 72.2 9.1 68 62.8 7.7 82 70.3 10.7 60 54.3
380 5.8 84 72.2 9.2 68 64.6 7.8 83 70.3 10.6 60 54.3
400 5.7 84 72.2 9.1 68 62.8 7.7 83 70.3 10.6 61 54.3
420 5.7 84 72.2 9.2 68 62.8 7.7 82 70.3 10.6 60 54.3
440 5.6 84 72.2 9.1 68 64.6 7.6 82 70.3 10.7 60 54.3
460 5.7 84 72.2 9.1 68 62.8 7.6 83 70.3 10.6 61 54.3
480 5.6 84 72.2 9.1 69 64.6 7.5 82 70.3 10.7 60 54.3
500 5.6 84 72.2 9.1 69 62.8 7.5 82 71.2 10.6 60 54.3
520 5.5 85 72.2 9.1 68 62.8 7.4 81 70.3 10.7 60 54.3
540 5.5 84 74.1 9.3 67 64.6 7.4 82 70.3 10.7 60 54.3
560 5.5 84 72.2 9.1 69 62.8 7.4 82 70.3 10.8 59 54.3
580 5.4 84 74.1 9.1 67 64.6 7.3 82 70.3 10.7 60 55.2
600 5.5 84 74.1 9.2 68 62.8 7.3 81 70.3 10.7 60 54.3
620 5.4 85 74.1 9.2 68 62.8 7.3 82 70.3 10.6 61 54.3
640 5.4 84 74.1 9.2 69 62.8 7.3 83 70.3 10.6 62 55.2
660 5.4 85 74.1 9.3 68 62.8 7.3 83 70.3 10.7 60 54.3
680 5.5 85 72.2 9.0 67 62.8 7.3 83 70.3 10.7 60 54.3
700 5.4 85 74.1 9.1 69 62.8 7.3 81 70.3 10.7 60 54.3
720 5.4 85 72.2 9.2 68 64.6 7.3 84 70.3 10.7 60 54.3
740 5.4 84 72.2 9.1 68 62.8 7.3 82 70.3 10.7 60 55.2
760 5.3 85 74.1 9.1 68 62.8 7.3 81 70.3 10.7 60 54.3
780 5.4 85 74.1 9.3 67 62.8 7.3 83 70.3 10.7 59 54.3
800 5.4 84 74.1 9.1 69 64.6 7.3 81 70.3 10.7 60 54.3
820 5.3 85 72.2 9.1 68 62.8 7.3 82 70.3 10.7 60 54.3
840 5.3 84 72.2 9.2 68 62.8 7.2 82 70.3 10.7 60 54.3
860 5.2 85 74.1 9.1 69 64.6 7.2 81 70.3 10.6 60 54.3
880 5.2 85 74.1 9.1 68 62.8 7.2 82 70.3 10.6 60 54.3
900 5.3 84 74.1 9.1 68 62.8 7.2 81 70.3 10.6 60 54.3
Max 9.3 85 74.1 9.3 69 64.6 10.7 84 71.2 10.8 62 55.2
Min 5.2 9.0 7.2 10.6
%Min/Max 57 97 67 98
High Performance Linpack Tests - xhpl
Parameter sizes (as set in HPL.dat) were the same as in the introductory description, except for the one for data size (N).
The programs were run on a bare board Pi 4 and one in the inexpensive case with a fan. No data errors or system
freezes/crashes were encountered over these and many more runs.
Following is a summary of four tests on each of the test beds. The bare board arrangement performs relatively well for
short duration tests, but the long ones are needed to demonstrate maximum performance. The latter was 10.8 Double
Precision GFLOPS, similar to my MP-FPUStressDP program, where the uncooled processor's relative efficiency, at 58%, was
also similar. As it should be, the sumchecks of hot and cold systems were identical at a given data size.
Assuming similarity with the original scalar Linpack benchmark, data size would be N x N x 8 bytes for double precision
operation, or 3.2 GB at N = 20000, as approximately confirmed by the vmstat memory details provided below. The latter
also indicate that the four core CPU utilisation was 100%.
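As a cross-check, assuming the standard HPL operation count of (2/3 x N^3) + (2 x N^2), N = 20000 implies about 5.33
million million floating point operations. Dividing by the times in the table gives 10.8 GFLOPS for the fan cooled run
(494.3 seconds) and 6.2 GFLOPS uncooled (856.0 seconds), matching the reported speeds.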
Following the table, the original report includes a graph of the worst case uncooled scenario, demonstrating CPU MHz
throttling and temperature (°C times 10), based on samples every 10 seconds.
Cooling N Seconds GFLOPS SumCheck Max °C Av MHz
None 4000 5.7 7.4 0.002398 71 1500
Fan 4000 5.2 8.2 0.002398 54 1500
None 8000 39.9 8.6 0.001675 81 1500
Fan 8000 36.7 9.3 0.001675 61 1500
None 16000 404.3 6.8 0.001126 86 919
Fan 16000 263.0 10.4 0.001126 70 1500
None 20000 856.0 6.2 0.001019 87 828
Fan 20000 494.3 10.8 0.001019 71 1500
%None/Fan 20000 58 58 Same 55
procs -----------memory---------- ---swap-- -----io---- -system- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 3510712 30172 276440 0 0 17 1 90 111 16 1 83 0 0
4 0 0 3097880 30180 277088 0 0 0 6 526 515 52 3 45 0 0
4 0 0 2357404 30188 276492 0 0 0 6 620 344 95 5 0 0 0
4 0 0 1615192 30196 276976 0 0 0 11 586 289 95 5 0 0 0
5 0 0 871872 30204 271032 0 0 0 5 490 75 96 4 0 0 0
4 0 768 282692 26828 241092 0 34 20 40 604 307 95 4 0 0 0
4 0 768 276088 26968 250344 6 0 118 12 591 288 99 1 0 0 0
Livermore Loops/OpenGL Tests
Three copies of the Livermore Loops stress tests were run along with the OpenGL Textured Kitchen section, on a Pi 4
without any cooling, then in the case with a fan. The former program was arranged to have a nominal duration of 864
seconds (72 x 12). When running, the CPU load changes continuously, and that is reflected in the ongoing temperature and
OpenGL Frames Per Second. The tests make use of six terminal windows and a full screen display, run by the commands
shown below. This is followed by the results.
With no cooling, there were the usual increases in temperature and performance degradation, but not as severe as in some
of the earlier tests. With cooling, performance was effectively constant. Averages at the end reflect the differences. There
were no reports of errors or any signs of system failure.
Dual Monitors - The benchmarks, with no cooling, were repeated using two monitors, providing a screen area of 3840 x
1080 pixels, the results being included below. Performance was only between 7% and 15% slower than the single monitor
example. Benchmark results of all OpenGL tests are provided at the end of the table, showing that those more dependent
on graphics speed were affected by the number of pixels displayed.
Run Commands
Terminal 1
vmstat 10 100
Terminal 2 script file
lxterminal -e ./RPiHeatMHzVolts2 Passes 120 Seconds 10 Log 20
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 20
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 21
lxterminal -e ./liverloopsPiA7R Seconds 12 Log 22
Terminal 3
./videogl32 Test 6, Mins 16, Log 20
Dual Monitors
No Cooling Case + Fan No Cooling
Seconds MHz °C FPS MHz °C FPS MHz °C FPS
0 1500 64 1500 42 1500 69
30 1000 82 19 1500 57 20 1000 82 13
60 1000 82 16 1500 62 21 750 84 13
90 1500 83 15 1500 66 20 1000 83 12
120 750 85 13 1500 64 21 1000 85 11
150 1000 84 13 1500 62 20 600 84 10
180 1000 83 14 1500 60 22 750 85 10
210 1000 84 15 1500 62 21 1000 85 12
240 1000 83 14 1500 61 19 750 84 12
270 1000 84 14 1500 63 21 1000 85 11
300 1000 84 14 1500 61 21 750 84 12
330 750 84 14 1500 64 21 1000 85 12
360 1000 82 14 1500 64 21 750 84 11
390 1000 83 12 1500 66 21 750 84 12
420 1000 84 13 1500 63 21 750 84 12
450 1000 84 14 1500 62 20 750 85 11
480 750 84 12 1500 63 21 750 85 12
510 750 85 13 1500 61 21 1000 84 12
540 750 84 11 1500 59 21 750 84 11
570 1000 84 12 1500 62 21 1000 85 11
600 1000 84 14 1500 62 22 750 83 10
630 1000 84 13 1500 66 19 750 84 11
660 750 84 14 1500 60 21 750 85 12
690 750 86 13 1500 65 21 1000 85 12
720 1000 84 13 1500 63 21 600 83 11
750 1000 83 13 1500 62 21 1000 84 12
780 750 84 12 1500 61 21 1000 85 11
810 750 85 12 1500 62 21 1000 84 11
840 1000 85 12 1500 58 21 750 86 10
870 750 85 12 1500 58 21 750 85 11
900 1000 84 13 1500 54 21 1000 85 10
930 1000 85 13 1500 50 21 1000 85 11
960 1000 84 13 1500 49 21 750 85 11
990 1000 85 14 1500 45 21 750 85 12
Average 956 83 13 1500 60 21 866 84 11
%Fan 64 139 64
MFLOPS 916 1502 854
%Fan 61
OpenGL Benchmark Single and Dual Monitors
Window Size Coloured Objects Textured Objects WireFrm Texture
Pixels Few All Few All Kitchen Kitchen
Wide High FPS FPS FPS FPS FPS FPS
1920 1080 58.2 56.7 54.5 49.9 31.0 20.7
3840 1080 27.9 26.5 26.0 25.2 25.7 16.3
Input/Output Stress Tests - burnindrive2
For this test, three copies of burnindrive2 were run, accessing the main drive, a USB 3 stick and a remote PC via a 1 Gbps
LAN, along with MP-IntStress using four threads. The environment was monitored using RPiHeatMHzVolts2, vmstat for drive
activity and CPU MHz, and sar -n for network traffic. Commands used and results are provided below. Stress tests are
generally based on executing a fixed set of activities, where completion times can vary. Hence, the provided results are
extrapolated approximations, with drive speeds the average for a particular activity.
All stress tests ran to completion without detecting any errors. CPU utilisation was around 90% of four cores, but CPU
throttling still occurred, with temperatures up to 86°C (and possibly not enough throttling). Performance measured by the
stress tests was broadly in line with the system vmstat and sar measurements. In order to indicate which activity suffered
the most degradation, performance of standalone runs is also provided. It seems that LAN traffic was given a higher
priority, with no speed reduction, followed by the main SD drive. Worst affected was the CPU bound program, probably
suffering from a lower priority besides throttling.
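The run commands were of the following form, based on the documented parameters (mount points hypothetical, log
numbers illustrative):
./burnindrive2 Minutes 12, Log 30
./burnindrive2 FilePath /media/pi/USB3/, Minutes 12, Log 31
./burnindrive2 FilePath /home/pi/NAS/, Minutes 12, Log 32
./MP_IntStress Threads 4, Minutes 15, KB 1280, Log 33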
------ MB/second ------
Secs Main USB 3 1Gbps MP-Int MHz °C
Drive Drive LAN Stress
0 1500 55
30 11.9 38.0 42.3 13116 1500 66
60 11.9 44.1 32.8 13063 1500 73
90 28.1 44.1 32.8 13615 1500 75
120 28.1 44.1 32.8 13734 1500 81
150 28.1 44.1 32.8 13370 1500 83
180 28.1 44.1 32.8 13555 1000 82
210 28.1 44.1 32.8 13285 1000 82
240 28.1 44.1 32.8 13194 1000 82
270 28.1 44.1 32.8 13022 1000 83
300 28.1 44.1 32.8 13316 1000 82
330 28.1 44.1 32.8 13615 1000 82
360 28.1 44.1 32.8 13677 1000 84
390 28.1 44.1 32.8 13315 1000 83
420 28.1 44.1 32.8 13273 1000 82
450 28.1 44.1 32.8 13117 1000 83
480 28.1 44.1 32.8 12860 1000 83
510 28.1 44.1 32.8 12370 1000 83
540 28.1 44.1 32.8 11863 1000 84
570 28.1 44.1 32.8 11550 1000 84
600 28.1 44.1 32.8 11312 1000 82
630 28.1 44.1 32.7 10895 1000 83
660 28.1 54.0 32.7 10696 1000 83
690 29.7 54.0 32.7 10479 1000 84
720 29.7 54.0 32.7 10223 750 84
750 29.7 54.0 32.7 10227 1000 85
780 29.7 54.0 32.7 10413 750 84
810 29.7 54.0 10090 750 86
840 29.7 9952 1000 84
Stand Alone
Max 33.4 68.6 32.3 22664
vmstat
procs -----------memory---------- --swap-- -----io---- -system- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
Start
6 2 0 3499820 45700 271552 0 0 12409 32193 16450 13425 54 24 20 2 0
2 2 0 3503956 45776 264632 0 0 46811 12381 27174 16714 68 23 3 5 0
4 2 0 3506080 45816 264348 0 0 76271 248 25885 16188 64 22 7 7 0
Read 1
5 2 0 3502984 45992 264844 0 0 75473 5 18777 14118 67 24 3 6 0
5 2 0 3504888 46032 264884 0 0 74726 7 18907 14631 66 25 4 5 0
Read 2
6 2 0 3503236 46544 265452 0 0 86628 7 17180 15114 62 28 4 6 0
4 2 0 3501964 46592 265452 0 0 80815 6 15395 14321 68 28 2 2 0
Ethernet Read sar -n DEV
rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
24841.37 6883.90 36206.23 505.50 0.00 0.00 0.03 29.66