RaspberryPiPerformance
CPU
Linpack
The ARM CPU was tested using the Linpack benchmark from [1], built with gcc at -O3 (optimisation level 3) and run with an array size of 200.
With software floating point
Source
Compile/Run
cc -O3 -o linpack linpack.c -lm
linpack.c: In function ‘main’:
linpack.c:69: warning: return type of ‘main’ is not ‘int’
./linpack
Enter array size (q to quit) [200]: 200
Results
Crippled
Memory required: 315K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       2   0.53  92.45%   1.89%   5.66%   5493.333
       4   1.07  92.52%   2.80%   4.67%   5385.621
       8   2.12  92.45%   2.36%   5.19%   5466.003
      16   4.24  92.45%   2.83%   4.72%   5438.944
      32   8.49  92.11%   2.71%   5.18%   5459.213
      64  16.98  92.05%   2.89%   5.06%   5452.440
Hardware floating point (-mfloat-abi=softfp)
Memory required: 315K.
LINPACK benchmark, Double precision.
Machine precision: 15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       8   0.51  90.20%   3.92%   5.88%  22888.889
      16   1.02  89.22%   4.90%   5.88%  22888.889
      32   2.05  90.24%   3.41%   6.34%  22888.889
      64   4.08  91.42%   2.94%   5.64%  22829.437
     128   8.16  91.54%   2.94%   5.51%  22799.827
     256  16.31  91.35%   2.76%   5.89%  22903.800
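For a rough comparison of the two runs, the KFLOPS columns can be averaged and divided. The sketch below is illustrative only: the values are copied from the two result listings above, and the function name `speedup` is an assumption made for this example.

```python
from statistics import mean

# KFLOPS values copied from the two Linpack result listings above.
SOFT_FLOAT_KFLOPS = [5493.333, 5385.621, 5466.003, 5438.944, 5459.213, 5452.440]
HARD_FLOAT_KFLOPS = [22888.889, 22888.889, 22888.889, 22829.437, 22799.827, 22903.800]


def speedup(baseline, improved):
    """Return the ratio of the mean of `improved` to the mean of `baseline`."""
    return mean(improved) / mean(baseline)


print(f"software float mean: {mean(SOFT_FLOAT_KFLOPS) / 1000:.2f} MFLOPS")
print(f"hardware float mean: {mean(HARD_FLOAT_KFLOPS) / 1000:.2f} MFLOPS")
print(f"speedup: {speedup(SOFT_FLOAT_KFLOPS, HARD_FLOAT_KFLOPS):.1f}x")
```

On these figures the hardware floating point build comes out roughly four times faster than the software floating point build.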
Whetstone/Dhrystone
All code compiled with gcc options -mfloat-abi=softfp -O3
Source
Code for these tests can be found at http://www.rowley.co.uk/arm/whet_dhry.zip
If that link returns a 404, this code may be similar: http://freespace.virgin.net/roy.longbottom/benchnt.zip
Compile/Run
?
Results
Dhrystone
Microseconds for one run through Dhrystone: 1.2
Dhrystones per Second: 809061.5
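Dhrystone scores are often quoted as DMIPS, obtained by dividing Dhrystones per second by 1757, the score of the VAX 11/780 reference machine. The following is a minimal sketch of that conversion applied to the figure above. The function name is an assumption for illustration, and Dhrystone results depend heavily on compiler and options, so the number is only indicative.

```python
# Dhrystones per second scored by the VAX 11/780, the conventional 1 DMIPS reference.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dhrystones_to_dmips(dhrystones_per_sec):
    """Convert a Dhrystones-per-second score to DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


# Score reported in the Dhrystone result above.
print(f"{dhrystones_to_dmips(809061.5):.1f} DMIPS")
```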
Whetstone Crippled
Loops: 1000, Iterations: 10, Duration: 24 sec.
C Converted Double Precision Whetstones: 41.7 MIPS
Rebuilding the Whetstone test code with 'gcc -mfpu=vfp -mfloat-abi=softfp' gives better results:
Loops: 1000, Iterations: 100, Duration: 106 sec.
C Converted Double Precision Whetstones: 94.3 MIPS
However, the majority of compute time is spent in the SQRT function, which for the above test came from a library built without -mfpu=vfp. Using a library built with VFP gives the following much improved result:
Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS
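The three MIPS figures above are consistent with the formula used by the common C conversion of Whetstone, KIPS = 100 x loops x iterations / duration. The sketch below reproduces them from the reported loops, iterations and durations. It assumes that formula applies to the test code used here, and the function name is illustrative.

```python
def whetstone_mips(loops, iterations, duration_s):
    """Return Whetstone MIPS, assuming KIPS = 100 * loops * iterations / duration."""
    kips = 100.0 * loops * iterations / duration_s
    return kips / 1000.0


# (loops, iterations, duration in seconds) for the three runs reported above.
runs = {
    "crippled": (1000, 10, 24),
    "rebuilt with VFP": (1000, 100, 106),
    "VFP library sqrt": (1000, 100, 15),
}

baseline = whetstone_mips(*runs["crippled"])
for label, args in runs.items():
    mips = whetstone_mips(*args)
    print(f"{label}: {mips:.1f} MIPS ({mips / baseline:.1f}x the crippled build)")
```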
OpenSSL
Source
Compile/Run
openssl version; openssl speed;
Results
OpenSSL 0.9.8o 01 Jun 2010
built on: Thu Aug 26 18:56:26 UTC 2010
options:bn(64,32) md2(int) rc4(ptr,int) des(idx,risc1,4,long) aes(partial) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                   148.81k      372.18k      624.81k      769.95k      832.90k
mdc2                    0.00         0.00         0.00         0.00         0.00
md4                   615.30k     2468.76k     7612.19k    16707.01k    28104.86k
md5                   380.13k     1501.12k     4800.77k    11312.81k    21682.77k
hmac(md5)            1022.28k     3480.23k     9587.80k    17492.25k    25441.78k
sha1                  303.72k     1092.39k     3106.50k     6302.57k     9852.39k
rmd160                244.29k      849.04k     2414.53k     4747.26k     7513.00k
rc4                 14658.70k    16836.49k    17462.03k    17628.21k    17522.08k
des cbc              2913.17k     3221.30k     3289.77k     3360.09k     3367.21k
des ede3             1149.87k     1188.59k     1198.46k     1206.00k     1208.25k
idea cbc                0.00         0.00         0.00         0.00         0.00
seed cbc                0.00         0.00         0.00         0.00         0.00
rc2 cbc              2812.71k     3012.02k     3054.19k     3077.82k     3076.12k
rc5-32/12 cbc           0.00         0.00         0.00         0.00         0.00
blowfish cbc         6091.32k     7007.89k     7250.62k     7288.21k     7163.88k
cast cbc             5068.25k     6020.03k     6345.71k     6367.64k     6260.44k
aes-128 cbc          3205.76k     3497.72k     3616.00k     3652.49k     3665.85k
aes-192 cbc          2730.65k     2981.88k     3073.20k     3102.38k     3111.86k
aes-256 cbc          2383.90k     2596.12k     2659.91k     2702.13k     2732.50k
camellia-128 cbc        0.00         0.00         0.00         0.00         0.00
camellia-192 cbc        0.00         0.00         0.00         0.00         0.00
camellia-256 cbc        0.00         0.00         0.00         0.00         0.00
sha256                679.98k     1629.47k     2905.43k     3708.32k     4175.45k
sha512                 41.02k      163.83k      232.63k      318.20k      353.81k
aes-128 ige          3089.03k     3579.08k     3698.68k     3689.14k     3578.18k
aes-192 ige          2641.68k     3019.45k     3111.38k     3144.95k     3035.70k
aes-256 ige          2334.50k     2632.35k     2705.04k     2735.69k     2687.74k

                   sign    verify    sign/s verify/s
rsa  512 bits 0.013747s 0.001193s     72.7    838.4
rsa 1024 bits 0.063481s 0.002742s     15.8    364.7
rsa 2048 bits 0.321250s 0.007378s      3.1    135.5
rsa 4096 bits 1.805000s 0.022528s      0.6     44.4

                   sign    verify    sign/s verify/s
dsa  512 bits 0.011690s 0.013597s     85.5     73.5
dsa 1024 bits 0.027233s 0.031683s     36.7     31.6
dsa 2048 bits 0.073897s 0.087304s     13.5     11.5
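The throughput rows above are given in 1000s of bytes per second, which can be awkward to compare with the MB/s figures used elsewhere on this page. The sketch below parses one such row into bytes per second. The function name and the fixed five-column layout are assumptions based on the output shown here, not part of OpenSSL itself.

```python
# Block sizes (bytes) of the five throughput columns in the output above.
BLOCK_SIZES = (16, 64, 256, 1024, 8192)


def parse_speed_line(line):
    """Split one throughput row into (algorithm name, list of bytes per second).

    Values with a trailing "k" are in 1000s of bytes per second; a plain
    "0.00" parses as zero.
    """
    parts = line.split()
    n = len(BLOCK_SIZES)
    name = " ".join(parts[:-n])  # names such as "aes-128 cbc" contain spaces
    rates = [float(p.rstrip("k")) * 1000.0 for p in parts[-n:]]
    return name, rates


name, rates = parse_speed_line(
    "aes-128 cbc 3205.76k 3497.72k 3616.00k 3652.49k 3665.85k"
)
for size, rate in zip(BLOCK_SIZES, rates):
    print(f"{name} @ {size} bytes: {rate / 1e6:.2f} MB/s")
```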
GPU
The RaspberryPi appears to play H.264 1080p video from USB to HDMI at a rate of at least 4MB/s. The admin "JamesH" said it would handle "basically 1080p30, high profile, >40Mb/s."
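The two figures above use different units: the observed rate is in megabytes per second and the quoted rate is in megabits per second. The sketch below converts between them (function names are illustrative). It shows the observed 4MB/s is 32Mb/s, which is still below the quoted >40Mb/s.

```python
BITS_PER_BYTE = 8


def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * BITS_PER_BYTE


def mbits_to_mbytes(mbits_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbits_per_s / BITS_PER_BYTE


print(f"observed: 4 MB/s = {mbytes_to_mbits(4)} Mb/s")
print(f"quoted:  40 Mb/s = {mbits_to_mbytes(40)} MB/s")
```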
3DMarkMobile ES 2.0
Source
?
Compile/Run
?
Results
?
IO
USB bus
- All IO shares the same bus, so the combined throughput of all IO cannot exceed the bus speed, an as yet hypothetical 60MB/s
SD card
- TODO test
Compile/Run
dd if=/dev/zero of=~/test.tmp bs=100K count=1024
dd if=~/test.tmp of=/dev/null bs=100K count=1024
rm ~/test.tmp
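The dd commands above write and then read back 1024 blocks of 100K (100 MiB in total). The same measurement can be sketched in Python to get throughput figures directly. This is a rough equivalent, not a replacement for dd: the function names are assumptions for this example, and the read figure may be inflated if the file is still in the operating system's page cache.

```python
import os
import time

BLOCK_SIZE = 100 * 1024  # matches dd's bs=100K
BLOCK_COUNT = 1024       # matches dd's count=1024


def write_throughput(path, block_size=BLOCK_SIZE, count=BLOCK_COUNT):
    """Write `count` zero-filled blocks to `path`; return bytes per second."""
    block = bytes(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time taken to reach the card
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed


def read_throughput(path, block_size=BLOCK_SIZE):
    """Read `path` sequentially in blocks; return bytes per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed
```

Calling `write_throughput` and then `read_throughput` on a path on the SD card (for example `os.path.expanduser("~/test.tmp")`), dividing each result by 1e6 to get MB/s, and removing the file afterwards mirrors the three commands above.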
Results
- Depends on SD card used http://elinux.org/RaspberryPiBoardVerifiedPeripherals#SDHC_cards
?maybe 15MB/s?
NIC
- TODO test with wget, curl, etc