RPi Performance

From eLinux.org
Revision as of 16:07, 28 January 2012 by Tufty (transferred from RaspberryPiPerformance)

CPU

Linpack

The ARM CPU has been tested using the Linpack benchmark from [1], built with gcc at -O3 (optimisation level 3) and run with an array size of 200.

With software floating point

Source

[2]

Compile/Run

cc -O3 -o linpack linpack.c -lm
  linpack.c: In function ‘main’:
  linpack.c:69: warning: return type of ‘main’ is not ‘int’
./linpack
  Enter array size (q to quit) [200]: 200


Results

Software floating point ("crippled")

Memory required:  315K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       2   0.53  92.45%   1.89%   5.66%   5493.333
       4   1.07  92.52%   2.80%   4.67%   5385.621
       8   2.12  92.45%   2.36%   5.19%   5466.003
      16   4.24  92.45%   2.83%   4.72%   5438.944
      32   8.49  92.11%   2.71%   5.18%   5459.213
      64  16.98  92.05%   2.89%   5.06%   5452.440

Hardware floating point (-mfloat-abi=softfp)

Memory required:  315K.
LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       8   0.51  90.20%   3.92%   5.88%  22888.889
      16   1.02  89.22%   4.90%   5.88%  22888.889
      32   2.05  90.24%   3.41%   6.34%  22888.889
      64   4.08  91.42%   2.94%   5.64%  22829.437
     128   8.16  91.54%   2.94%   5.51%  22799.827
     256  16.31  91.35%   2.76%   5.89%  22903.800
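The KFLOPS column can be cross-checked from the other columns. The C sketch below reproduces the reported figures under two assumptions that are inferred from the numbers rather than read from the source in [2]: the matrix order n is half the entered array size (n = 100 for 200), and each rep solves the system twice (rolled and unrolled loops), with the OVERHEAD share of the time excluded. `linpack_kflops` is a hypothetical helper name.

```c
/* Hypothetical helper: recompute one row of the Linpack KFLOPS column.
 * Assumptions (inferred from the tables, not from linpack.c itself):
 *   - the matrix order is half the entered array size (200 -> n = 100);
 *   - each rep runs the solve twice (rolled and unrolled);
 *   - only DGEFA + DGESL time counts, OVERHEAD is excluded. */
double linpack_kflops(int array_size, long reps, double time_s,
                      double dgefa_pct, double dgesl_pct)
{
    double n = array_size / 2.0;

    /* Flops for LU factorisation (DGEFA) plus the triangular solve (DGESL). */
    double ops_per_solve = 2.0 * n * n * n / 3.0 + 2.0 * n * n;

    /* Time actually spent in DGEFA and DGESL. */
    double solve_time = time_s * (dgefa_pct + dgesl_pct) / 100.0;

    return 2.0 * (double)reps * ops_per_solve / (1000.0 * solve_time);
}
```

For example, the last hardware floating point row, linpack_kflops(200, 256, 16.31, 91.35, 2.76), gives roughly 22905, and the first software row, linpack_kflops(200, 2, 0.53, 92.45, 1.89), gives roughly 5493; the small differences from the printed values come from the rounded percentages.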

Whetstone/Dhrystone

All code was compiled with the gcc options -mfloat-abi=softfp -O3.

Source

Code for these tests can be found at http://www.rowley.co.uk/arm/whet_dhry.zip. If that link is dead, the code at http://freespace.virgin.net/roy.longbottom/benchnt.zip may be equivalent.


Compile/Run

?


Results

Dhrystone

Microseconds for one run through Dhrystone: 1.2

Dhrystones per Second: 809061.5 
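The two Dhrystone figures are reciprocals of each other. Dhrystone results are also often quoted in DMIPS, i.e. normalised to the VAX 11/780's 1757 Dhrystones per second; that convention is an assumption here and is not something this benchmark prints. A minimal C sketch (hypothetical helper names):

```c
/* Reference rate conventionally used for DMIPS (VAX 11/780). This is an
 * assumed convention, not part of the benchmark output above. */
#define VAX_11_780_DHRYSTONES_PER_S 1757.0

/* Microseconds for one run is the reciprocal of runs per second. */
double dhry_us_per_run(double dhrystones_per_s)
{
    return 1.0e6 / dhrystones_per_s;
}

/* Dhrystones per second normalised to the VAX 11/780. */
double dhry_dmips(double dhrystones_per_s)
{
    return dhrystones_per_s / VAX_11_780_DHRYSTONES_PER_S;
}
```

For the result above, dhry_us_per_run(809061.5) is about 1.24 (printed by the benchmark as 1.2) and dhry_dmips(809061.5) is about 460.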


Whetstone, software floating point ("crippled")

Loops: 1000, Iterations: 10, Duration: 24 sec.

C Converted Double Precision Whetstones: 41.7 MIPS

Rebuilding the Whetstone test code with 'gcc -mfpu=vfp -mfloat-abi=softfp' gives better results:


Loops: 1000, Iterations: 100, Duration: 106 sec.
C Converted Double Precision Whetstones: 94.3 MIPS

However, most of the compute time is spent in the sqrt function, which for the above test came from a library built without -mfpu=vfp. Using a library built with VFP support gives a much better result:

Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS
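All three Whetstone results are consistent with MIPS = 100 × Loops × Iterations / Duration / 1000, the calculation used by the classic C-converted Whetstone source. Treat the exact formula as an assumption, since the code used here is only linked, not shown. A sketch with a hypothetical helper name:

```c
/* Hypothetical helper: reproduce the "C Converted Double Precision
 * Whetstones" MIPS figure from the Loops / Iterations / Duration line.
 * Assumes each loop counts as 100 thousand Whetstone instructions, as in
 * the classic C-converted source; this matches all three results above. */
double whetstone_mips(long loops, long iterations, double duration_s)
{
    double kips = 100.0 * (double)loops * (double)iterations / duration_s;
    return kips / 1000.0;
}
```

whetstone_mips(1000, 10, 24) gives about 41.7, whetstone_mips(1000, 100, 106) about 94.3 and whetstone_mips(1000, 100, 15) about 666.7. Because Duration is reported in whole seconds, the 15-second run is the least precise: one second either way changes the result by roughly 7%.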

OpenSSL

Source

[3]

Compile/Run

openssl version
openssl speed

Results

OpenSSL 0.9.8o 01 Jun 2010
built on: Thu Aug 26 18:56:26 UTC 2010
options:bn(64,32) md2(int) rc4(ptr,int) des(idx,risc1,4,long) aes(partial) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                148.81k      372.18k      624.81k      769.95k      832.90k
mdc2                 0.00         0.00         0.00         0.00         0.00
md4                615.30k     2468.76k     7612.19k    16707.01k    28104.86k
md5                380.13k     1501.12k     4800.77k    11312.81k    21682.77k
hmac(md5)         1022.28k     3480.23k     9587.80k    17492.25k    25441.78k
sha1               303.72k     1092.39k     3106.50k     6302.57k     9852.39k
rmd160             244.29k      849.04k     2414.53k     4747.26k     7513.00k
rc4              14658.70k    16836.49k    17462.03k    17628.21k    17522.08k
des cbc           2913.17k     3221.30k     3289.77k     3360.09k     3367.21k
des ede3          1149.87k     1188.59k     1198.46k     1206.00k     1208.25k
idea cbc             0.00         0.00         0.00         0.00         0.00
seed cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc           2812.71k     3012.02k     3054.19k     3077.82k     3076.12k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00
blowfish cbc      6091.32k     7007.89k     7250.62k     7288.21k     7163.88k
cast cbc          5068.25k     6020.03k     6345.71k     6367.64k     6260.44k
aes-128 cbc       3205.76k     3497.72k     3616.00k     3652.49k     3665.85k
aes-192 cbc       2730.65k     2981.88k     3073.20k     3102.38k     3111.86k
aes-256 cbc       2383.90k     2596.12k     2659.91k     2702.13k     2732.50k
camellia-128 cbc     0.00         0.00         0.00         0.00         0.00
camellia-192 cbc     0.00         0.00         0.00         0.00         0.00
camellia-256 cbc     0.00         0.00         0.00         0.00         0.00
sha256             679.98k     1629.47k     2905.43k     3708.32k     4175.45k
sha512              41.02k      163.83k      232.63k      318.20k      353.81k
aes-128 ige       3089.03k     3579.08k     3698.68k     3689.14k     3578.18k
aes-192 ige       2641.68k     3019.45k     3111.38k     3144.95k     3035.70k
aes-256 ige       2334.50k     2632.35k     2705.04k     2735.69k     2687.74k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.013747s 0.001193s     72.7    838.4
rsa 1024 bits 0.063481s 0.002742s     15.8    364.7
rsa 2048 bits 0.321250s 0.007378s      3.1    135.5
rsa 4096 bits 1.805000s 0.022528s      0.6     44.4
                  sign    verify    sign/s verify/s
dsa  512 bits 0.011690s 0.013597s     85.5     73.5
dsa 1024 bits 0.027233s 0.031683s     36.7     31.6
dsa 2048 bits 0.073897s 0.087304s     13.5     11.5
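The `openssl speed` table mixes units: the digest and cipher columns are throughput in 1000s of bytes per second (hence the trailing "k"), while sign/s and verify/s are simply the reciprocals of the per-operation times. A small C sketch for converting them (hypothetical helper names):

```c
/* Hypothetical helpers for reading `openssl speed` output. */

/* Cipher/digest columns are in 1000s of bytes per second, e.g. "3665.85k".
 * Returns MB/s, with MB = 10^6 bytes. */
double openssl_k_to_mb_per_s(double k_value)
{
    return k_value * 1000.0 / 1.0e6;
}

/* sign/s and verify/s are the reciprocals of the per-operation times. */
double openssl_ops_per_s(double seconds_per_op)
{
    return 1.0 / seconds_per_op;
}
```

For instance, aes-128 cbc on 8192-byte blocks (3665.85k) is about 3.7MB/s, and 1/0.063481 s gives the 15.8 RSA-1024 signatures per second shown above.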

GPU

The Raspberry Pi appears to handle H.264 1080p video played from USB to HDMI at a bitrate of at least 4MB/s.

The admin "JamesH" said it would handle "basically 1080p30, high profile, >40Mb/s" (i.e. more than 5MB/s).


3DMarkMobile ES 2.0

Source

?

Compile/Run

?

Results

?

IO

USB bus

  • All USB IO uses the same bus, so the combined throughput cannot exceed the bus speed, assumed but not yet measured to be 60MB/s (USB 2.0's 480Mbit/s signalling rate)

SD card

  • TODO test

Compile/Run

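# write ~100MB; conv=fdatasync makes dd wait until the data is on the card
dd if=/dev/zero of=~/test.tmp bs=100K count=1024 conv=fdatasync
# drop the page cache so the read comes from the card, not RAM
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
dd if=~/test.tmp of=/dev/null bs=100K count=1024
rm ~/test.tmp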
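The dd write test above can also be approximated in C, which makes it easy to try different block sizes. This is a minimal POSIX sketch, assuming a Linux system with fdatasync() and CLOCK_MONOTONIC; `seq_write_mb_per_s` is a hypothetical helper, and the fdatasync() call is there so the timing includes flushing to the card rather than only filling the page cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of `block_size` zero bytes to
 * `path`, force them to the device with fdatasync(), delete the file and
 * return the write rate in MB/s (10^6 bytes per second), or -1.0 on error. */
double seq_write_mb_per_s(const char *path, size_t block_size, size_t count)
{
    char *buf = calloc(1, block_size);          /* zero-filled, like /dev/zero */
    if (buf == NULL)
        return -1.0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    int ok = 1;
    for (size_t i = 0; ok && i < count; i++) {
        size_t done = 0;
        while (done < block_size) {              /* retry short writes */
            ssize_t n = write(fd, buf + done, block_size - done);
            if (n <= 0) {
                ok = 0;
                break;
            }
            done += (size_t)n;
        }
    }
    /* Without this the timing mostly measures the page cache, not the card. */
    if (ok && fdatasync(fd) != 0)
        ok = 0;

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    unlink(path);
    free(buf);

    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (!ok || secs <= 0.0)
        return -1.0;
    return (double)block_size * (double)count / secs / 1.0e6;
}
```

To match the dd parameters, call seq_write_mb_per_s(path, 100 * 1024, 1024) with path pointing at a file on the SD card; like dd, the result uses MB = 10^6 bytes.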

Results

Not yet measured; possibly around 15MB/s.

NIC

  • TODO test with wget, curl, etc