RaspberryPiPerformance

From eLinux.org
Revision as of 01:07, 29 September 2011 by JamesH (talk | contribs) (Whetstone/Dhrystone)

Linpack

The ARM core has been tested using the Linpack benchmark from [1], built with gcc at -O3 (optimisation level 3) and run with an array size of 200.

With software floating point

Memory required:  315K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       2   0.53  92.45%   1.89%   5.66%   5493.333
       4   1.07  92.52%   2.80%   4.67%   5385.621
       8   2.12  92.45%   2.36%   5.19%   5466.003
      16   4.24  92.45%   2.83%   4.72%   5438.944
      32   8.49  92.11%   2.71%   5.18%   5459.213
      64  16.98  92.05%   2.89%   5.06%   5452.440

Hardware floating point (-mfloat-abi=softfp)

Memory required:  315K.
LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
       8   0.51  90.20%   3.92%   5.88%  22888.889
      16   1.02  89.22%   4.90%   5.88%  22888.889
      32   2.05  90.24%   3.41%   6.34%  22888.889
      64   4.08  91.42%   2.94%   5.64%  22829.437
     128   8.16  91.54%   2.94%   5.51%  22799.827
     256  16.31  91.35%   2.76%   5.89%  22903.800
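The KFLOPS column in the two tables above can be reproduced from the other columns. The sketch below is an inference from the printed numbers, not something stated on this page: it assumes the benchmark solves a system of order n = array size / 2, counts 2n^3/3 + 2n^2 floating point operations per solve, runs two solves per repetition, and excludes the OVERHEAD share of the time. The function name linpack_kflops is made up for illustration.

```python
def linpack_kflops(reps, total_s, overhead_pct, array_size=200):
    """Estimate the KFLOPS column from reps, total time and overhead share.

    Assumptions (inferred from the table values, not documented here):
    the solved system has order n = array_size / 2, and every repetition
    performs two factor-and-solve passes.
    """
    n = array_size // 2
    ops_per_solve = (2.0 * n ** 3) / 3.0 + 2.0 * n ** 2
    # Time spent in DGEFA and DGESL only, i.e. total minus overhead.
    net_s = total_s * (1.0 - overhead_pct / 100.0)
    return 2.0 * reps * ops_per_solve / (1000.0 * net_s)


if __name__ == "__main__":
    soft = linpack_kflops(64, 16.98, 5.06)    # last software FP row
    hard = linpack_kflops(256, 16.31, 5.89)   # last hardware FP row
    print("software FP: %.0f KFLOPS" % soft)
    print("hardware FP: %.0f KFLOPS" % hard)
    print("speed-up:    %.2fx" % (hard / soft))
```

Because the percentages in the tables are rounded to two decimals, the recomputed values agree with the printed KFLOPS only to within a fraction of a percent. Taking the last row of each table, the hardware floating point build is roughly 4.2 times faster than the software floating point build.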

Whetstone/Dhrystone

Code for these tests can be found at http://www.rowley.co.uk/arm/whet_dhry.zip.

All code was compiled with the gcc options -mfloat-abi=softfp -O3.

Whetstone

Loops: 1000, Iterations: 10, Duration: 24 sec.

C Converted Double Precision Whetstones: 41.7 MIPS

Dhrystone

Microseconds for one run through Dhrystone: 1.2

Dhrystones per Second: 809061.5 
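The two Dhrystone figures are reciprocals of each other, and the per-second figure is often normalised to DMIPS by dividing by 1757, the Dhrystones per second of the VAX 11/780 reference machine. The sketch below only redoes that arithmetic; the DMIPS value is not reported on this page, and the function names are illustrative.

```python
# Dhrystones per second of the VAX 11/780, the usual "1 MIPS" reference.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757.0


def microseconds_per_run(dhrystones_per_second):
    """Time for one pass through Dhrystone, in microseconds."""
    return 1e6 / dhrystones_per_second


def dmips(dhrystones_per_second):
    """Dhrystone MIPS relative to the VAX 11/780 reference score."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


if __name__ == "__main__":
    score = 809061.5  # "Dhrystones per Second" from the result above
    print("%.3f us per run" % microseconds_per_run(score))
    print("%.1f DMIPS" % dmips(score))
```

The reported 1.2 microseconds per run is the rounded reciprocal of the per-second score, so the two lines of output are consistent with each other.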

Rebuilding the test code with 'gcc -mfpu -mfloat-abi=softfp' gives better results:

./whetstone
Loops: 1000, Iterations: 100, Duration: 106 sec.
C Converted Double Precision Whetstones: 94.3 MIPS

The majority of the time is spent in the SQRT function, which for the above test was built without -mfpu=vfp. Using a library built with VFP gives the following much improved result:

Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS
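The three Whetstone scores above follow directly from the loop count, iteration count and duration. The sketch below assumes the scoring used by the classic C Whetstone source, where KIPS = 100 * loops * iterations / duration; that formula is not quoted on this page, but it reproduces all three reported figures. The function name whetstone_mips is illustrative.

```python
def whetstone_mips(loops, iterations, duration_s):
    """Whetstone score in MIPS.

    Assumption: the classic C Whetstone formula
    KIPS = 100 * loops * iterations / duration, converted here to MIPS.
    """
    kips = 100.0 * loops * iterations / duration_s
    return kips / 1000.0


if __name__ == "__main__":
    runs = [
        ("softfp, -O3", 1000, 10, 24),
        ("rebuilt test code", 1000, 100, 106),
        ("library built with VFP", 1000, 100, 15),
    ]
    for label, loops, iterations, duration_s in runs:
        score = whetstone_mips(loops, iterations, duration_s)
        print("%-24s %6.1f MIPS" % (label, score))
```

Since the duration is reported in whole seconds, the short 15 second run carries a rounding uncertainty of several percent, so the 666.7 MIPS figure is best read as approximate.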

OpenSSL

Results of running openssl speed:

Doing md2 for 3s on 16 size blocks: 27716 md2's in 2.98s
Doing md2 for 3s on 64 size blocks: 17388 md2's in 2.99s
Doing md2 for 3s on 256 size blocks: 7322 md2's in 3.00s
Doing md2 for 3s on 1024 size blocks: 2173 md2's in 2.89s
Doing md2 for 3s on 8192 size blocks: 304 md2's in 2.99s
Doing md4 for 3s on 16 size blocks: 115369 md4's in 3.00s
Doing md4 for 3s on 64 size blocks: 115723 md4's in 3.00s
Doing md4 for 3s on 256 size blocks: 88908 md4's in 2.99s
Doing md4 for 3s on 1024 size blocks: 48620 md4's in 2.98s
Doing md4 for 3s on 8192 size blocks: 10258 md4's in 2.99s
Doing md5 for 3s on 16 size blocks: 70799 md5's in 2.98s
Doing md5 for 3s on 64 size blocks: 69896 md5's in 2.98s
Doing md5 for 3s on 256 size blocks: 56259 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 33143 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 7914 md5's in 2.99s
Doing hmac(md5) for 3s on 16 size blocks: 190400 hmac(md5)'s in 2.98s
Doing hmac(md5) for 3s on 64 size blocks: 163136 hmac(md5)'s in 3.00s
Doing hmac(md5) for 3s on 256 size blocks: 111608 hmac(md5)'s in 2.98s
Doing hmac(md5) for 3s on 1024 size blocks: 51076 hmac(md5)'s in 2.99s
Doing hmac(md5) for 3s on 8192 size blocks: 9286 hmac(md5)'s in 2.99s
Doing sha1 for 3s on 16 size blocks: 56948 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 51206 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 36283 sha1's in 2.99s
Doing sha1 for 3s on 1024 size blocks: 18403 sha1's in 2.99s
Doing sha1 for 3s on 8192 size blocks: 3584 sha1's in 2.98s
Doing sha256 for 3s on 16 size blocks: 127496 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 76127 sha256's in 2.99s
Doing sha256 for 3s on 256 size blocks: 34048 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 10828 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 1524 sha256's in 2.99s
Doing sha512 for 3s on 16 size blocks: 7691 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 7654 sha512's in 2.99s
Doing sha512 for 3s on 256 size blocks: 2717 sha512's in 2.99s
Doing sha512 for 3s on 1024 size blocks: 926 sha512's in 2.98s
Doing sha512 for 3s on 8192 size blocks: 130 sha512's in 3.01s
Doing rmd160 for 3s on 16 size blocks: 45651 rmd160's in 2.99s
Doing rmd160 for 3s on 64 size blocks: 39666 rmd160's in 2.99s
Doing rmd160 for 3s on 256 size blocks: 28201 rmd160's in 2.99s
Doing rmd160 for 3s on 1024 size blocks: 13908 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 2733 rmd160's in 2.98s
Doing rc4 for 3s on 16 size blocks: 2739344 rc4's in 2.99s
Doing rc4 for 3s on 64 size blocks: 783949 rc4's in 2.98s
Doing rc4 for 3s on 256 size blocks: 203269 rc4's in 2.98s
Doing rc4 for 3s on 1024 size blocks: 51473 rc4's in 2.99s
Doing rc4 for 3s on 8192 size blocks: 6374 rc4's in 2.98s
Doing des cbc for 3s on 16 size blocks: 546219 des cbc's in 3.00s
Doing des cbc for 3s on 64 size blocks: 149992 des cbc's in 2.98s
Doing des cbc for 3s on 256 size blocks: 38552 des cbc's in 3.00s
Doing des cbc for 3s on 1024 size blocks: 9844 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 1229 des cbc's in 2.99s
Doing des ede3 for 3s on 16 size blocks: 213445 des ede3's in 2.97s
Doing des ede3 for 3s on 64 size blocks: 55158 des ede3's in 2.97s
Doing des ede3 for 3s on 256 size blocks: 13904 des ede3's in 2.97s
Doing des ede3 for 3s on 1024 size blocks: 3227 des ede3's in 2.74s
Doing des ede3 for 3s on 8192 size blocks: 441 des ede3's in 2.99s
Doing aes-128 cbc for 3s on 16 size blocks: 595070 aes-128 cbc's in 2.97s
Doing aes-128 cbc for 3s on 64 size blocks: 163409 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 42375 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 10665 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 1338 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 510290 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 64 size blocks: 138844 aes-192 cbc's in 2.98s
Doing aes-192 cbc for 3s on 256 size blocks: 35894 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 1024 size blocks: 9089 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 8192 size blocks: 1132 aes-192 cbc's in 2.98s
Doing aes-256 cbc for 3s on 16 size blocks: 444002 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 64 size blocks: 120882 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 256 size blocks: 30963 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 1024 size blocks: 7890 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 994 aes-256 cbc's in 2.98s
Doing aes-128 ige for 3s on 16 size blocks: 577263 aes-128 ige's in 2.99s
Doing aes-128 ige for 3s on 64 size blocks: 166651 aes-128 ige's in 2.98s
Doing aes-128 ige for 3s on 256 size blocks: 43055 aes-128 ige's in 2.98s
Doing aes-128 ige for 3s on 1024 size blocks: 10772 aes-128 ige's in 2.99s
Doing aes-128 ige for 3s on 8192 size blocks: 1306 aes-128 ige's in 2.99s
Doing aes-192 ige for 3s on 16 size blocks: 493664 aes-192 ige's in 2.99s
Doing aes-192 ige for 3s on 64 size blocks: 141065 aes-192 ige's in 2.99s
Doing aes-192 ige for 3s on 256 size blocks: 36340 aes-192 ige's in 2.99s
Doing aes-192 ige for 3s on 1024 size blocks: 9183 aes-192 ige's in 2.99s
Doing aes-192 ige for 3s on 8192 size blocks: 1108 aes-192 ige's in 2.99s
Doing aes-256 ige for 3s on 16 size blocks: 434801 aes-256 ige's in 2.98s
Doing aes-256 ige for 3s on 64 size blocks: 122980 aes-256 ige's in 2.99s
Doing aes-256 ige for 3s on 256 size blocks: 31594 aes-256 ige's in 2.99s
Doing aes-256 ige for 3s on 1024 size blocks: 7988 aes-256 ige's in 2.99s
Doing aes-256 ige for 3s on 8192 size blocks: 981 aes-256 ige's in 2.99s
Doing rc2 cbc for 3s on 16 size blocks: 525625 rc2 cbc's in 2.99s
Doing rc2 cbc for 3s on 64 size blocks: 140247 rc2 cbc's in 2.98s
Doing rc2 cbc for 3s on 256 size blocks: 35672 rc2 cbc's in 2.99s
Doing rc2 cbc for 3s on 1024 size blocks: 8987 rc2 cbc's in 2.99s
Doing rc2 cbc for 3s on 8192 size blocks: 1119 rc2 cbc's in 2.98s
Doing blowfish cbc for 3s on 16 size blocks: 1138316 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 64 size blocks: 327400 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 256 size blocks: 84685 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 1024 size blocks: 21281 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 8192 size blocks: 2606 blowfish cbc's in 2.98s
Doing cast cbc for 3s on 16 size blocks: 940793 cast cbc's in 2.97s
Doing cast cbc for 3s on 64 size blocks: 282189 cast cbc's in 3.00s
Doing cast cbc for 3s on 256 size blocks: 73868 cast cbc's in 2.98s
Doing cast cbc for 3s on 1024 size blocks: 18593 cast cbc's in 2.99s
Doing cast cbc for 3s on 8192 size blocks: 2285 cast cbc's in 2.99s
Doing 512 bit private rsa's for 10s: 726 512 bit private RSA's in 9.98s
Doing 512 bit public rsa's for 10s: 8359 512 bit public RSA's in 9.97s
Doing 1024 bit private rsa's for 10s: 158 1024 bit private RSA's in 10.03s
Doing 1024 bit public rsa's for 10s: 3643 1024 bit public RSA's in 9.99s
Doing 2048 bit private rsa's for 10s: 32 2048 bit private RSA's in 10.28s
Doing 2048 bit public rsa's for 10s: 1350 2048 bit public RSA's in 9.96s
Doing 4096 bit private rsa's for 10s: 6 4096 bit private RSA's in 10.83s
Doing 4096 bit public rsa's for 10s: 443 4096 bit public RSA's in 9.98s
Doing 512 bit sign dsa's for 10s: 852 512 bit DSA signs in 9.96s
Doing 512 bit verify dsa's for 10s: 734 512 bit DSA verify in 9.98s
Doing 1024 bit sign dsa's for 10s: 365 1024 bit DSA signs in 9.94s
Doing 1024 bit verify dsa's for 10s: 315 1024 bit DSA verify in 9.98s
Doing 2048 bit sign dsa's for 10s: 136 2048 bit DSA signs in 10.05s
Doing 2048 bit verify dsa's for 10s: 115 2048 bit DSA verify in 10.0
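The raw counts above are easier to compare once converted to throughput. The sketch below parses the digest and cipher lines ("... on N size blocks: COUNT ... in T s") and computes COUNT * N / T, expressed in thousands of bytes per second, the unit openssl speed uses in its own summary table. The regular expression and the function name throughput_kb_per_s are illustrative, and the RSA and DSA lines are deliberately skipped because they count operations rather than bytes.

```python
import re

# Matches lines such as:
#   Doing md5 for 3s on 8192 size blocks: 7914 md5's in 2.99s
BLOCK_LINE = re.compile(
    r"Doing (?P<name>.+?) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)


def throughput_kb_per_s(line):
    """Return (algorithm, block size, kB/s) or None for non-matching lines."""
    match = BLOCK_LINE.match(line.strip())
    if match is None:
        return None
    size = int(match.group("size"))
    count = int(match.group("count"))
    secs = float(match.group("secs"))
    # Bytes processed divided by elapsed time, in units of 1000 bytes/s.
    return match.group("name"), size, count * size / secs / 1000.0


if __name__ == "__main__":
    sample = [
        "Doing md5 for 3s on 8192 size blocks: 7914 md5's in 2.99s",
        "Doing aes-128 cbc for 3s on 8192 size blocks: 1338 aes-128 cbc's in 2.99s",
        "Doing 512 bit private rsa's for 10s: 726 512 bit private RSA's in 9.98s",
    ]
    for line in sample:
        result = throughput_kb_per_s(line)
        if result is not None:
            print("%-12s %5d bytes  %10.1f kB/s" % result)
```

Feeding the full log through the same function line by line gives one throughput figure per algorithm and block size.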