User:Uli/R-Car M3-W Mainline GPU Test

Results of running the GPU test programs shipped with the [https://www.renesas.com/en-us/solutions/automotive/rcar-opencl.html Yocto223_OCL_20171129.zip] binary library distribution on a renesas-drivers-2017-12-12-v4.15-rc3 kernel. The GPU kernel module was built from the master branch (commit b4b8d1938bafb4889b3c285978de5c14981309bc) of https://github.com/uli/r8a7796-gpu.

= Passing tests =

== rgx_kicksync_test ==

rgx_kicksync_test test configuration:

Verbose: N

Num contexts: 2

Num syncs per context: 16

Num loops: 1

Sync first value: 0x00000001

Num syncs per command: 3

Num commands per loop: 32

Verify every command: N

Delay (ms): 0

Random: N

--- Start ---

--- Loop 1 / 1 ---

Initialising contexts

Submitting 32 commands, each with 3 syncs

All commands submitted

Verifying all syncs have their expected final value

All syncs have their expected value

Releasing contexts

Test successful

--- End ---

== pvr_memory_test ==
Profiling 4MB block-size transfer (512 times, 2048MB) using 4K-aligned (12) buffer Start tests (+ Memcpy)

Loop count = 512

CPU(ZeroPg)        -> CPU(ZeroPg)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 1.168463394 seconds = 1752.7Mbytes/sec

CPU(ZeroPg)        -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 1.161539246 seconds = 1763.2Mbytes/sec

CPU(Cached)        -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 2.44308258 seconds = 1001.8Mbytes/sec

CPU(Cached)        -> DEV(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 2.48023952 seconds = 1000.0Mbytes/sec

CPU(Cached)        -> DEV(Cached) + Flush  started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 2.671264205 seconds = 766.7Mbytes/sec

DEV(Cached)        -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 2.34345457 seconds = 1006.7Mbytes/sec

CPU(Cached)        -> DEV(Uncached)        started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 3.235934264 seconds = 632.9Mbytes/sec

DEV(Uncached)      -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 16.159380557 seconds = 126.7Mbytes/sec

CPU(Cached)        -> DEV(Write-Combined)  started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 1.838601274 seconds = 1113.9Mbytes/sec

DEV(Write-Combined) -> CPU(Cached)         started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test took 10.265888423 seconds = 199.5Mbytes/sec

- End tests -

Profiling 4MB block-size transfer (16 times, 64MB) using 4K-aligned (12) buffer Start tests (+ Memcpy)

Loop count = 16

CPU(ZeroPg)        -> CPU(ZeroPg)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.43178572 seconds = 1482.2Mbytes/sec

CPU(ZeroPg)        -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.36431785 seconds = 1756.7Mbytes/sec

CPU(Cached)        -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.64104496 seconds = 998.4Mbytes/sec

CPU(Cached)        -> DEV(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.64066577 seconds = 999.0Mbytes/sec

CPU(Cached)        -> DEV(Cached) + Flush  started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.83345853 seconds = 767.9Mbytes/sec

DEV(Cached)        -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.65629823 seconds = 975.2Mbytes/sec

CPU(Cached)        -> DEV(Uncached)        started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.104447218 seconds = 612.7Mbytes/sec

DEV(Uncached)      -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.506740307 seconds = 126.3Mbytes/sec

CPU(Cached)        -> DEV(Write-Combined)  started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.57340429 seconds = 1116.1Mbytes/sec

DEV(Write-Combined) -> CPU(Cached)         started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test took 0.321706407 seconds = 198.9Mbytes/sec

- End tests -
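
Several of the elapsed times above disagree with the throughput printed on the same line. For example, 2048 Mbytes in "2.44308258 seconds" would be about 838 Mbytes/sec, not the reported 1001.8 Mbytes/sec. A plausible explanation, which is an assumption and has not been checked against the test program's source, is that the fractional part is a nanosecond count printed without zero padding, so that "2.44308258" really means 2.044308258 seconds. The following Python sketch (the names <code>LINE_RE</code> and <code>check_line</code> are made up for illustration) compares the printed time, the zero-padded time, and the time implied by the reported throughput:

```python
import re

# Matches the tail of a pvr_memory_test result line as it appears in the log above.
LINE_RE = re.compile(
    r"= (?P<mbytes>\d+)Mbytes per test took (?P<sec>\d+)\.(?P<frac>\d+) seconds"
    r" = (?P<rate>[\d.]+)Mbytes/sec"
)


def check_line(line):
    """Return (printed_seconds, padded_seconds, implied_seconds) for one log line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a pvr_memory_test result line")
    mbytes = int(m["mbytes"])
    rate = float(m["rate"])
    # The time exactly as printed in the log.
    printed = float(f"{m['sec']}.{m['frac']}")
    # Assumption: the fraction is a nanosecond count that lost its leading zeros.
    padded = int(m["sec"]) + int(m["frac"]) / 1e9
    # The time that the reported throughput implies.
    implied = mbytes / rate
    return printed, padded, implied


if __name__ == "__main__":
    sample = (
        "CPU(Cached) -> CPU(Cached) started: Moving 512 blocks of 4194304 bytes"
        " = 2048Mbytes per test took 2.44308258 seconds = 1001.8Mbytes/sec"
    )
    printed, padded, implied = check_line(sample)
    print(f"printed={printed:.9f} padded={padded:.9f} implied={implied:.4f}")
```

For lines whose fraction already has nine digits, the printed and padded values are identical, so the check only changes the interpretation of the shortened values. The Mbytes/sec column is the safer figure to compare between runs.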

== rgx_compute_test ==
-- RGX compute test - --- Start --- Call PVRSRVConnect with a valid argument: OK Attempt to create device memory context: OK Creating synchronization context: OK Looking up General heap handle OK Getting event object OK Creating Compute Context OK Creating Buffer Creating DWord for CDM Event Object OK OK Create PDS Heap OK Create USC Heap OK Allocate sync primitive OK Creating NOP instruction Creating Data Segment Creating Code Segment Write Kernel 0 Creating Fence Data Segment Creating Code Segment Write Fence Kernel Write Terminate Call services to kick CDM OK Poll for sync update OK Poll for CDM event object data OK Destroy Compute Context OK

Total time: 0ms Destroy synchronization context: Destroy Device Memory Context Disconnect from services: OK End

== ocl_unit_test ==
OpenCL Unit Test(s) (Rogue_DDK_Linux rogueddk 1.7@4563938,release,r8a7795_linux) at Fri Oct 6 08:40:47 2017 00****************************************************************************** Platform Test: Checks that an OpenCL compatible platform is present for the unit test to run. Verify_Platform: Enumerating 1 platforms Verify_Platform: CL_PLATFORM_PROFILE   EMBEDDED_PROFILE Verify_Platform: CL_PLATFORM_VERSION   OpenCL 1.2 Verify_Platform: CL_PLATFORM_NAME      PowerVR Rogue Verify_Platform: CL_PLATFORM_VENDOR    Imagination Technologies Verify_Platform: CL_PLATFORM_EXTENSIONS 'cl_khr_icd cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_img_yuv_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr cl_khr_spir' platform --> passed platform: Test took 0.22 seconds to run: platform: Verify 0.22s (100.00)% 01****************************************************************************** Device Test: Checks that an OpenCL compatible device is present for the unit test to run. 
Verify_Device: Enumerating 1 devices Verify_Device: CL_DEVICE_TYPE      CL_DEVICE_TYPE_GPU Verify_Device: CL_DEVICE_NAME      PowerVR Rogue GX6650 Verify_Device: CL_DEVICE_VENDORI   Imagination Technologies Verify_Device: CL_DRIVER_VERSION   1.7@4563938 Verify_Device: CL_DEVICE_PROFILE   EMBEDDED_PROFILE Verify_Device: CL_DEVICE_VERSION   OpenCL 1.2 Verify_Device: CL_DEVICE_EXTENSIONS cl_khr_icd cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_img_yuv_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr cl_khr_spir device --> passed device: Test took 0.34 seconds to run: device: Verify 0.34s (100.00)% 02****************************************************************************** Bounding Box: Runs the kernel that compute floating point minimum and maximum for large number of vertices.

Objects:     100, # triangles per object:    15393, Computing instances per object:  512, Time - start - stop: 164.080000ms bbox --> passed bbox: Test took 2.87 seconds to run: bbox: Init 2.56s (89.19%) Verify 0.31s (10.81)% 03****************************************************************************** Addition Kernel: Performs an online compilation of an integer addition kernel and verifies that the output buffer is correct. Compute_Add: Online compilation test with 4096 instances running source: Verify_Add: Verification OK add --> passed add: Test took 0.09 seconds to run: add: Init 0.02s (21.98%) Compute 0.02s (23.08%)Verify 0.05s (54.95)% 04****************************************************************************** Binary Check: Runs the same kernel as the addition test however first saves the binary version to the filsystem, recreates the OpenCL context and ensures the binary test computes the same results as the online test. CL_PROGRAM_BUILD_LOG:

Init_Binary: Wrote out 1 binaries to the file system Compute_Binary: Successfully loaded back binary_0.bin from filesystem Verify_Binary: Binary file successfully executed kernel binary --> passed binary: Test took 0.18 seconds to run: binary: Init 0.09s (46.70%) Compute 0.04s (19.23%)Verify 0.06s (34.07)% 05****************************************************************************** Error Log Check: Runs an illegal kernel that contains a simple undeclared identifier error and verifies that the build log provided by the OpenCL implementation is correct. errorlog: BuildGroup_1:3:2: error: use of undeclared identifier 'undeclared' undeclared identifier ^ errorlog: Verified build log contained required error. errorlog --> passed errorlog: Test took 0.09 seconds to run: errorlog: Init 0.01s (10.00%) Verify 0.08s (90.00)% 06****************************************************************************** Memory Copy Kernel: Performs an online compilation of a kernel which copies input to output, verifying the results and calculating the speed at which the data is transferred. 
BufferType:   uchar, Instances:   524288, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.068670s, 232.998398MB/s BufferType:  ushort, Instances:   262144, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.128205s, 124.800125MB/s BufferType:    uint, Instances:   131072, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.026237s, 609.825819MB/s BufferType:   uint2, Instances:    65536, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.092728s, 172.547666MB/s BufferType:   uint4, Instances:    32768, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.023522s, 680.214267MB/s BufferType:   uint8, Instances:    16384, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.025302s, 632.361078MB/s BufferType:  uint16, Instances:     8192, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.015503s, 1032.058311MB/s memcpy --> passed memcpy: Test took 1.90 seconds to run: memcpy: Init 0.08s (3.95%) Verify 1.82s (96.05)% 07****************************************************************************** Memory Strided Copy Kernel: Performs an online compilation of a kernel which copies input to output, verifying the results and calculating the speed at which the data is transferred using strided pattern. 
BufferType:   uchar, Instances:   524288, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 256, Time:   0.057483s, 278.343162MB/s BufferType:  ushort, Instances:   262144, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 128, Time:   0.032100s, 498.442368MB/s BufferType:    uint, Instances:   131072, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 64, Time:   0.013142s, 1217.470705MB/s BufferType:   uint2, Instances:    65536, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 32, Time:   0.013185s, 1213.500190MB/s BufferType:   uint4, Instances:    32768, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 16, Time:   0.012212s, 1310.186702MB/s BufferType:   uint8, Instances:    16384, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 8, Time:   0.012611s, 1268.733645MB/s BufferType:  uint16, Instances:     8192, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 4, Time:   0.012611s, 1268.733645MB/s memcpy_stride --> passed memcpy_stride: Test took 1.42 seconds to run: memcpy_stride: Init 0.06s (4.03%) Verify 1.36s (95.97)% 08****************************************************************************** Memory Read Kernel: Performs an online compilation of a kernel which reads large amounts of data each instance, calculating read bandwidth. Performing 32 memory reads per kernel instance, totalling 16777216 memory loads... BufferType:   uchar, Instances:   524288, Reads per instance:  128, Read:   64 MBs, Time:   0.157612s, 406.060452MB/s Performing 32 memory reads per kernel instance, totalling 8388608 memory loads... BufferType:  ushort, Instances:   262144, Reads per instance:  128, Read:   64 MBs, Time:   0.635605s, 100.691467MB/s Performing 32 memory reads per kernel instance, totalling 4194304 memory loads... 
BufferType:    uint, Instances:   131072, Reads per instance:  128, Read:   64 MBs, Time:   0.039797s, 1608.161419MB/s Performing 32 memory reads per kernel instance, totalling 2097152 memory loads... BufferType:   uint2, Instances:    65536, Reads per instance:  128, Read:   64 MBs, Time:   0.037831s, 1691.734292MB/s Performing 32 memory reads per kernel instance, totalling 1048576 memory loads... BufferType:   uint4, Instances:    32768, Reads per instance:  128, Read:   64 MBs, Time:   0.037738s, 1695.903334MB/s Performing 32 memory reads per kernel instance, totalling 524288 memory loads... BufferType:   uint8, Instances:    16384, Reads per instance:  128, Read:   64 MBs, Time:   0.038040s, 1682.439537MB/s Performing 32 memory reads per kernel instance, totalling 262144 memory loads... BufferType:  uint16, Instances:     8192, Reads per instance:  128, Read:   64 MBs, Time:   0.020294s, 3153.641470MB/s memread --> passed memread: Test took 2.52 seconds to run: memread: Init 0.12s (4.64%) Verify 2.40s (95.36)% 09****************************************************************************** Memory Strided Read Kernel: Performs an online compilation of a kernel which reads large amounts of data each instance using stride pattern, calculating read bandwidth. Performing 32 memory reads per kernel instance, totalling 16777216 memory loads... BufferType:   uchar, Instances:   524288, Reads per instance:  128, WG size 256, Read:   64 MBs, Time:   0.077018s, 830.974577MB/s Performing 32 memory reads per kernel instance, totalling 8388608 memory loads... BufferType:  ushort, Instances:   262144, Reads per instance:  128, WG size 128, Read:   64 MBs, Time:   0.046774s, 1368.281524MB/s Performing 32 memory reads per kernel instance, totalling 4194304 memory loads... 
BufferType:    uint, Instances:   131072, Reads per instance:  128, WG size 64, Read:   64 MBs, Time:   0.020116s, 3181.547027MB/s Performing 32 memory reads per kernel instance, totalling 2097152 memory loads... BufferType:   uint2, Instances:    65536, Reads per instance:  128, WG size 32, Read:   64 MBs, Time:   0.019099s, 3350.960783MB/s Performing 32 memory reads per kernel instance, totalling 1048576 memory loads... BufferType:   uint4, Instances:    32768, Reads per instance:  128, WG size 16, Read:   64 MBs, Time:   0.019511s, 3280.200912MB/s Performing 32 memory reads per kernel instance, totalling 524288 memory loads... BufferType:   uint8, Instances:    16384, Reads per instance:  128, WG size 8, Read:   64 MBs, Time:   0.019315s, 3313.486927MB/s Performing 32 memory reads per kernel instance, totalling 262144 memory loads... BufferType:  uint16, Instances:     8192, Reads per instance:  128, WG size 4, Read:   64 MBs, Time:   0.018088s, 3538.257408MB/s memread_stride --> passed memread_stride: Test took 1.39 seconds to run: memread_stride: Init 0.12s (8.59%) Verify 1.27s (91.41)% 10****************************************************************************** Memory Write Kernel: Performs an online compilation of a kernel which writes large amounts of data each instance, calculating write bandwidth. 
BufferType:   uchar, Instances:   524288, Writes Per Instance:  128, Written:   64 MBs, Time:   0.278133s, 230.105741MB/s BufferType:  ushort, Instances:   262144, Writes Per Instance:  128, Written:   64 MBs, Time:   0.994779s,  64.335898MB/s BufferType:    uint, Instances:   131072, Writes Per Instance:  128, Written:   64 MBs, Time:   0.749035s,  85.443270MB/s BufferType:   uint2, Instances:    65536, Writes Per Instance:  128, Written:   64 MBs, Time:   0.272591s, 234.783980MB/s BufferType:   uint4, Instances:    32768, Writes Per Instance:  128, Written:   64 MBs, Time:   0.146211s, 437.723564MB/s BufferType:   uint8, Instances:    16384, Writes Per Instance:  128, Written:   64 MBs, Time:   0.080468s, 795.347219MB/s BufferType:  uint16, Instances:     8192, Writes Per Instance:  128, Written:   64 MBs, Time:   0.045449s, 1408.171797MB/s memwrite --> passed memwrite: Test took 10.17 seconds to run: memwrite: Init 0.64s (6.25%) Verify 9.53s (93.75)% 11****************************************************************************** Memory Strided Write Kernel: Performs an online compilation of a kernel which writes large amounts of data each instance using stride pattern, calculating write bandwidth. 
BufferType:   uchar, Instances:   524288, Writes Per Instance:  128, Written:   64 MBs, Time:   0.126440s, 506.168934MB/s BufferType:  ushort, Instances:   262144, Writes Per Instance:  128, Written:   64 MBs, Time:   0.074213s, 862.382601MB/s BufferType:    uint, Instances:   131072, Writes Per Instance:  128, Written:   64 MBs, Time:   0.048326s, 1324.338865MB/s BufferType:   uint2, Instances:    65536, Writes Per Instance:  128, Written:   64 MBs, Time:   0.030605s, 2091.161575MB/s BufferType:   uint4, Instances:    32768, Writes Per Instance:  128, Written:   64 MBs, Time:   0.031541s, 2029.104974MB/s BufferType:   uint8, Instances:    16384, Writes Per Instance:  128, Written:   64 MBs, Time:   0.030891s, 2071.800848MB/s BufferType:  uint16, Instances:     8192, Writes Per Instance:  128, Written:   64 MBs, Time:   0.032031s, 1998.064375MB/s memwrite_stride --> passed memwrite_stride: Test took 8.61 seconds to run: memwrite_stride: Init 0.95s (10.99%) Verify 7.67s (89.01)% 12****************************************************************************** Image Copy Kernel: Performs an image direct copy. Init_ImgCopyKernel: Hardware detected, verification will be run. 
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:   8,  Copied   2.00 MBs, Time 0.006000s,     6 Ticks, 333.333333MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  16,  Copied   4.00 MBs, Time 0.007000s,     7 Ticks, 571.428571MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  32,  Copied   8.00 MBs, Time 0.014000s,    14 Ticks, 571.428571MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  64,  Copied  16.00 MBs, Time 0.027000s,    27 Ticks, 592.592593MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations: 128,  Copied  32.00 MBs, Time 0.051000s,    51 Ticks, 627.450980MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations: 256,  Copied  64.00 MBs, Time 0.102000s,   102 Ticks, 627.450980MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations: 512,  Copied 128.00 MBs, Time 0.203000s,   203 Ticks, 630.541872MB/s ImgCpy_RunImgCpy: Instances: 128x512, Iterations: 1024, Copied 256.00 MBs, Time 0.406000s,   406 Ticks, 630.541872MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations:   8,  Copied   2.00 MBs, Time 0.008000s,     8 Ticks, 250.000000MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  16,  Copied   4.00 MBs, Time 0.011000s,    11 Ticks, 363.636364MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  32,  Copied   8.00 MBs, Time 0.024000s,    24 Ticks, 333.333333MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  64,  Copied  16.00 MBs, Time 0.045000s,    45 Ticks, 355.555556MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations: 128,  Copied  32.00 MBs, Time 0.090000s,    90 Ticks, 355.555556MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations: 256,  Copied  64.00 MBs, Time 0.180000s,   180 Ticks, 355.555556MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations: 512,  Copied 128.00 MBs, Time 0.359000s,   359 Ticks, 356.545961MB/s ImgCpy_RunImgCpy: Instances: 512x512, Iterations: 1024, Copied 256.00 MBs, Time 0.717000s,   717 Ticks, 357.043236MB/s Verify_ImgCopyKernel: Average speed 453.33 MB/s imgcpy --> passed imgcpy: Test took 3.10 
seconds to run: imgcpy: Init 0.16s (5.22%) Verify 2.94s (94.78)% 13****************************************************************************** Image mipmaps: Performs an image direct copy with mipmaps. Verify_MipMap: Average speed 0.00 MB/s mipmap --> passed mipmap: Test took 0.10 seconds to run: mipmap: Init 0.01s (10.00%) Verify 0.09s (90.00)% 14****************************************************************************** Floating Point Operations Kernel (Scalar): Performs an online compilation of floating point add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the floating point operations per second of the device. This test uses (scalar) float as its unit of computation. Compiling Add kernel... Compiling Mul kernel... Compiling Mad kernel... Compiling Div kernel... Compute_Float: Online compilation test with 524288 instances, each with 300 iterations Running Add kernel... Running Mul kernel... Running Mad kernel... Running Div kernel... 
--- Float Add: --- Verify_Float: Time                     0.410148s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15417212928 Verify_Float: GFlop                    15.417213 Verify_Float: GFLOP/S                  37.589390 --- Float Mul: --- Verify_Float: Time                     0.410114s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  37.589949 --- Float Mad: --- Verify_Float: Time                     0.410401s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 30830231552 Verify_Float: GFlop                    30.830232 Verify_Float: GFLOP/S                  75.122214 --- Float Div: --- Verify_Float: Time                     0.820315s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  18.792981 --- Verify_Float: All tests time                     2.175840s --- float --> passed float: Test took 5.82 seconds to run: float: Init 2.16s (37.04%) Compute 2.28s (39.10%)Verify 1.39s (23.87)% 15****************************************************************************** Floating Point Operations Kernel (Scalar): Performs an online compilation of floating point add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the floating point operations per second of the device. This test uses (scalar) float as its unit of computation. Kernels run in parallel without synchronization on host side. Compiling Add kernel... Compiling Mul kernel... Compiling Mad kernel... Compiling Div kernel... 
Compute_Float_Parallel: Online compilation test with 524288 instances, each with 300 iterations Running Add kernel... Running Mul kernel... Running Mad kernel... Running Div kernel... --- Float Add: --- Verify_Float: Time                     0.410296s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15417212928 Verify_Float: GFlop                    15.417213 Verify_Float: GFLOP/S                  37.575830 --- Float Mul: --- Verify_Float: Time                     0.409929s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  37.606913 --- Float Mad: --- Verify_Float: Time                     0.409919s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 30830231552 Verify_Float: GFlop                    30.830232 Verify_Float: GFLOP/S                  75.210545 --- Float Div: --- Verify_Float: Time                     0.820163s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  18.796464 --- Verify_Float: All tests time                     2.193818s --- float_parallel --> passed float_parallel: Test took 4.07 seconds to run: float_parallel: Init 0.22s (5.48%) Compute 2.29s (56.37%)Verify 1.55s (38.15)% 16****************************************************************************** Floating Point Operations Kernel (Scalar): Performs an online compilation of floating point add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the floating point operations per second of the device. 
This test uses (scalar) float as its unit of computation. Kernels run in order of their dependencies created by events. Compiling Add kernel... Compiling Mul kernel... Compiling Mad kernel... Compiling Div kernel... Compute_Float_Chain: Online compilation test with 524288 instances, each with 300 iterations Running Add kernel... Running Mul kernel... Running Mad kernel... Running Div kernel... --- Float Add: --- Verify_Float: Time                     0.409995s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15417212928 Verify_Float: GFlop                    15.417213 Verify_Float: GFLOP/S                  37.603417 --- Float Mul: --- Verify_Float: Time                     0.409922s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  37.607555 --- Float Mad: --- Verify_Float: Time                     0.409930s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 30830231552 Verify_Float: GFlop                    30.830232 Verify_Float: GFLOP/S                  75.208527 --- Float Div: --- Verify_Float: Time                     0.820185s Verify_Float: Iterations               300 Verify_Float: Instances                524288 Verify_Float: Floating point operations 15416164352 Verify_Float: GFlop                    15.416164 Verify_Float: GFLOP/S                  18.795960 --- Verify_Float: All tests time                     2.173687s --- float_chain --> passed float_chain: Test took 4.19 seconds to run: float_chain: Init 0.14s (3.39%) Compute 2.25s (53.81%)Verify 1.79s (42.80)% 17****************************************************************************** Floating Point Operations Kernels for all vector sizes: Performs an online 
compilation of floating point add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the floating point operations per second of the device. This test uses float{1,2,3,4,8,16} as its unit of computation. Compiling Add float kernel... Compiling Mul float kernel... Compiling Mad float kernel... Compiling Div float kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float in kernel. Compiling Add float2 kernel... Compiling Mul float2 kernel... Compiling Mad float2 kernel... Compiling Div float2 kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float2 in kernel. Compiling Add float3 kernel... Compiling Mul float3 kernel... Compiling Mad float3 kernel... Compiling Div float3 kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float3 in kernel. Compiling Add float4 kernel... Compiling Mul float4 kernel... Compiling Mad float4 kernel... Compiling Div float4 kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float4 in kernel. Compiling Add float8 kernel... Compiling Mul float8 kernel... Compiling Mad float8 kernel... Compiling Div float8 kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float8 in kernel. Compiling Add float16 kernel... Compiling Mul float16 kernel... Compiling Mad float16 kernel... Compiling Div float16 kernel... Init_Floatops: Reducing from 65536 to 65536 due to use of float16 in kernel. Compute_Floatops: Online compilation test with float using 65536 instances, each with 400 iterations Running Add kernel for float... Running Mul kernel for float... Running Mad kernel for float... Running Div kernel for float... Compute_Floatops: Online compilation test with float2 using 65536 instances, each with 400 iterations Running Add kernel for float2... Running Mul kernel for float2... Running Mad kernel for float2... Running Div kernel for float2... 
Compute_Floatops: Online compilation test with float3 using 65536 instances, each with 400 iterations Running Add kernel for float3... Running Mul kernel for float3... Running Mad kernel for float3... Running Div kernel for float3... Compute_Floatops: Online compilation test with float4 using 65536 instances, each with 400 iterations Running Add kernel for float4... Running Mul kernel for float4... Running Mad kernel for float4... Running Div kernel for float4... Compute_Floatops: Online compilation test with float8 using 65536 instances, each with 400 iterations Running Add kernel for float8... Running Mul kernel for float8... Running Mad kernel for float8... Running Div kernel for float8... Compute_Floatops: Online compilation test with float16 using 65536 instances, each with 400 iterations Running Add kernel for float16... Running Mul kernel for float16... Running Mad kernel for float16... Running Div kernel for float16... --- float Add: --- Verify_Floatops: Time                     0.068629s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 2569404416 Verify_Floatops: GFLOP                    2.569404 Verify_Floatops: GFLOP/S                  37.439048 --- float Mul: --- Verify_Floatops: Time                     0.068905s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 2569273344 Verify_Floatops: GFLOP                    2.569273 Verify_Floatops: GFLOP/S                  37.287183 --- float Mad: --- Verify_Floatops: Time                     0.070338s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 5138284544 Verify_Floatops: GFLOP                    5.138285 Verify_Floatops: GFLOP/S                  73.051331 --- float Div: --- Verify_Floatops: Time                     0.136826s Verify_Floatops: Iterations               400 
Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 2569273344 Verify_Floatops: GFLOP                    2.569273 Verify_Floatops: GFLOP/S                  18.777669
*** Build Log ***

--- float2 Add: --- Verify_Floatops: Time                     0.135482s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 5138808832 Verify_Floatops: GFLOP                    5.138809 Verify_Floatops: GFLOP/S                  37.929827 --- float2 Mul: --- Verify_Floatops: Time                     0.135580s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 5138546688 Verify_Floatops: GFLOP                    5.138547 Verify_Floatops: GFLOP/S                  37.900477 --- float2 Mad: --- Verify_Floatops: Time                     0.135984s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 10276569088 Verify_Floatops: GFLOP                    10.276569 Verify_Floatops: GFLOP/S                  75.571899 --- float2 Div: --- Verify_Floatops: Time                     0.272034s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 5138546688 Verify_Floatops: GFLOP                    5.138547 Verify_Floatops: GFLOP/S                  18.889355

--- float3 Add: --- Verify_Floatops: Time                     0.210914s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 7708213248 Verify_Floatops: GFLOP                    7.708213 Verify_Floatops: GFLOP/S                  36.546712 --- float3 Mul: --- Verify_Floatops: Time                     0.203259s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 7707820032 Verify_Floatops: GFLOP                    7.707820 Verify_Floatops: GFLOP/S                  37.921175 --- float3 Mad: --- Verify_Floatops: Time                     0.203220s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 15414853632 Verify_Floatops: GFLOP                    15.414854 Verify_Floatops: GFLOP/S                  75.853034 --- float3 Div: --- Verify_Floatops: Time                     0.408912s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 7707820032 Verify_Floatops: GFLOP                    7.707820 Verify_Floatops: GFLOP/S                  18.849581

--- float4 Add: --- Verify_Floatops: Time                     0.279024s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 10277617664 Verify_Floatops: GFLOP                    10.277618 Verify_Floatops: GFLOP/S                  36.834171 --- float4 Mul: --- Verify_Floatops: Time                     0.270386s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 10277093376 Verify_Floatops: GFLOP                    10.277093 Verify_Floatops: GFLOP/S                  38.008970 --- float4 Mad: --- Verify_Floatops: Time                     0.270518s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 20553138176 Verify_Floatops: GFLOP                    20.553138 Verify_Floatops: GFLOP/S                  75.976971 --- float4 Div: --- Verify_Floatops: Time                     0.543566s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 10277093376 Verify_Floatops: GFLOP                    10.277093 Verify_Floatops: GFLOP/S                  18.906799

--- float8 Add: --- Verify_Floatops: Time                     0.600531s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 20555235328 Verify_Floatops: GFLOP                    20.555235 Verify_Floatops: GFLOP/S                  34.228433 --- float8 Mul: --- Verify_Floatops: Time                     0.538501s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 20554186752 Verify_Floatops: GFLOP                    20.554187 Verify_Floatops: GFLOP/S                  38.169264 --- float8 Mad: --- Verify_Floatops: Time                     0.543278s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 41106276352 Verify_Floatops: GFLOP                    41.106276 Verify_Floatops: GFLOP/S                  75.663429 --- float8 Div: --- Verify_Floatops: Time                     1.091137s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 20554186752 Verify_Floatops: GFLOP                    20.554187 Verify_Floatops: GFLOP/S                  18.837402

--- float16 Add: --- Verify_Floatops: Time                     1.228492s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 41110470656 Verify_Floatops: GFLOP                    41.110471 Verify_Floatops: GFLOP/S                  33.464174 --- float16 Mul: --- Verify_Floatops: Time                     1.075268s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 41108373504 Verify_Floatops: GFLOP                    41.108374 Verify_Floatops: GFLOP/S                  38.230816 --- float16 Mad: --- Verify_Floatops: Time                     1.088152s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 82212552704 Verify_Floatops: GFLOP                    82.212553 Verify_Floatops: GFLOP/S                  75.552453 --- float16 Div: --- Verify_Floatops: Time                     2.190075s Verify_Floatops: Iterations               400 Verify_Floatops: Instances                65536 Verify_Floatops: Float operations 41108373504 Verify_Floatops: GFLOP                    41.108374 Verify_Floatops: GFLOP/S                  18.770304
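
The GFLOP/S figures in the Verify_Floatops blocks above appear to be the reported float operation count divided by the elapsed time. This relationship is inferred from the printed numbers, not taken from the test program's source. A minimal Python sketch (the helper name `gflops` is hypothetical, and 1 GFLOP is assumed to mean 1e9 operations) recomputes two of the logged values for comparison:

```python
def gflops(float_ops: int, seconds: float) -> float:
    """Throughput in GFLOP/S, assuming 1 GFLOP = 1e9 float operations."""
    return float_ops / seconds / 1e9


# (float operations, time in seconds, GFLOP/S as logged), copied from the
# Verify_Floatops output above.
samples = {
    "float Add": (2569404416, 0.068629, 37.439048),
    "float16 Div": (41108373504, 2.190075, 18.770304),
}

for name, (ops, secs, logged) in samples.items():
    # The log prints the time with six decimals, so small rounding
    # differences against the logged throughput are expected.
    print(f"{name}: computed {gflops(ops, secs):.3f}, logged {logged:.3f}")
```

The same division can be applied to the GIOP/S figures of the integer, short and char tests, assuming they are derived the same way.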

floatvec --> passed floatvec: Test took 42.24 seconds to run: floatvec: Init 19.01s (44.99%) Compute 13.70s (32.44%)Verify 9.53s (22.56)% 18****************************************************************************** Integer Operations Kernels: Performs an online compilation of integer add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the operations per second of the device. This test uses int{1,2,3,4} as its unit of computation. Compiling Add int kernel... Compiling Mul int kernel... Compiling Mad int kernel... Compiling Div int kernel... Compiling Add int2 kernel... Compiling Mul int2 kernel... Compiling Mad int2 kernel... Compiling Div int2 kernel... Compiling Add int3 kernel... Compiling Mul int3 kernel... Compiling Mad int3 kernel... Compiling Div int3 kernel... Compiling Add int4 kernel... Compiling Mul int4 kernel... Compiling Mad int4 kernel... Compiling Div int4 kernel... Compute_Int: Online compilation test with int using 65536 instances, each with 400 iterations Running Add kernel for int... Running Mul kernel for int... Running Mad kernel for int... Running Div kernel for int... Compute_Int: Online compilation test with int2 using 65536 instances, each with 400 iterations Running Add kernel for int2... Running Mul kernel for int2... Running Mad kernel for int2... Running Div kernel for int2... Compute_Int: Online compilation test with int3 using 65536 instances, each with 400 iterations Running Add kernel for int3... Running Mul kernel for int3... Running Mad kernel for int3... Running Div kernel for int3... Compute_Int: Online compilation test with int4 using 65536 instances, each with 400 iterations Running Add kernel for int4... Running Mul kernel for int4... Running Mad kernel for int4... Running Div kernel for int4... 
--- int Add: --- Verify_Int: Time                     0.048407s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 996540416 Verify_Int: GIOp                    0.996540 Verify_Int: GIOP/S                  20.586701 --- int Mul: --- Verify_Int: Time                     0.044155s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 996409344 Verify_Int: GIOp                    0.996409 Verify_Int: GIOP/S                  22.566172 --- int Mad: --- Verify_Int: Time                     0.056313s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 2097414144 Verify_Int: GIOp                    2.097414 Verify_Int: GIOP/S                  37.245647 --- int Div: --- Verify_Int: Time                     3.138706s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 996409344 Verify_Int: GIOp                    0.996409 Verify_Int: GIOP/S                  0.317459

--- int2 Add: --- Verify_Int: Time                     0.094852s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 1993080832 Verify_Int: GIOp                    1.993081 Verify_Int: GIOP/S                  21.012534 --- int2 Mul: --- Verify_Int: Time                     0.086736s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 1992818688 Verify_Int: GIOp                    1.992819 Verify_Int: GIOP/S                  22.975681 --- int2 Mad: --- Verify_Int: Time                     0.110893s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 4194828288 Verify_Int: GIOp                    4.194828 Verify_Int: GIOP/S                  37.827710 --- int2 Div: --- Verify_Int: Time                     6.437046s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 1992818688 Verify_Int: GIOp                    1.992819 Verify_Int: GIOP/S                  0.309586

--- int3 Add: --- Verify_Int: Time                     0.141097s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 2989621248 Verify_Int: GIOp                    2.989621 Verify_Int: GIOP/S                  21.188411 --- int3 Mul: --- Verify_Int: Time                     0.128773s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 2989228032 Verify_Int: GIOp                    2.989228 Verify_Int: GIOP/S                  23.213158 --- int3 Mad: --- Verify_Int: Time                     0.166180s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 6292242432 Verify_Int: GIOp                    6.292242 Verify_Int: GIOP/S                  37.864018 --- int3 Div: --- Verify_Int: Time                     9.671746s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 2989228032 Verify_Int: GIOp                    2.989228 Verify_Int: GIOP/S                  0.309068

--- int4 Add: --- Verify_Int: Time                     0.187724s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 3986161664 Verify_Int: GIOp                    3.986162 Verify_Int: GIOP/S                  21.234161 --- int4 Mul: --- Verify_Int: Time                     0.171712s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 3985637376 Verify_Int: GIOp                    3.985637 Verify_Int: GIOP/S                  23.211176 --- int4 Mad: --- Verify_Int: Time                     0.220862s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 8389656576 Verify_Int: GIOp                    8.389657 Verify_Int: GIOP/S                  37.985967 --- int4 Div: --- Verify_Int: Time                     12.905239s Verify_Int: Iterations               400 Verify_Int: Instances                65536 Verify_Int: Integer operations 3985637376 Verify_Int: GIOp                    3.985637 Verify_Int: GIOP/S                  0.308839

int --> passed int: Test took 76.98 seconds to run: int: Init 37.45s (48.65%) Compute 34.50s (44.81%)Verify 5.03s (6.53)% 19****************************************************************************** Integer Operations Kernels: Performs an online compilation of integer modulo kernels which each perform a large number of one specific operation, and calculates the operations per second of the device. This test uses int{1,2,3,4} as its unit of computation. Compiling Div uint kernel... Compiling Div Fast uint kernel... Compiling Div uint2 kernel... Compiling Div Fast uint2 kernel... Compiling Div uint3 kernel... Compiling Div Fast uint3 kernel... Compiling Div uint4 kernel... Compiling Div Fast uint4 kernel... Compute_IntMod: Online compilation test with uint using 65536 instances, each with 400 iterations Running Div kernel for uint... Running Div kernel Fast for uint... Compute_IntMod: Online compilation test with uint2 using 65536 instances, each with 400 iterations Running Div kernel for uint2... Running Div kernel Fast for uint2... Compute_IntMod: Online compilation test with uint3 using 65536 instances, each with 400 iterations Running Div kernel for uint3... Running Div kernel Fast for uint3... Compute_IntMod: Online compilation test with uint4 using 65536 instances, each with 400 iterations Running Div kernel for uint4... Running Div kernel Fast for uint4... --- uint Div: --- Verify_IntMod: Time                     0.206417s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 1520697344 Verify_IntMod: GIOp                    1.520697 Verify_IntMod: GIOP/S                  7.367113 --- uint Div fast: --- Verify_IntMod: Time                     0.206099s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 1520697344 Verify_IntMod: GIOp                    1.520697 Verify_IntMod: GIOP/S                  7.378480

--- uint2 Div: --- Verify_IntMod: Time                     0.462457s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 3041394688 Verify_IntMod: GIOp                    3.041395 Verify_IntMod: GIOP/S                  6.576600 --- uint2 Div fast: --- Verify_IntMod: Time                     0.458995s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 3041394688 Verify_IntMod: GIOp                    3.041395 Verify_IntMod: GIOP/S                  6.626204

--- uint3 Div: --- Verify_IntMod: Time                     0.692772s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 4562092032 Verify_IntMod: GIOp                    4.562092 Verify_IntMod: GIOP/S                  6.585272 --- uint3 Div fast: --- Verify_IntMod: Time                     0.690712s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 4562092032 Verify_IntMod: GIOp                    4.562092 Verify_IntMod: GIOP/S                  6.604912

--- uint4 Div: --- Verify_IntMod: Time                     0.938431s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 6082789376 Verify_IntMod: GIOp                    6.082789 Verify_IntMod: GIOP/S                  6.481872 --- uint4 Div fast: --- Verify_IntMod: Time                     0.949093s Verify_IntMod: Iterations               400 Verify_IntMod: Instances                65536 Verify_IntMod: Integer operations 6082789376 Verify_IntMod: GIOp                    6.082789 Verify_IntMod: GIOP/S                  6.409055

intmod --> passed intmod: Test took 24.13 seconds to run: intmod: Init 16.80s (69.61%) Compute 4.97s (20.58%)Verify 2.37s (9.81)% 20****************************************************************************** Short Operations Kernels: Performs an online compilation of short add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the operations per second of the device. This test uses short{1,2,3,4} as its unit of computation. Compiling Add short kernel... Compiling Mul short kernel... Compiling Mad short kernel... Compiling Div short kernel... Compiling Add short2 kernel... Compiling Mul short2 kernel... Compiling Mad short2 kernel... Compiling Div short2 kernel... Compiling Add short3 kernel... Compiling Mul short3 kernel... Compiling Mad short3 kernel... Compiling Div short3 kernel... Compiling Add short4 kernel... Compiling Mul short4 kernel... Compiling Mad short4 kernel... Compiling Div short4 kernel... Compute_Short: Online compilation test with short using 65536 instances, each with 400 iterations Running Add kernel for short... Running Mul kernel for short... Running Mad kernel for short... Running Div kernel for short... Compute_Short: Online compilation test with short2 using 65536 instances, each with 400 iterations Running Add kernel for short2... Running Mul kernel for short2... Running Mad kernel for short2... Running Div kernel for short2... Compute_Short: Online compilation test with short3 using 65536 instances, each with 400 iterations Running Add kernel for short3... Running Mul kernel for short3... Running Mad kernel for short3... Running Div kernel for short3... Compute_Short: Online compilation test with short4 using 65536 instances, each with 400 iterations Running Add kernel for short4... Running Mul kernel for short4... Running Mad kernel for short4... Running Div kernel for short4... 
--- short Add: --- Verify_Short: Time                     0.048165s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 996540416 Verify_Short: GIOp                    0.996540 Verify_Short: GIOP/S                  20.690136 --- short Mul: --- Verify_Short: Time                     0.044276s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 996409344 Verify_Short: GIOp                    0.996409 Verify_Short: GIOP/S                  22.504502 --- short Mad: --- Verify_Short: Time                     0.056554s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 2097414144 Verify_Short: GIOp                    2.097414 Verify_Short: GIOP/S                  37.086928 --- short Div: --- Verify_Short: Time                     0.261235s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 996409344 Verify_Short: GIOp                    0.996409 Verify_Short: GIOP/S                  3.814226

--- short2 Add: --- Verify_Short: Time                     0.095012s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 1993080832 Verify_Short: GIOp                    1.993081 Verify_Short: GIOP/S                  20.977148 --- short2 Mul: --- Verify_Short: Time                     0.086786s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 1992818688 Verify_Short: GIOp                    1.992819 Verify_Short: GIOP/S                  22.962444 --- short2 Mad: --- Verify_Short: Time                     0.111290s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 4194828288 Verify_Short: GIOp                    4.194828 Verify_Short: GIOP/S                  37.692769 --- short2 Div: --- Verify_Short: Time                     0.418234s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 1992818688 Verify_Short: GIOp                    1.992819 Verify_Short: GIOP/S                  4.764841

--- short3 Add: --- Verify_Short: Time                     0.141091s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 2989621248 Verify_Short: GIOp                    2.989621 Verify_Short: GIOP/S                  21.189312 --- short3 Mul: --- Verify_Short: Time                     0.129238s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 2989228032 Verify_Short: GIOp                    2.989228 Verify_Short: GIOP/S                  23.129637 --- short3 Mad: --- Verify_Short: Time                     0.165615s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 6292242432 Verify_Short: GIOp                    6.292242 Verify_Short: GIOP/S                  37.993192 --- short3 Div: --- Verify_Short: Time                     0.626321s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 2989228032 Verify_Short: GIOp                    2.989228 Verify_Short: GIOP/S                  4.772677

--- short4 Add: --- Verify_Short: Time                     0.188247s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 3986161664 Verify_Short: GIOp                    3.986162 Verify_Short: GIOP/S                  21.175167 --- short4 Mul: --- Verify_Short: Time                     0.171203s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 3985637376 Verify_Short: GIOp                    3.985637 Verify_Short: GIOP/S                  23.280184 --- short4 Mad: --- Verify_Short: Time                     0.220319s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 8389656576 Verify_Short: GIOp                    8.389657 Verify_Short: GIOP/S                  38.079587 --- short4 Div: --- Verify_Short: Time                     0.834570s Verify_Short: Iterations               400 Verify_Short: Instances                65536 Verify_Short: Short operations 3985637376 Verify_Short: GIOp                    3.985637 Verify_Short: GIOP/S                  4.775678

short --> passed short: Test took 19.14 seconds to run: short: Init 9.86s (51.52%) Compute 4.28s (22.38%)Verify 5.00s (26.11)% 21****************************************************************************** Short Operations Kernels: Performs an online compilation of short modulo kernels which each perform a large number of one specific operation, and calculates the operations per second of the device. This test uses short{1,2,3,4} as its unit of computation. Compiling Div ushort kernel... Compiling Div Fast ushort kernel... Compiling Div ushort2 kernel... Compiling Div Fast ushort2 kernel... Compiling Div ushort3 kernel... Compiling Div Fast ushort3 kernel... Compiling Div ushort4 kernel... Compiling Div Fast ushort4 kernel... Compute_ShortMod: Online compilation test with ushort using 65536 instances, each with 400 iterations Running Div kernel for ushort... Running Div kernel Fast for ushort... Compute_ShortMod: Online compilation test with ushort2 using 65536 instances, each with 400 iterations Running Div kernel for ushort2... Running Div kernel Fast for ushort2... Compute_ShortMod: Online compilation test with ushort3 using 65536 instances, each with 400 iterations Running Div kernel for ushort3... Running Div kernel Fast for ushort3... Compute_ShortMod: Online compilation test with ushort4 using 65536 instances, each with 400 iterations Running Div kernel for ushort4... Running Div kernel Fast for ushort4... 
--- ushort Div: --- Verify_ShortMod: Time                     0.482448s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 1520697344 Verify_ShortMod: GIOp                    1.520697 Verify_ShortMod: GIOP/S                  3.152044 --- ushort Div fast: --- Verify_ShortMod: Time                     0.242209s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 1520697344 Verify_ShortMod: GIOp                    1.520697 Verify_ShortMod: GIOP/S                  6.278451

--- ushort2 Div: --- Verify_ShortMod: Time                     0.936468s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 3041394688 Verify_ShortMod: GIOp                    3.041395 Verify_ShortMod: GIOP/S                  3.247729 --- ushort2 Div fast: --- Verify_ShortMod: Time                     0.481168s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 3041394688 Verify_ShortMod: GIOp                    3.041395 Verify_ShortMod: GIOP/S                  6.320858

--- ushort3 Div: --- Verify_ShortMod: Time                     1.518535s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 4562092032 Verify_ShortMod: GIOp                    4.562092 Verify_ShortMod: GIOP/S                  3.004272 --- ushort3 Div fast: --- Verify_ShortMod: Time                     0.731963s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 4562092032 Verify_ShortMod: GIOp                    4.562092 Verify_ShortMod: GIOP/S                  6.232681

--- ushort4 Div: --- Verify_ShortMod: Time                     2.163825s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 6082789376 Verify_ShortMod: GIOp                    6.082789 Verify_ShortMod: GIOP/S                  2.811128 --- ushort4 Div fast: --- Verify_ShortMod: Time                     1.011028s Verify_ShortMod: Iterations               400 Verify_ShortMod: Instances                65536 Verify_ShortMod: Integer operations 6082789376 Verify_ShortMod: GIOp                    6.082789 Verify_ShortMod: GIOP/S                  6.016440
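
In the Verify_ShortMod output above, the "Div fast" kernels take roughly half the time of the plain Div kernels for every ushort vector width, whereas the uint timings logged earlier are nearly identical for both variants. A small Python sketch makes the ratio explicit (the helper name `speedup` is hypothetical, and the timings are copied by hand from the log):

```python
# Times in seconds copied from the Verify_ShortMod output above:
# type -> (plain Div, Div fast)
ushort_times = {
    "ushort": (0.482448, 0.242209),
    "ushort2": (0.936468, 0.481168),
    "ushort3": (1.518535, 0.731963),
    "ushort4": (2.163825, 1.011028),
}


def speedup(plain: float, fast: float) -> float:
    """How many times faster the fast variant ran than the plain one."""
    return plain / fast


for name, (plain, fast) in ushort_times.items():
    print(f"{name}: Div fast is {speedup(plain, fast):.2f}x faster than Div")
```

The log does not say why the fast variant gains so much for ushort but not for uint, so the ratio should be read as an observation only.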

shortmod --> passed shortmod: Test took 26.41 seconds to run: shortmod: Init 15.94s (60.36%) Compute 7.95s (30.11%)Verify 2.52s (9.53)% 22****************************************************************************** Char Operations Kernels: Performs an online compilation of char add/mul/mad/div kernels which each perform a large number of one specific operation, and calculates the operations per second of the device. This test uses char{1,2,3,4} as its unit of computation. Compiling Add char kernel... Compiling Mul char kernel... Compiling Mad char kernel... Compiling Div char kernel... Compiling Add char2 kernel... Compiling Mul char2 kernel... Compiling Mad char2 kernel... Compiling Div char2 kernel... Compiling Add char3 kernel... Compiling Mul char3 kernel... Compiling Mad char3 kernel... Compiling Div char3 kernel... Compiling Add char4 kernel... Compiling Mul char4 kernel... Compiling Mad char4 kernel... Compiling Div char4 kernel... Compute_Char: Online compilation test with char using 65536 instances, each with 400 iterations Running Add kernel for char... Running Mul kernel for char... Running Mad kernel for char... Running Div kernel for char... Compute_Char: Online compilation test with char2 using 65536 instances, each with 400 iterations Running Add kernel for char2... Running Mul kernel for char2... Running Mad kernel for char2... Running Div kernel for char2... Compute_Char: Online compilation test with char3 using 65536 instances, each with 400 iterations Running Add kernel for char3... Running Mul kernel for char3... Running Mad kernel for char3... Running Div kernel for char3... Compute_Char: Online compilation test with char4 using 65536 instances, each with 400 iterations Running Add kernel for char4... Running Mul kernel for char4... Running Mad kernel for char4... Running Div kernel for char4... 
--- char Add: --- Verify_Char: Time                     0.048132s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 996540416 Verify_Char: GIOp                    0.996540 Verify_Char: GIOP/S                  20.704322 --- char Mul: --- Verify_Char: Time                     0.031758s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 681836544 Verify_Char: GIOp                    0.681837 Verify_Char: GIOP/S                  21.469757 --- char Mad: --- Verify_Char: Time                     0.056299s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 2097414144 Verify_Char: GIOp                    2.097414 Verify_Char: GIOP/S                  37.254909 --- char Div: --- Verify_Char: Time                     0.261175s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 996409344 Verify_Char: GIOp                    0.996409 Verify_Char: GIOP/S                  3.815102

--- char2 Add: --- Verify_Char: Time                     0.095022s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 1993080832 Verify_Char: GIOp                    1.993081 Verify_Char: GIOP/S                  20.974941 --- char2 Mul: --- Verify_Char: Time                     0.061869s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 1363673088 Verify_Char: GIOp                    1.363673 Verify_Char: GIOP/S                  22.041298 --- char2 Mad: --- Verify_Char: Time                     0.110911s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 4194828288 Verify_Char: GIOp                    4.194828 Verify_Char: GIOP/S                  37.821571 --- char2 Div: --- Verify_Char: Time                     0.519456s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 1992818688 Verify_Char: GIOp                    1.992819 Verify_Char: GIOP/S                  3.836357

--- char3 Add: --- Verify_Char: Time                     0.141104s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 2989621248 Verify_Char: GIOp                    2.989621 Verify_Char: GIOP/S                  21.187360 --- char3 Mul: --- Verify_Char: Time                     0.091806s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 2045509632 Verify_Char: GIOp                    2.045510 Verify_Char: GIOP/S                  22.280784 --- char3 Mad: --- Verify_Char: Time                     0.165578s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 6292242432 Verify_Char: GIOp                    6.292242 Verify_Char: GIOP/S                  38.001682 --- char3 Div: --- Verify_Char: Time                     0.627681s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 2989228032 Verify_Char: GIOp                    2.989228 Verify_Char: GIOP/S                  4.762336

--- char4 Add: --- Verify_Char: Time                     0.188343s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 3986161664 Verify_Char: GIOp                    3.986162 Verify_Char: GIOP/S                  21.164374 --- char4 Mul: --- Verify_Char: Time                     0.121852s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 2727346176 Verify_Char: GIOp                    2.727346 Verify_Char: GIOP/S                  22.382449 --- char4 Mad: --- Verify_Char: Time                     0.220987s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 8389656576 Verify_Char: GIOp                    8.389657 Verify_Char: GIOP/S                  37.964480 --- char4 Div: --- Verify_Char: Time                     0.835537s Verify_Char: Iterations               400 Verify_Char: Instances                65536 Verify_Char: Char operations 3985637376 Verify_Char: GIOp                    3.985637 Verify_Char: GIOP/S                  4.770151

char --> passed
char: Test took 19.13 seconds to run:
char: Init 9.78s (51.11%) Compute 4.30s (22.45%)Verify 5.06s (26.43)%
23******************************************************************************
Device Transfer Test:
Performs purely transfer operations using an OpenCL buffer from:
host  --> device (Upload)
device --> device (Copy)
device --> host  (Download)
It reports the average speed of each operation on the device.
Init_Transfer: Random seed set to 1507279508
Compute_Transfer: Starting transfer operations ... , 4MB, 4MB, 4MB, 4MB, 5MB, 5MB, 6MB, 7MB, 8MB, 9MB, 10MB, 12MB, 16MB, 21MB, 32MB, 64MB
Compute_Transfer: Transfer tests complete.
Actual speed
Verify_Transfer: Transfer Type | Size (MB) | Offset  | Time (s) | MB/s
Verify_Transfer: Upload        |         4 | 02c0a3d | 0.004000 | 1000.00
Verify_Transfer: Copy          |         4 | 01b1eb8 | 0.004000 | 1000.00
Verify_Transfer: Download      |         4 | 01b4ccc | 0.031000 | 129.03
Verify_Transfer: Upload        |         4 | 0273fb0 | 0.004000 | 1000.00
Verify_Transfer: Copy          |         4 | 01154a4 | 0.004000 | 1000.00
Verify_Transfer: Download      |         4 | 03502e7 | 0.035000 | 114.29
Verify_Transfer: Upload        |         4 | 02d2a6c | 0.005000 | 800.00
Verify_Transfer: Copy          |         4 | 026fc32 | 0.004000 | 1000.00
Verify_Transfer: Download      |         4 | 011ba08 | 0.036000 | 111.11
Verify_Transfer: Upload        |         4 | 03e5001 | 0.005000 | 800.00
Verify_Transfer: Copy          |         4 | 03e3d64 | 0.004000 | 1000.00
Verify_Transfer: Download      |         4 | 01eaf03 | 0.038000 | 105.26
Verify_Transfer: Upload        |         5 | 00d6041 | 0.005000 | 1000.00
Verify_Transfer: Copy          |         5 | 03b22d0 | 0.005000 | 1000.00
Verify_Transfer: Download      |         5 | 015fafe | 0.044000 | 113.64
Verify_Transfer: Upload        |         5 | 03f7897 | 0.006000 | 833.33
Verify_Transfer: Copy          |         5 | 01b2a66 | 0.005000 | 1000.00
Verify_Transfer: Download      |         5 | 00f7ca8 | 0.048000 | 104.17
Verify_Transfer: Upload        |         6 | 03f7f8c | 0.006000 | 1000.00
Verify_Transfer: Copy          |         6 | 04ba9d1 | 0.005000 | 1200.00
Verify_Transfer: Download      |         6 | 0368f08 | 0.050000 | 120.00
Verify_Transfer: Upload        |         7 | 022fe34 | 0.007000 | 1000.00
Verify_Transfer: Copy          |         7 | 01dd5c6 | 0.006000 | 1166.67
Verify_Transfer: Download      |         7 | 03f01f2 | 0.056000 | 125.00
Verify_Transfer: Upload        |         8 | 03acccc | 0.008000 | 1000.00
Verify_Transfer: Copy          |         8 | 018a3d7 | 0.007000 | 1142.86
Verify_Transfer: Download      |         8 | 01a6e97 | 0.062000 | 129.03
Verify_Transfer: Upload        |         9 | 015f15f | 0.008000 | 1125.00
Verify_Transfer: Copy          |         9 | 069d3c7 | 0.008000 | 1125.00
Verify_Transfer: Download      |         9 | 02f6585 | 0.072000 | 125.00
Verify_Transfer: Upload        |        10 | 06f5c29 | 0.010000 | 1000.00
Verify_Transfer: Copy          |        10 | 08468ad | 0.009000 | 1111.11
Verify_Transfer: Download      |        10 | 019e26a | 0.086000 | 116.28
Verify_Transfer: Upload        |        12 | 026fe71 | 0.011000 | 1090.91
Verify_Transfer: Copy          |        12 | 0599ed7 | 0.011000 | 1090.91
Verify_Transfer: Download      |        12 | 04a6223 | 0.109000 | 110.09
Verify_Transfer: Upload        |        16 | 0510625 | 0.014000 | 1142.86
Verify_Transfer: Copy          |        16 | 028b439 | 0.013000 | 1230.77
Verify_Transfer: Download      |        16 | 048f5c2 | 0.129000 | 124.03
Verify_Transfer: Upload        |        21 | 09ba5e3 | 0.019000 | 1105.26
Verify_Transfer: Copy          |        21 | 00fb38a | 0.017000 | 1235.29
Verify_Transfer: Download      |        21 | 08607bc | 0.172000 | 122.09
Verify_Transfer: Upload        |        32 | 0753f7d | 0.029000 | 1103.45
Verify_Transfer: Copy          |        32 | 097ced9 | 0.026000 | 1230.77
Verify_Transfer: Download      |        32 | 05851eb | 0.257000 | 124.51
Verify_Transfer: Upload        |        64 | 0000000 | 0.057000 | 1122.81
Verify_Transfer: Copy          |        64 | 0000000 | 0.049000 | 1306.12
Verify_Transfer: Download      |        64 | 0000000 | 0.314000 | 203.82
Verify_Transfer: Average upload speed 1007.73 MB/s
Verify_Transfer: Average copy speed 1114.97 MB/s
Verify_Transfer: Average download speed 123.58 MB/s
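The MB/s column in the transfer table appears to be the (rounded) size in MB divided by the elapsed time in seconds. The following is a minimal sketch for spot-checking rows, assuming exactly that relationship; the function name `throughput_mb_per_s` is hypothetical and not part of the test suite.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Return the transfer rate in MB/s, rounded to two decimals like the log."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return round(size_mb / seconds, 2)


# (size in MB, time in s, MB/s as reported in the log) for three rows
# taken from the transfer table: a 4 MB download, a 6 MB copy and the
# 64 MB download.
samples = [
    (4, 0.031, 129.03),
    (6, 0.005, 1200.00),
    (64, 0.314, 203.82),
]

for size_mb, seconds, reported in samples:
    computed = throughput_mb_per_s(size_mb, seconds)
    print(f"{size_mb:>3} MB in {seconds:.6f}s: computed {computed:.2f}, reported {reported:.2f}")
```

Because the log reports times with millisecond resolution, the rates for the small transfers (for example 800.00 vs. 1000.00 MB/s for 4 MB uploads) are coarse, and differences between individual rows should be read with that in mind.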

Verify_Transfer: Verifying data integrity
Verify_Transfer: Data integrity OK
transfer --> passed
transfer: Test took 9.31 seconds to run:
transfer: Init 4.12s (44.33%) Compute 2.59s (27.87%)Verify 2.59s (27.80)%
24******************************************************************************
Work-group Memory Copy Kernel:
Performs an online compilation of a kernel which copies input to output using work-groups, verifying the results.

Verify_MemcpyWorkgroup: Instances:      6528, Copied   0.024902 MBs, Time   0.000325s  76.622596MB/s
Verify_MemcpyWorkgroup: Instances:    691200, Copied   2.636719 MBs, Time   0.004165s 633.065726MB/s
Verify_MemcpyWorkgroup: Instances:     31360, Copied   0.119629 MBs, Time   0.000377s 317.318054MB/s
Verify_MemcpyWorkgroup: Instances:   1344000, Copied   5.126953 MBs, Time   0.007608s 673.889738MB/s
memcpy_workgroups --> passed
memcpy_workgroups: Test took 0.97 seconds to run:
memcpy_workgroups: Init 0.14s (14.70%) Verify 0.82s (85.30)%
25******************************************************************************
Global-offset Memory Copy Kernel:
Performs an online compilation of a kernel which copies input to output using global offsets, verifying the results.

memcpy_global_offsets --> passed
memcpy_global_offsets: Test took 0.39 seconds to run:
memcpy_global_offsets: Init 0.15s (38.68%) Verify 0.24s (61.32)%
26******************************************************************************
Image Convolution Test:
Runs a number of image convolution kernels on an image

Failed to open source.bmp for reading
Running image copy, frame 0
Running Sharpen kernel, frame 1
Running Blur kernel, frame 2
Running Gaussian kernel, frame 3
Running Laplace Edge detection kernel, frame 4
convolution --> passed
convolution: Test took 2.70 seconds to run:
convolution: Init 1.96s (72.52%) Compute 0.29s (10.81%)Verify 0.45s (16.67)%
27******************************************************************************
Conversions Test:
Tests conversions between various data types and verifies the results.

conversions --> passed
conversions: Test took 20.44 seconds to run:
conversions: Init 0.00s (0.04%) Verify 20.43s (99.95)%
28******************************************************************************
Events Test:
Runs a mixture of kernels using events and not using events all of which perform a mem copy and verify the result

Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000298s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000160s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000150s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000151s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000166s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000145s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000130s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000140s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000143s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000153s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000144s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000175s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000156s   0.000000MB/s
Verify_MulEvtKernel: Instances:        32, Copied    0 MBs, Time   0.000148s   0.000000MB/s
events --> passed
events: Test took 0.78 seconds to run:
events: Init 0.01s (2.42%) Verify 0.76s (97.57)%
29******************************************************************************
Atomics Test:
Runs all atomic functions

atomics --> passed
atomics: Test took 1.57 seconds to run:
atomics: Init 0.00s (0.56%) Verify 1.57s (99.43)%
30******************************************************************************
NOP Test:
Runs a kernel with no body effectively making it a NOP kernel

Compute_NOP: Online compilation test with 4096 instances running source:
nop --> passed
nop: Test took 0.39 seconds to run:
nop: Init 0.10s (25.76%) Compute 0.04s (10.45%)Verify 0.25s (63.77)%
Finished 31 tests in 291.6 seconds: 31 passed, 0 failed (100.00%)