User:Uli/R-Car M3-W Mainline GPU Test

From eLinux.org

Latest revision as of 03:30, 15 December 2017

Results of GPU test programs run on the Renesas R-Car M3-W Salvator-X board.

Creating the test setup

Building the user space (root) file system

This builds an ARM64 Debian userland on a Debian-derived system to be used as an NFS root file system:

  1. Run sudo qemu-debootstrap --arch arm64 sid /tmp/nfs_root http://deb.debian.org/debian/.
  2. Edit /tmp/nfs_root/etc/shadow and replace root:*:... with root::....
  3. Edit /tmp/nfs_root/etc/resolv.conf and enter a working nameserver address.
  4. Extract all the nested archives in R-Car_Gen3_Series_Evaluation_Software_Package_for_Linux-20170828.zip, then:
    rsync -av R-Car_Gen3_Series_Evaluation_Software_Package_for_Linux/m3_gfxeva_packages/EVARTM0RC7796GLTG0001SL40C_1_3_4/EVARTM0RC7796GLTG0001SL40C/Software/rogue/ /tmp/nfs_root/
  5. Build the GPU driver module:
    cd build/linux/r8a7796
    ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- KERNELDIR=<path to kernel source> make BUILD=release
    Then copy the driver module pvrsrvkm.ko to your userland file system.
  6. Copy the Yocto binary-only libraries to the root file system:
    sudo cp -a ./Yocto223_OCL_20171129/opencl-ddk/12-hour/rogue /tmp/nfs_root/root/yocto
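Step 2 on the host can also be scripted instead of being done in an editor. The following is a minimal C sketch. It assumes the standard colon-separated shadow(5) layout, in which the password hash is the second field. `clear_root_password` is a hypothetical helper that rewrites a single line; reading and writing the file is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite one /etc/shadow line so that the root
 * entry has an empty password field ("root:*:..." -> "root::...").
 * Returns 1 if the line is the root entry (and was rewritten), 0 if it
 * is any other entry (out then receives an unchanged copy). */
int clear_root_password(const char *line, char *out, size_t outsz)
{
    const char *prefix = "root:";
    size_t plen = strlen(prefix);

    if (strncmp(line, prefix, plen) != 0) {
        snprintf(out, outsz, "%s", line);
        return 0;
    }
    /* Find the colon that terminates the password field. */
    const char *rest = strchr(line + plen, ':');
    if (rest == NULL) {
        snprintf(out, outsz, "%s", line);
        return 0;
    }
    /* Keep "root:" and everything from the second colon onward. */
    snprintf(out, outsz, "%s%s", prefix, rest);
    return 1;
}
```

For example, `"root:*:17500:0:99999:7:::"` becomes `"root::17500:0:99999:7:::"`, and lines for other accounts are copied through unchanged.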

Boot the NFS root file system and continue on the target platform:

  1. Run apt-get update.
  2. Install a number of packages:
    apt-get install build-essential libdrm-dev openssh-server \
            libgbm-dev git autotools-dev autoconf automake \
            pkg-config libtool-bin libwayland-bin libwayland-dev make \
            libudev-dev libelf-dev libunwind-dev
    
  3. If you want to log in remotely via ssh:
    1. Set a root password.
    2. Edit /etc/ssh/sshd_config and change #PermitRootLogin prohibit-password to PermitRootLogin yes.
    3. Run /etc/init.d/ssh restart.
  4. Build wayland-kms:
    git clone https://github.com/renesas-rcar/wayland-kms.git
    cd wayland-kms
    autoreconf -i
    ./configure
    make
    make install
    ldconfig
  5. Create a number of missing symlinks:
    cd /root/yocto/usr/lib
    ln -s libEGL.so libEGL.so.1
    ln -s libGLES_CM.so libGLES_CM.so.1
    ln -s libGLESv2.so libGLESv2.so.1
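Step 5, which creates the missing .so.1 symlinks, can also be done from a program, for example as part of a setup tool. Below is a minimal C sketch that uses POSIX symlink(). `make_soname_links` is a hypothetical name. The sketch treats an already existing link as harmless, so it can be re-run safely. `ln -s` behaves differently and fails with "File exists".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: for each name in names, create the symlink
 * "<dir>/<name>.1" pointing at "<name>" (a relative target, as with
 * "ln -s libEGL.so libEGL.so.1" run inside dir). Links that already
 * exist are skipped. Returns the number of links created, or -1 on the
 * first unexpected error. */
int make_soname_links(const char *dir, const char *const *names, int count)
{
    char link_path[4096];
    int created = 0;

    for (int i = 0; i < count; i++) {
        int n = snprintf(link_path, sizeof link_path, "%s/%s.1", dir, names[i]);
        if (n < 0 || (size_t)n >= sizeof link_path) {
            errno = ENAMETOOLONG;
            return -1;
        }
        if (symlink(names[i], link_path) == 0) {
            created++;
        } else if (errno != EEXIST) {
            perror(link_path);
            return -1;
        }
    }
    return created;
}
```

Called as `make_soname_links("/root/yocto/usr/lib", (const char *[]){"libEGL.so", "libGLES_CM.so", "libGLESv2.so"}, 3)`, it creates the same three links as step 5.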
    

Running tests

Run the tests provided with the Yocto binary-only libraries like so:

LD_LIBRARY_PATH=/root/yocto/usr/lib /root/yocto/usr/local/bin/<test program>
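The LD_LIBRARY_PATH=... prefix sets the variable for that single command only. If the tests are started from another program, the equivalent is to set the variable and then exec the test. With glibc, the dynamic loader evaluates LD_LIBRARY_PATH when a program starts, so changing it inside an already running process does not affect that process's own library search. The following is a minimal C sketch with hypothetical helper names. Unlike the command above, it prepends the Yocto directory to any existing value instead of replacing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build the new LD_LIBRARY_PATH value: libdir first, then any existing
 * value. Returns 0 on success, -1 if out is too small. */
int build_lib_path(const char *libdir, const char *existing,
                   char *out, size_t outsz)
{
    int n;
    if (existing != NULL && existing[0] != '\0')
        n = snprintf(out, outsz, "%s:%s", libdir, existing);
    else
        n = snprintf(out, outsz, "%s", libdir);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Replace the current process with argv[0], searching libdir first.
 * The variable is set before execv() because the loader only reads it
 * at program start. Only returns on error. */
int exec_with_lib_path(const char *libdir, char *const argv[])
{
    char value[4096];
    if (build_lib_path(libdir, getenv("LD_LIBRARY_PATH"), value,
                       sizeof value) != 0)
        return -1;
    if (setenv("LD_LIBRARY_PATH", value, 1) != 0)
        return -1;
    execv(argv[0], argv);
    perror(argv[0]); /* execv() returns only on failure */
    return -1;
}
```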

Building ltrace

ltrace is, surprisingly, not a standard Debian package. To build it locally, run:

git clone https://github.com/dkogan/ltrace.git
cd ltrace
./autogen.sh
./configure --disable-werror
make
make install

Failing tests

All tests that use EGL fail early on: eglInitialize() returns EGL_FALSE, and eglGetError() then reports 0x3001 (EGL_NOT_INITIALIZED).

The reason for this is unclear. There is no documented facility for introspecting the inner workings of the binary-only libraries shipped with the Yocto GPU support package used above.

Given that all tests that use neither EGL nor libsutu_display.so pass, I am inclined to think that the cause is an incompatibility between the Debian user space and the Yocto libraries rather than a driver problem.

(Not all failing tests are documented here as the failure mode is identical in all cases.)
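The failing call sequence can be replayed outside the test programs. The gles2test1 ltrace log below shows that sequence: eglGetDisplay(), then eglInitialize(), then eglGetError(). The following minimal C sketch is not part of the Yocto package, and its function names are hypothetical. It loads the library with dlopen(), so a specific copy such as /root/yocto/usr/lib/libEGL.so.1 (created in step 5) can be tested without LD_LIBRARY_PATH. It also declares the few EGL types it needs locally instead of including <EGL/egl.h>; passing NULL for EGL_DEFAULT_DISPLAY assumes a pointer-sized native display type.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the <EGL/egl.h> types, so this builds without EGL
 * headers (they match the Khronos headers on Linux). */
typedef void *EGLDisplay;
typedef unsigned int EGLBoolean;
typedef int32_t EGLint;

/* Names for the EGL error codes 0x3000..0x300E. */
const char *egl_error_name(EGLint code)
{
    static const char *const names[] = {
        "EGL_SUCCESS", "EGL_NOT_INITIALIZED", "EGL_BAD_ACCESS",
        "EGL_BAD_ALLOC", "EGL_BAD_ATTRIBUTE", "EGL_BAD_CONFIG",
        "EGL_BAD_CONTEXT", "EGL_BAD_CURRENT_SURFACE", "EGL_BAD_DISPLAY",
        "EGL_BAD_MATCH", "EGL_BAD_NATIVE_PIXMAP", "EGL_BAD_NATIVE_WINDOW",
        "EGL_BAD_PARAMETER", "EGL_BAD_SURFACE", "EGL_CONTEXT_LOST",
    };
    if (code < 0x3000 || code > 0x300E)
        return "unknown EGL error";
    return names[code - 0x3000];
}

/* Replays the calls from the gles2test1 ltrace log. Returns the EGL
 * error code (0x3000 on success), or -1 if libpath or one of its
 * symbols cannot be loaded. */
EGLint try_egl_initialize(const char *libpath)
{
    void *lib = dlopen(libpath, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    EGLDisplay (*get_display)(void *);
    EGLBoolean (*initialize)(EGLDisplay, EGLint *, EGLint *);
    EGLBoolean (*terminate)(EGLDisplay);
    EGLint (*get_error)(void);
    /* Cast idiom from the POSIX dlsym() rationale. */
    *(void **)&get_display = dlsym(lib, "eglGetDisplay");
    *(void **)&initialize = dlsym(lib, "eglInitialize");
    *(void **)&terminate = dlsym(lib, "eglTerminate");
    *(void **)&get_error = dlsym(lib, "eglGetError");
    if (!get_display || !initialize || !terminate || !get_error) {
        dlclose(lib);
        return -1;
    }

    EGLint major = 0, minor = 0, err = 0x3000; /* EGL_SUCCESS */
    /* NULL stands in for EGL_DEFAULT_DISPLAY; the ltrace log shows
     * gles2test1 passing 0 as the first argument as well. */
    EGLDisplay dpy = get_display(NULL);
    if (initialize(dpy, &major, &minor)) {
        printf("EGL %d.%d initialized\n", (int)major, (int)minor);
        terminate(dpy);
    } else {
        err = get_error();
    }
    printf("eglInitialize: %s (0x%x)\n", egl_error_name(err), (unsigned)err);
    dlclose(lib);
    return err;
}
```

If the failure does not depend on the test programs themselves, `try_egl_initialize("/root/yocto/usr/lib/libEGL.so.1")` would be expected to report EGL_NOT_INITIALIZED (0x3001) as well. With glibc older than 2.34, link with `-ldl`.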

gles2test1

--------------------- started ---------------------
'eglInitialize' returned egl error 'EGL_NOT_INITIALIZED' (0x3001)

ltrace log of the test run:

[pid 747] __libc_start_main([ "/root/yocto/usr/local/bin/gles2t"... ] <unfinished ...>
[pid 747] puts("--------------------- started --"...) = 52
[pid 747] eglGetDisplay(0, 0, 1, 0)              = 1
[pid 747] eglInitialize(1, 0xffffdddf1574, 0xffffdddf1578, 0) = 0
[pid 747] eglGetError(0xaaaab25450a0, 0, 0xffffbb4afbd0, 0) = 0x3001
[pid 747] printf("'%s' returned egl error '%s' (0x"..., "eglInitialize", "EGL_NOT_INITIALIZED", 0x3001) = 66
[pid 747] exit(1 <unfinished ...>
[pid 747] __cxa_finalize(0xaaaab2555828, 0xaaaab2543f60, 0xaaaab2542000, 1) = 0
--------------------- started ---------------------
'eglInitialize' returned egl error 'EGL_NOT_INITIALIZED' (0x3001)
[pid 747] +++ exited (status 1) +++

It has been verified via ldd that the correct (i.e. binary-only) libraries are being used in the test run.
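ldd reports which libraries the dynamic loader would resolve for a binary. To confirm which objects are actually mapped into a running process, glibc provides dl_iterate_phdr(). The following minimal C sketch could be compiled into a small test program. The struct and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Scan state: substring to look for, plus counters. */
struct lib_scan {
    const char *needle; /* e.g. "/root/yocto/" */
    int total;
    int matches;
};

/* dl_iterate_phdr() callback: print each loaded object's path and count
 * the ones whose path contains scan->needle. */
static int scan_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lib_scan *scan = data;
    const char *name = info->dlpi_name;
    (void)size;

    scan->total++;
    /* The main executable is reported with an empty name. */
    if (name == NULL || name[0] == '\0')
        name = "(main program)";
    else if (scan->needle != NULL && strstr(name, scan->needle) != NULL)
        scan->matches++;
    printf("%s\n", name);
    return 0; /* continue with the next object */
}

/* Count the objects mapped into this process, and how many of them
 * have needle in their path. */
struct lib_scan scan_loaded_objects(const char *needle)
{
    struct lib_scan scan = { needle, 0, 0 };
    dl_iterate_phdr(scan_object, &scan);
    return scan;
}
```

Calling `scan_loaded_objects("/root/yocto/")` after the EGL library has been loaded lists every mapped object and counts the ones that come from the Yocto tree.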

rgx_blit_test

This is an example of a failing test using libsutu_display.so.

------------------- rgx blit test -------------------
---------------------- Start ------------------------
Using display type: DRM/KMS
Call PVRSRVConnect with a valid argument:
 OK
Attempt to create device memory context:
 OK
Creating synchronization context:
 OK
(unittests/services/common/sutu_display/sutu_drm.c:1213) Invalid pointer (psDisplay->psOutput == (nil))

ltrace log:

[pid 741] __libc_start_main([ "/root/yocto/usr/local/bin/rgx_bl"... ] <unfinished ...>
[pid 741] puts("------------------- rgx blit tes"...) = 54
[pid 741] puts("---------------------- Start ---"...) = 54
[pid 741] sutu_DisplayGetTypeName(54, 0, 0x8b160887277a7100, 0) = 0xffffa604bb68
[pid 741] printf("Using display type: %s\n", "DRM/KMS") = 28
[pid 741] PVRSRVConnect(0xfffffe88c650, 0, 0x8b160887277a7100, 0xffffa6256b20) = 0
[pid 741] puts(" OK")                            = 4
[pid 741] PVRSRVCreateDeviceMemContext(0xaaab0ace4020, 0xfffffe88c658, 0x8b160887277a7100, 0xffffa6256b20) = 0
[pid 741] puts(" OK")                            = 4
[pid 741] PVRSRVSyncPrimContextCreate(0xaaab0ace4020, 0xfffffe88c670, 0x8b160887277a7100, 0xffffa6256b20) = 0
[pid 741] puts(" OK")                            = 4
[pid 741] sutu_DisplayCreateContext(0xaaab0ace4020, 0xaaab0ace4180, 0xfffffe88c620, 0 <no return ...>
[pid 741] --- SIGABRT (Aborted) ---
[pid 741] +++ killed by SIGABRT +++
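The assertion in sutu_drm.c shows that the display helper ended up without an output (psDisplay->psOutput is NULL). The source of libsutu_display.so is not available, so it is only an assumption that this depends on connector state. Still, one cheap check is whether the kernel reports any DRM connector as connected at all. The kernel exposes each connector's state in /sys/class/drm/card*-*/status. A minimal C sketch that scans these files follows; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* 1 if a connector "status" file's contents mean "connected". The file
 * holds "connected", "disconnected" or "unknown", plus a newline. */
int status_is_connected(const char *status)
{
    return strncmp(status, "connected", 9) == 0;
}

/* Print each DRM connector below sysfs_drm (normally "/sys/class/drm")
 * with its status. Returns the number of connected connectors, or -1 if
 * the directory cannot be opened. */
int count_connected_outputs(const char *sysfs_drm)
{
    DIR *dir = opendir(sysfs_drm);
    if (dir == NULL)
        return -1;

    int connected = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Connectors are named like "card0-HDMI-A-1"; this skips "card0"
         * itself, render nodes and other entries. */
        if (strncmp(ent->d_name, "card", 4) != 0 ||
            strchr(ent->d_name, '-') == NULL)
            continue;

        char path[512];
        char status[32] = "";
        snprintf(path, sizeof path, "%s/%s/status", sysfs_drm, ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(status, sizeof status, f) == NULL)
            status[0] = '\0';
        fclose(f);

        status[strcspn(status, "\n")] = '\0';
        printf("%-24s %s\n", ent->d_name, status);
        connected += status_is_connected(status);
    }
    closedir(dir);
    return connected;
}
```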

Alternative binary blobs

The firmware required for the particular GPU version in the M3-W SoC (rgx.fw.4.45.2.58) is included in the file R-Car_Gen3_Series_Evaluation_Software_Package_for_Linux-20170828.zip, which also contains the original driver sources and a different set of binary-blob libraries. One would be inclined to assume that these components are more compatible with one another, but any tests run with them fail in the same way as with the Yocto libraries.

Passing tests

rogue2d_unittest

Test succeeded
Writing dst.bin size 65536
end of rogue2d unit test

rgx_kicksync_test

rgx_kicksync_test test configuration:
Verbose: N
Num contexts: 2
Num syncs per context: 16
Num loops: 1
Sync first value: 0x00000001
Num syncs per command: 3
Num commands per loop: 32
Verify every command: N
Delay (ms): 0
Random: N
----------------------- Start -----------------------
----------------------- Loop 1 / 1 -----------------------
Initialising contexts
Submitting 32 commands, each with 3 syncs
All commands submitted
Verifying all syncs have their expected final value
All syncs have their expected value
Releasing contexts
Test successful
----------------------- End -----------------------

pvr_memory_test

Profiling 4MB block-size transfer (512 times, 2048MB) using 4K-aligned (12) buffer
-------------------- Start tests (+ Memcpy) --------------------

Loop count = 512

CPU(ZeroPg)         -> CPU(ZeroPg)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 1.168463394 seconds = 1752.7Mbytes/sec

CPU(ZeroPg)         -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 1.161539246 seconds = 1763.2Mbytes/sec

CPU(Cached)         -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 2.44308258 seconds = 1001.8Mbytes/sec

CPU(Cached)         -> DEV(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 2.48023952 seconds = 1000.0Mbytes/sec

CPU(Cached)         -> DEV(Cached) + Flush  started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 2.671264205 seconds = 766.7Mbytes/sec

DEV(Cached)         -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 2.34345457 seconds = 1006.7Mbytes/sec

CPU(Cached)         -> DEV(Uncached)        started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 3.235934264 seconds = 632.9Mbytes/sec

DEV(Uncached)       -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 16.159380557 seconds = 126.7Mbytes/sec

CPU(Cached)         -> DEV(Write-Combined)  started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 1.838601274 seconds = 1113.9Mbytes/sec

DEV(Write-Combined) -> CPU(Cached)          started: Moving 512 blocks of 4194304 bytes = 2048Mbytes per test
						took 10.265888423 seconds = 199.5Mbytes/sec

--------------------- End tests ---------------------

Profiling 4MB block-size transfer (16 times, 64MB) using 4K-aligned (12) buffer
-------------------- Start tests (+ Memcpy) --------------------

Loop count = 16

CPU(ZeroPg)         -> CPU(ZeroPg)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.43178572 seconds = 1482.2Mbytes/sec

CPU(ZeroPg)         -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.36431785 seconds = 1756.7Mbytes/sec

CPU(Cached)         -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.64104496 seconds = 998.4Mbytes/sec

CPU(Cached)         -> DEV(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.64066577 seconds = 999.0Mbytes/sec

CPU(Cached)         -> DEV(Cached) + Flush  started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.83345853 seconds = 767.9Mbytes/sec

DEV(Cached)         -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.65629823 seconds = 975.2Mbytes/sec

CPU(Cached)         -> DEV(Uncached)        started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.104447218 seconds = 612.7Mbytes/sec

DEV(Uncached)       -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.506740307 seconds = 126.3Mbytes/sec

CPU(Cached)         -> DEV(Write-Combined)  started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.57340429 seconds = 1116.1Mbytes/sec

DEV(Write-Combined) -> CPU(Cached)          started: Moving 16 blocks of 4194304 bytes = 64Mbytes per test
						took 0.321706407 seconds = 198.9Mbytes/sec

--------------------- End tests ---------------------
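Some durations in this log do not match the throughput printed next to them. For example, 2048 Mbytes in 2.44308258 seconds would be about 838 Mbytes/sec, not the reported 1001.8. Every mismatching duration has only eight digits after the decimal point. Reading it with a zero inserted after the point (2.044308258 seconds) reproduces the printed rate, while durations with nine digits match as printed. This is consistent with the test printing the nanosecond part of its timer without zero padding (`%ld` instead of `%09ld`), in which case the Mbytes/sec figures are the numbers to trust. That is an inference from the numbers alone, since the source of pvr_memory_test is not available. The following minimal C sketch reproduces the effect with a `struct timespec`; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a duration the way the log appears to: nanoseconds unpadded,
 * so 2 s + 44308258 ns prints as "2.44308258". */
void format_unpadded(const struct timespec *t, char *out, size_t n)
{
    snprintf(out, n, "%ld.%ld", (long)t->tv_sec, t->tv_nsec);
}

/* Correct formatting: always nine fractional digits ("2.044308258"). */
void format_padded(const struct timespec *t, char *out, size_t n)
{
    snprintf(out, n, "%ld.%09ld", (long)t->tv_sec, t->tv_nsec);
}

/* Throughput in Mbytes/sec for mbytes transferred in duration t. */
double mbytes_per_sec(double mbytes, const struct timespec *t)
{
    return mbytes / ((double)t->tv_sec + (double)t->tv_nsec / 1e9);
}
```

With `t = {2, 44308258}`, `mbytes_per_sec(2048.0, &t)` gives about 1001.8, which is the rate in the log. The unpadded string is the duration as the log printed it.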

rgx_compute_test

------------------ RGX compute test -----------------
----------------------- Start -----------------------
Call PVRSRVConnect with a valid argument:
 OK
Attempt to create device memory context:
 OK
Creating synchronization context:
 OK
Looking up General heap handle
 OK
Getting event object
 OK
Creating Compute Context
 OK
Creating Buffer
Creating DWord for CDM Event Object
 OK
 OK
Create PDS Heap
 OK
Create USC Heap
 OK
Allocate sync primitive
 OK
Creating NOP instruction
Creating Data Segment
Creating Code Segment
Write Kernel 0
Creating Fence Data Segment
Creating Code Segment
Write Fence Kernel
Write Terminate
Call services to kick CDM
 OK
Poll for sync update
 OK
Poll for CDM event object data
 OK
Destroy Compute Context
 OK

Total time: 0ms
Destroy synchronization context:
Destroy Device Memory Context
Disconnect from services:
 OK
------------------------ End ------------------------

ocl_unit_test

OpenCL Unit Test(s) (Rogue_DDK_Linux rogueddk 1.7@4563938,release,r8a7795_linux) at Fri Oct  6 08:40:47 2017
00******************************************************************************
Platform Test:
	Checks that an OpenCL compatible platform is present
	for the unit test to run.
********************************************************************************
Verify_Platform: Enumerating 1 platforms
Verify_Platform: CL_PLATFORM_PROFILE    EMBEDDED_PROFILE
Verify_Platform: CL_PLATFORM_VERSION    OpenCL 1.2 
Verify_Platform: CL_PLATFORM_NAME       PowerVR Rogue
Verify_Platform: CL_PLATFORM_VENDOR     Imagination Technologies
Verify_Platform: CL_PLATFORM_EXTENSIONS 'cl_khr_icd cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_img_yuv_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr cl_khr_spir'
platform --> passed
platform: Test took 0.22 seconds to run:
platform: Verify 0.22s (100.00)%
01******************************************************************************
Device Test:
	Checks that an OpenCL compatible device is present
	for the unit test to run.
********************************************************************************
Verify_Device: Enumerating 1 devices
Verify_Device: CL_DEVICE_TYPE       CL_DEVICE_TYPE_GPU
Verify_Device: CL_DEVICE_NAME       PowerVR Rogue GX6650
Verify_Device: CL_DEVICE_VENDORI    Imagination Technologies
Verify_Device: CL_DRIVER_VERSION    1.7@4563938
Verify_Device: CL_DEVICE_PROFILE    EMBEDDED_PROFILE
Verify_Device: CL_DEVICE_VERSION    OpenCL 1.2 
Verify_Device: CL_DEVICE_EXTENSIONS cl_khr_icd cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_egl_image cl_img_yuv_image cles_khr_int64 cl_img_cached_allocations cl_img_use_gralloc_ptr cl_khr_spir
device --> passed
device: Test took 0.34 seconds to run:
device: Verify 0.34s (100.00)%
02******************************************************************************
Bounding Box:
	Runs the kernel that compute floating point minimum and maximum
	for large number of vertices.

********************************************************************************
  Objects:      100, # triangles per object:    15393, Computing instances per object:  512, Time - start - stop: 164.080000ms
bbox --> passed
bbox: Test took 2.87 seconds to run:
bbox: Init 2.56s (89.19%)  Verify 0.31s (10.81)%
03******************************************************************************
Addition Kernel:
	Performs an online compilation of an integer addition kernel
	and verifies that the output buffer is correct.
********************************************************************************
Compute_Add: Online compilation test with 4096 instances running source:
<source>
__kernel void AdditionKernel(__global int* a, __global int* b)
{
	int ith = get_global_id(0);
	a[ith] = a[ith] + b[ith];
}
</source>
Verify_Add: Verification OK
add --> passed
add: Test took 0.09 seconds to run:
add: Init 0.02s (21.98%)  Compute 0.02s (23.08%)Verify 0.05s (54.95)%
04******************************************************************************
Binary Check:
	Runs the same kernel as the addition test however first saves
	the binary version to the filsystem, recreates the OpenCL
	context and ensures the binary test computes the same results
	as the online test.
********************************************************************************
CL_PROGRAM_BUILD_LOG:

Init_Binary: Wrote out 1 binaries to the file system
Compute_Binary: Successfully loaded back binary_0.bin from filesystem
Verify_Binary: Binary file successfully executed kernel
binary --> passed
binary: Test took 0.18 seconds to run:
binary: Init 0.09s (46.70%)  Compute 0.04s (19.23%)Verify 0.06s (34.07)%
05******************************************************************************
Error Log Check:
	Runs an illegal kernel that contains a simple undeclared
	identifier error and verifies that the build log provided
	by the OpenCL implementation is correct.
********************************************************************************
errorlog:
*** Build Log ***
BuildGroup_1:3:2: error: use of undeclared identifier 'undeclared'
        undeclared identifier
        ^
errorlog: Verified build log contained required error.
errorlog --> passed
errorlog: Test took 0.09 seconds to run:
errorlog: Init 0.01s (10.00%)  Verify 0.08s (90.00)%
06******************************************************************************
Memory Copy Kernel:
	Performs an online compilation of a kernel which copies input
	to output, verifying the results and calculating the speed at
	which the data is transferred.
********************************************************************************
  BufferType:    uchar, Instances:   524288, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.068670s, 232.998398MB/s
  BufferType:   ushort, Instances:   262144, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.128205s, 124.800125MB/s
  BufferType:     uint, Instances:   131072, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.026237s, 609.825819MB/s
  BufferType:    uint2, Instances:    65536, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.092728s, 172.547666MB/s
  BufferType:    uint4, Instances:    32768, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.023522s, 680.214267MB/s
  BufferType:    uint8, Instances:    16384, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.025302s, 632.361078MB/s
  BufferType:   uint16, Instances:     8192, Copies Per Instance:   32, Copied:   16 MBs, Time:   0.015503s, 1032.058311MB/s
memcpy --> passed
memcpy: Test took 1.90 seconds to run:
memcpy: Init 0.08s (3.95%)  Verify 1.82s (96.05)%
07******************************************************************************
Memory Strided Copy Kernel:
	Performs an online compilation of a kernel which copies input
	to output, verifying the results and calculating the speed at
	which the data is transferred using strided pattern.
********************************************************************************
  BufferType:    uchar, Instances:   524288, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 256, Time:   0.057483s, 278.343162MB/s
  BufferType:   ushort, Instances:   262144, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 128, Time:   0.032100s, 498.442368MB/s
  BufferType:     uint, Instances:   131072, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 64, Time:   0.013142s, 1217.470705MB/s
  BufferType:    uint2, Instances:    65536, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 32, Time:   0.013185s, 1213.500190MB/s
  BufferType:    uint4, Instances:    32768, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 16, Time:   0.012212s, 1310.186702MB/s
  BufferType:    uint8, Instances:    16384, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 8, Time:   0.012611s, 1268.733645MB/s
  BufferType:   uint16, Instances:     8192, Copies Per Instance:   32, Copied:   16 MBs, Workgroup Size: 4, Time:   0.012611s, 1268.733645MB/s
memcpy_stride --> passed
memcpy_stride: Test took 1.42 seconds to run:
memcpy_stride: Init 0.06s (4.03%)  Verify 1.36s (95.97)%
08******************************************************************************
Memory Read Kernel:
	Performs an online compilation of a kernel which reads large
	amounts of data each instance, calculating read bandwidth.
********************************************************************************
Performing 32 memory reads per kernel instance, totalling 16777216 memory loads...
  BufferType:    uchar, Instances:   524288, Reads per instance:  128, Read:   64 MBs, Time:   0.157612s, 406.060452MB/s
Performing 32 memory reads per kernel instance, totalling 8388608 memory loads...
  BufferType:   ushort, Instances:   262144, Reads per instance:  128, Read:   64 MBs, Time:   0.635605s, 100.691467MB/s
Performing 32 memory reads per kernel instance, totalling 4194304 memory loads...
  BufferType:     uint, Instances:   131072, Reads per instance:  128, Read:   64 MBs, Time:   0.039797s, 1608.161419MB/s
Performing 32 memory reads per kernel instance, totalling 2097152 memory loads...
  BufferType:    uint2, Instances:    65536, Reads per instance:  128, Read:   64 MBs, Time:   0.037831s, 1691.734292MB/s
Performing 32 memory reads per kernel instance, totalling 1048576 memory loads...
  BufferType:    uint4, Instances:    32768, Reads per instance:  128, Read:   64 MBs, Time:   0.037738s, 1695.903334MB/s
Performing 32 memory reads per kernel instance, totalling 524288 memory loads...
  BufferType:    uint8, Instances:    16384, Reads per instance:  128, Read:   64 MBs, Time:   0.038040s, 1682.439537MB/s
Performing 32 memory reads per kernel instance, totalling 262144 memory loads...
  BufferType:   uint16, Instances:     8192, Reads per instance:  128, Read:   64 MBs, Time:   0.020294s, 3153.641470MB/s
memread --> passed
memread: Test took 2.52 seconds to run:
memread: Init 0.12s (4.64%)  Verify 2.40s (95.36)%
09******************************************************************************
Memory Strided Read Kernel:
	Performs an online compilation of a kernel which reads large
	amounts of data each instance using stride pattern, 
	calculating read bandwidth.
********************************************************************************
Performing 32 memory reads per kernel instance, totalling 16777216 memory loads...
  BufferType:    uchar, Instances:   524288, Reads per instance:  128, WG size 256, Read:   64 MBs, Time:   0.077018s, 830.974577MB/s
Performing 32 memory reads per kernel instance, totalling 8388608 memory loads...
  BufferType:   ushort, Instances:   262144, Reads per instance:  128, WG size 128, Read:   64 MBs, Time:   0.046774s, 1368.281524MB/s
Performing 32 memory reads per kernel instance, totalling 4194304 memory loads...
  BufferType:     uint, Instances:   131072, Reads per instance:  128, WG size 64, Read:   64 MBs, Time:   0.020116s, 3181.547027MB/s
Performing 32 memory reads per kernel instance, totalling 2097152 memory loads...
  BufferType:    uint2, Instances:    65536, Reads per instance:  128, WG size 32, Read:   64 MBs, Time:   0.019099s, 3350.960783MB/s
Performing 32 memory reads per kernel instance, totalling 1048576 memory loads...
  BufferType:    uint4, Instances:    32768, Reads per instance:  128, WG size 16, Read:   64 MBs, Time:   0.019511s, 3280.200912MB/s
Performing 32 memory reads per kernel instance, totalling 524288 memory loads...
  BufferType:    uint8, Instances:    16384, Reads per instance:  128, WG size 8, Read:   64 MBs, Time:   0.019315s, 3313.486927MB/s
Performing 32 memory reads per kernel instance, totalling 262144 memory loads...
  BufferType:   uint16, Instances:     8192, Reads per instance:  128, WG size 4, Read:   64 MBs, Time:   0.018088s, 3538.257408MB/s
memread_stride --> passed
memread_stride: Test took 1.39 seconds to run:
memread_stride: Init 0.12s (8.59%)  Verify 1.27s (91.41)%
10******************************************************************************
Memory Write Kernel:
	Performs an online compilation of a kernel which writes large
	amounts of data each instance, calculating write bandwidth.
********************************************************************************
  BufferType:    uchar, Instances:   524288, Writes Per Instance:  128, Written:   64 MBs, Time:   0.278133s, 230.105741MB/s
  BufferType:   ushort, Instances:   262144, Writes Per Instance:  128, Written:   64 MBs, Time:   0.994779s,  64.335898MB/s
  BufferType:     uint, Instances:   131072, Writes Per Instance:  128, Written:   64 MBs, Time:   0.749035s,  85.443270MB/s
  BufferType:    uint2, Instances:    65536, Writes Per Instance:  128, Written:   64 MBs, Time:   0.272591s, 234.783980MB/s
  BufferType:    uint4, Instances:    32768, Writes Per Instance:  128, Written:   64 MBs, Time:   0.146211s, 437.723564MB/s
  BufferType:    uint8, Instances:    16384, Writes Per Instance:  128, Written:   64 MBs, Time:   0.080468s, 795.347219MB/s
  BufferType:   uint16, Instances:     8192, Writes Per Instance:  128, Written:   64 MBs, Time:   0.045449s, 1408.171797MB/s
memwrite --> passed
memwrite: Test took 10.17 seconds to run:
memwrite: Init 0.64s (6.25%)  Verify 9.53s (93.75)%
11******************************************************************************
Memory Strided Write Kernel:
	Performs an online compilation of a kernel which writes large
	amounts of data each instance using stride pattern, 
	calculating write bandwidth.
********************************************************************************
  BufferType:    uchar, Instances:   524288, Writes Per Instance:  128, Written:   64 MBs, Time:   0.126440s, 506.168934MB/s
  BufferType:   ushort, Instances:   262144, Writes Per Instance:  128, Written:   64 MBs, Time:   0.074213s, 862.382601MB/s
  BufferType:     uint, Instances:   131072, Writes Per Instance:  128, Written:   64 MBs, Time:   0.048326s, 1324.338865MB/s
  BufferType:    uint2, Instances:    65536, Writes Per Instance:  128, Written:   64 MBs, Time:   0.030605s, 2091.161575MB/s
  BufferType:    uint4, Instances:    32768, Writes Per Instance:  128, Written:   64 MBs, Time:   0.031541s, 2029.104974MB/s
  BufferType:    uint8, Instances:    16384, Writes Per Instance:  128, Written:   64 MBs, Time:   0.030891s, 2071.800848MB/s
  BufferType:   uint16, Instances:     8192, Writes Per Instance:  128, Written:   64 MBs, Time:   0.032031s, 1998.064375MB/s
memwrite_stride --> passed
memwrite_stride: Test took 8.61 seconds to run:
memwrite_stride: Init 0.95s (10.99%)  Verify 7.67s (89.01)%
12******************************************************************************
Image Copy Kernel:
	Performs an image direct copy.
********************************************************************************
Init_ImgCopyKernel: Hardware detected, verification will be run.
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:    8,  Copied   2.00 MBs, Time 0.006000s,     6 Ticks, 333.333333MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:   16,  Copied   4.00 MBs, Time 0.007000s,     7 Ticks, 571.428571MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:   32,  Copied   8.00 MBs, Time 0.014000s,    14 Ticks, 571.428571MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:   64,  Copied  16.00 MBs, Time 0.027000s,    27 Ticks, 592.592593MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  128,  Copied  32.00 MBs, Time 0.051000s,    51 Ticks, 627.450980MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  256,  Copied  64.00 MBs, Time 0.102000s,   102 Ticks, 627.450980MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations:  512,  Copied 128.00 MBs, Time 0.203000s,   203 Ticks, 630.541872MB/s
ImgCpy_RunImgCpy: Instances: 128x512, Iterations: 1024,  Copied 256.00 MBs, Time 0.406000s,   406 Ticks, 630.541872MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:    8,  Copied   2.00 MBs, Time 0.008000s,     8 Ticks, 250.000000MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:   16,  Copied   4.00 MBs, Time 0.011000s,    11 Ticks, 363.636364MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:   32,  Copied   8.00 MBs, Time 0.024000s,    24 Ticks, 333.333333MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:   64,  Copied  16.00 MBs, Time 0.045000s,    45 Ticks, 355.555556MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  128,  Copied  32.00 MBs, Time 0.090000s,    90 Ticks, 355.555556MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  256,  Copied  64.00 MBs, Time 0.180000s,   180 Ticks, 355.555556MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations:  512,  Copied 128.00 MBs, Time 0.359000s,   359 Ticks, 356.545961MB/s
ImgCpy_RunImgCpy: Instances: 512x512, Iterations: 1024,  Copied 256.00 MBs, Time 0.717000s,   717 Ticks, 357.043236MB/s
Verify_ImgCopyKernel: Average speed 453.33 MB/s
imgcpy --> passed
imgcpy: Test took 3.10 seconds to run:
imgcpy: Init 0.16s (5.22%)  Verify 2.94s (94.78)%
13******************************************************************************
Image mipmaps:
	Performs an image direct copy with mipmaps.
********************************************************************************
Verify_MipMap: Average speed 0.00 MB/s
mipmap --> passed
mipmap: Test took 0.10 seconds to run:
mipmap: Init 0.01s (10.00%)  Verify 0.09s (90.00)%
14******************************************************************************
Floating Point Operations Kernel (Scalar):
	Performs an online compilation of floating point add/mul/mad/div
	kernels which each perform a large number of one specific operation,
	and calculates the floating point operations per second of the
	device. This test uses (scalar) float as its unit of computation.
********************************************************************************
Compiling Add kernel...
Compiling Mul kernel...
Compiling Mad kernel...
Compiling Div kernel...
Compute_Float: Online compilation test with 524288 instances, each with 300 iterations
Running Add kernel...
Running Mul kernel...
Running Mad kernel...
Running Div kernel...
---------------
Float Add:
---------------
Verify_Float: Time                      0.410148s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15417212928
Verify_Float: GFlop                     15.417213
Verify_Float: GFLOP/S                   37.589390
---------------
Float Mul:
---------------
Verify_Float: Time                      0.410114s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   37.589949
---------------
Float Mad:
---------------
Verify_Float: Time                      0.410401s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 30830231552
Verify_Float: GFlop                     30.830232
Verify_Float: GFLOP/S                   75.122214
---------------
Float Div:
---------------
Verify_Float: Time                      0.820315s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   18.792981
---------------
Verify_Float: All tests time                      2.175840s
---------------
float --> passed
float: Test took 5.82 seconds to run:
float: Init 2.16s (37.04%)  Compute 2.28s (39.10%)Verify 1.39s (23.87)%
15******************************************************************************
Floating Point Operations Kernel (Scalar):
	Performs an online compilation of floating point add/mul/mad/div
	kernels which each perform a large number of one specific operation,
	and calculates the floating point operations per second of the
	device. This test uses (scalar) float as its unit of computation.
	Kernels run in parallel without synchronization on host side.
********************************************************************************
Compiling Add kernel...
Compiling Mul kernel...
Compiling Mad kernel...
Compiling Div kernel...
Compute_Float_Parallel: Online compilation test with 524288 instances, each with 300 iterations
Running Add kernel...
Running Mul kernel...
Running Mad kernel...
Running Div kernel...
---------------
Float Add:
---------------
Verify_Float: Time                      0.410296s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15417212928
Verify_Float: GFlop                     15.417213
Verify_Float: GFLOP/S                   37.575830
---------------
Float Mul:
---------------
Verify_Float: Time                      0.409929s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   37.606913
---------------
Float Mad:
---------------
Verify_Float: Time                      0.409919s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 30830231552
Verify_Float: GFlop                     30.830232
Verify_Float: GFLOP/S                   75.210545
---------------
Float Div:
---------------
Verify_Float: Time                      0.820163s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   18.796464
---------------
Verify_Float: All tests time                      2.193818s
---------------
float_parallel --> passed
float_parallel: Test took 4.07 seconds to run:
float_parallel: Init 0.22s (5.48%)  Compute 2.29s (56.37%)Verify 1.55s (38.15)%
16******************************************************************************
Floating Point Operations Kernel (Scalar):
	Performs an online compilation of floating point add/mul/mad/div
	kernels which each perform a large number of one specific operation,
	and calculates the floating point operations per second of the
	device. This test uses (scalar) float as its unit of computation.
	Kernels run in order of their dependencies created by events.
********************************************************************************
Compiling Add kernel...
Compiling Mul kernel...
Compiling Mad kernel...
Compiling Div kernel...
Compute_Float_Chain: Online compilation test with 524288 instances, each with 300 iterations
Running Add kernel...
Running Mul kernel...
Running Mad kernel...
Running Div kernel...
---------------
Float Add:
---------------
Verify_Float: Time                      0.409995s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15417212928
Verify_Float: GFlop                     15.417213
Verify_Float: GFLOP/S                   37.603417
---------------
Float Mul:
---------------
Verify_Float: Time                      0.409922s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   37.607555
---------------
Float Mad:
---------------
Verify_Float: Time                      0.409930s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 30830231552
Verify_Float: GFlop                     30.830232
Verify_Float: GFLOP/S                   75.208527
---------------
Float Div:
---------------
Verify_Float: Time                      0.820185s
Verify_Float: Iterations                300
Verify_Float: Instances                 524288
Verify_Float: Floating point operations 15416164352
Verify_Float: GFlop                     15.416164
Verify_Float: GFLOP/S                   18.795960
---------------
Verify_Float: All tests time                      2.173687s
---------------
float_chain --> passed
float_chain: Test took 4.19 seconds to run:
float_chain: Init 0.14s (3.39%)  Compute 2.25s (53.81%)Verify 1.79s (42.80)%
17******************************************************************************
Floating Point Operations Kernels for all vector sizes:
	Performs an online compilation of floating point add/mul/mad/div
	kernels which each perform a large number of one specific operation,
	and calculates the floating point operations per second of the
	device. This test uses float{1,2,3,4,8,16} as its unit of computation.
********************************************************************************
Compiling Add float kernel...
Compiling Mul float kernel...
Compiling Mad float kernel...
Compiling Div float kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float in kernel.
Compiling Add float2 kernel...
Compiling Mul float2 kernel...
Compiling Mad float2 kernel...
Compiling Div float2 kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float2 in kernel.
Compiling Add float3 kernel...
Compiling Mul float3 kernel...
Compiling Mad float3 kernel...
Compiling Div float3 kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float3 in kernel.
Compiling Add float4 kernel...
Compiling Mul float4 kernel...
Compiling Mad float4 kernel...
Compiling Div float4 kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float4 in kernel.
Compiling Add float8 kernel...
Compiling Mul float8 kernel...
Compiling Mad float8 kernel...
Compiling Div float8 kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float8 in kernel.
Compiling Add float16 kernel...
Compiling Mul float16 kernel...
Compiling Mad float16 kernel...
Compiling Div float16 kernel...
Init_Floatops: Reducing from 65536 to 65536 due to use of float16 in kernel.
Compute_Floatops: Online compilation test with float using 65536 instances, each with 400 iterations
Running Add kernel for float...
Running Mul kernel for float...
Running Mad kernel for float...
Running Div kernel for float...
Compute_Floatops: Online compilation test with float2 using 65536 instances, each with 400 iterations
Running Add kernel for float2...
Running Mul kernel for float2...
Running Mad kernel for float2...
Running Div kernel for float2...
Compute_Floatops: Online compilation test with float3 using 65536 instances, each with 400 iterations
Running Add kernel for float3...
Running Mul kernel for float3...
Running Mad kernel for float3...
Running Div kernel for float3...
Compute_Floatops: Online compilation test with float4 using 65536 instances, each with 400 iterations
Running Add kernel for float4...
Running Mul kernel for float4...
Running Mad kernel for float4...
Running Div kernel for float4...
Compute_Floatops: Online compilation test with float8 using 65536 instances, each with 400 iterations
Running Add kernel for float8...
Running Mul kernel for float8...
Running Mad kernel for float8...
Running Div kernel for float8...
Compute_Floatops: Online compilation test with float16 using 65536 instances, each with 400 iterations
Running Add kernel for float16...
Running Mul kernel for float16...
Running Mad kernel for float16...
Running Div kernel for float16...
---------------
float Add:
---------------
Verify_Floatops: Time                      0.068629s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 2569404416
Verify_Floatops: GFLOP                     2.569404
Verify_Floatops: GFLOP/S                   37.439048
---------------
float Mul:
---------------
Verify_Floatops: Time                      0.068905s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 2569273344
Verify_Floatops: GFLOP                     2.569273
Verify_Floatops: GFLOP/S                   37.287183
---------------
float Mad:
---------------
Verify_Floatops: Time                      0.070338s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 5138284544
Verify_Floatops: GFLOP                     5.138285
Verify_Floatops: GFLOP/S                   73.051331
---------------
float Div:
---------------
Verify_Floatops: Time                      0.136826s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 2569273344
Verify_Floatops: GFLOP                     2.569273
Verify_Floatops: GFLOP/S                   18.777669

---------------
float2 Add:
---------------
Verify_Floatops: Time                      0.135482s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 5138808832
Verify_Floatops: GFLOP                     5.138809
Verify_Floatops: GFLOP/S                   37.929827
---------------
float2 Mul:
---------------
Verify_Floatops: Time                      0.135580s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 5138546688
Verify_Floatops: GFLOP                     5.138547
Verify_Floatops: GFLOP/S                   37.900477
---------------
float2 Mad:
---------------
Verify_Floatops: Time                      0.135984s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 10276569088
Verify_Floatops: GFLOP                     10.276569
Verify_Floatops: GFLOP/S                   75.571899
---------------
float2 Div:
---------------
Verify_Floatops: Time                      0.272034s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 5138546688
Verify_Floatops: GFLOP                     5.138547
Verify_Floatops: GFLOP/S                   18.889355

---------------
float3 Add:
---------------
Verify_Floatops: Time                      0.210914s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 7708213248
Verify_Floatops: GFLOP                     7.708213
Verify_Floatops: GFLOP/S                   36.546712
---------------
float3 Mul:
---------------
Verify_Floatops: Time                      0.203259s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 7707820032
Verify_Floatops: GFLOP                     7.707820
Verify_Floatops: GFLOP/S                   37.921175
---------------
float3 Mad:
---------------
Verify_Floatops: Time                      0.203220s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 15414853632
Verify_Floatops: GFLOP                     15.414854
Verify_Floatops: GFLOP/S                   75.853034
---------------
float3 Div:
---------------
Verify_Floatops: Time                      0.408912s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 7707820032
Verify_Floatops: GFLOP                     7.707820
Verify_Floatops: GFLOP/S                   18.849581

---------------
float4 Add:
---------------
Verify_Floatops: Time                      0.279024s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 10277617664
Verify_Floatops: GFLOP                     10.277618
Verify_Floatops: GFLOP/S                   36.834171
---------------
float4 Mul:
---------------
Verify_Floatops: Time                      0.270386s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 10277093376
Verify_Floatops: GFLOP                     10.277093
Verify_Floatops: GFLOP/S                   38.008970
---------------
float4 Mad:
---------------
Verify_Floatops: Time                      0.270518s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 20553138176
Verify_Floatops: GFLOP                     20.553138
Verify_Floatops: GFLOP/S                   75.976971
---------------
float4 Div:
---------------
Verify_Floatops: Time                      0.543566s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 10277093376
Verify_Floatops: GFLOP                     10.277093
Verify_Floatops: GFLOP/S                   18.906799

---------------
float8 Add:
---------------
Verify_Floatops: Time                      0.600531s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 20555235328
Verify_Floatops: GFLOP                     20.555235
Verify_Floatops: GFLOP/S                   34.228433
---------------
float8 Mul:
---------------
Verify_Floatops: Time                      0.538501s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 20554186752
Verify_Floatops: GFLOP                     20.554187
Verify_Floatops: GFLOP/S                   38.169264
---------------
float8 Mad:
---------------
Verify_Floatops: Time                      0.543278s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 41106276352
Verify_Floatops: GFLOP                     41.106276
Verify_Floatops: GFLOP/S                   75.663429
---------------
float8 Div:
---------------
Verify_Floatops: Time                      1.091137s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 20554186752
Verify_Floatops: GFLOP                     20.554187
Verify_Floatops: GFLOP/S                   18.837402

---------------
float16 Add:
---------------
Verify_Floatops: Time                      1.228492s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 41110470656
Verify_Floatops: GFLOP                     41.110471
Verify_Floatops: GFLOP/S                   33.464174
---------------
float16 Mul:
---------------
Verify_Floatops: Time                      1.075268s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 41108373504
Verify_Floatops: GFLOP                     41.108374
Verify_Floatops: GFLOP/S                   38.230816
---------------
float16 Mad:
---------------
Verify_Floatops: Time                      1.088152s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 82212552704
Verify_Floatops: GFLOP                     82.212553
Verify_Floatops: GFLOP/S                   75.552453
---------------
float16 Div:
---------------
Verify_Floatops: Time                      2.190075s
Verify_Floatops: Iterations                400
Verify_Floatops: Instances                 65536
Verify_Floatops: Float operations 41108373504
Verify_Floatops: GFLOP                     41.108374
Verify_Floatops: GFLOP/S                   18.770304

floatvec --> passed
floatvec: Test took 42.24 seconds to run:
floatvec: Init 19.01s (44.99%)  Compute 13.70s (32.44%)Verify 9.53s (22.56)%
18******************************************************************************
Integer Operations Kernels:
	Performs an online compilation of integer add/mul/mad/div
	kernels which each perform a large number of one specific
	operation, and calculates the operations per second of the
	device. This test uses int{1,2,3,4} as its unit of computation.
********************************************************************************
Compiling Add int kernel...
Compiling Mul int kernel...
Compiling Mad int kernel...
Compiling Div int kernel...
Compiling Add int2 kernel...
Compiling Mul int2 kernel...
Compiling Mad int2 kernel...
Compiling Div int2 kernel...
Compiling Add int3 kernel...
Compiling Mul int3 kernel...
Compiling Mad int3 kernel...
Compiling Div int3 kernel...
Compiling Add int4 kernel...
Compiling Mul int4 kernel...
Compiling Mad int4 kernel...
Compiling Div int4 kernel...
Compute_Int: Online compilation test with int using 65536 instances, each with 400 iterations
Running Add kernel for int...
Running Mul kernel for int...
Running Mad kernel for int...
Running Div kernel for int...
Compute_Int: Online compilation test with int2 using 65536 instances, each with 400 iterations
Running Add kernel for int2...
Running Mul kernel for int2...
Running Mad kernel for int2...
Running Div kernel for int2...
Compute_Int: Online compilation test with int3 using 65536 instances, each with 400 iterations
Running Add kernel for int3...
Running Mul kernel for int3...
Running Mad kernel for int3...
Running Div kernel for int3...
Compute_Int: Online compilation test with int4 using 65536 instances, each with 400 iterations
Running Add kernel for int4...
Running Mul kernel for int4...
Running Mad kernel for int4...
Running Div kernel for int4...
---------------
int Add:
---------------
Verify_Int: Time                      0.048407s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 996540416
Verify_Int: GIOp                     0.996540
Verify_Int: GIOP/S                   20.586701
---------------
int Mul:
---------------
Verify_Int: Time                      0.044155s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 996409344
Verify_Int: GIOp                     0.996409
Verify_Int: GIOP/S                   22.566172
---------------
int Mad:
---------------
Verify_Int: Time                      0.056313s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 2097414144
Verify_Int: GIOp                     2.097414
Verify_Int: GIOP/S                   37.245647
---------------
int Div:
---------------
Verify_Int: Time                      3.138706s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 996409344
Verify_Int: GIOp                     0.996409
Verify_Int: GIOP/S                   0.317459

---------------
int2 Add:
---------------
Verify_Int: Time                      0.094852s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 1993080832
Verify_Int: GIOp                     1.993081
Verify_Int: GIOP/S                   21.012534
---------------
int2 Mul:
---------------
Verify_Int: Time                      0.086736s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 1992818688
Verify_Int: GIOp                     1.992819
Verify_Int: GIOP/S                   22.975681
---------------
int2 Mad:
---------------
Verify_Int: Time                      0.110893s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 4194828288
Verify_Int: GIOp                     4.194828
Verify_Int: GIOP/S                   37.827710
---------------
int2 Div:
---------------
Verify_Int: Time                      6.437046s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 1992818688
Verify_Int: GIOp                     1.992819
Verify_Int: GIOP/S                   0.309586

---------------
int3 Add:
---------------
Verify_Int: Time                      0.141097s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 2989621248
Verify_Int: GIOp                     2.989621
Verify_Int: GIOP/S                   21.188411
---------------
int3 Mul:
---------------
Verify_Int: Time                      0.128773s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 2989228032
Verify_Int: GIOp                     2.989228
Verify_Int: GIOP/S                   23.213158
---------------
int3 Mad:
---------------
Verify_Int: Time                      0.166180s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 6292242432
Verify_Int: GIOp                     6.292242
Verify_Int: GIOP/S                   37.864018
---------------
int3 Div:
---------------
Verify_Int: Time                      9.671746s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 2989228032
Verify_Int: GIOp                     2.989228
Verify_Int: GIOP/S                   0.309068

---------------
int4 Add:
---------------
Verify_Int: Time                      0.187724s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 3986161664
Verify_Int: GIOp                     3.986162
Verify_Int: GIOP/S                   21.234161
---------------
int4 Mul:
---------------
Verify_Int: Time                      0.171712s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 3985637376
Verify_Int: GIOp                     3.985637
Verify_Int: GIOP/S                   23.211176
---------------
int4 Mad:
---------------
Verify_Int: Time                      0.220862s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 8389656576
Verify_Int: GIOp                     8.389657
Verify_Int: GIOP/S                   37.985967
---------------
int4 Div:
---------------
Verify_Int: Time                      12.905239s
Verify_Int: Iterations                400
Verify_Int: Instances                 65536
Verify_Int: Integer operations 3985637376
Verify_Int: GIOp                     3.985637
Verify_Int: GIOP/S                   0.308839

int --> passed
int: Test took 76.98 seconds to run:
int: Init 37.45s (48.65%)  Compute 34.50s (44.81%)Verify 5.03s (6.53)%
19******************************************************************************
Integer Operations Kernels:
	Performs an online compilation of integer modulo
	kernels which each perform a large number of one specific
	operation, and calculates the operations per second of the
	device. This test uses int{1,2,3,4} as its unit of computation.
********************************************************************************
Compiling Div uint kernel...
Compiling Div Fast uint kernel...
Compiling Div uint2 kernel...
Compiling Div Fast uint2 kernel...
Compiling Div uint3 kernel...
Compiling Div Fast uint3 kernel...
Compiling Div uint4 kernel...
Compiling Div Fast uint4 kernel...
Compute_IntMod: Online compilation test with uint using 65536 instances, each with 400 iterations
Running Div kernel for uint...
Running Div kernel Fast for uint...
Compute_IntMod: Online compilation test with uint2 using 65536 instances, each with 400 iterations
Running Div kernel for uint2...
Running Div kernel Fast for uint2...
Compute_IntMod: Online compilation test with uint3 using 65536 instances, each with 400 iterations
Running Div kernel for uint3...
Running Div kernel Fast for uint3...
Compute_IntMod: Online compilation test with uint4 using 65536 instances, each with 400 iterations
Running Div kernel for uint4...
Running Div kernel Fast for uint4...
---------------
uint Div:
---------------
Verify_IntMod: Time                      0.206417s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 1520697344
Verify_IntMod: GIOp                     1.520697
Verify_IntMod: GIOP/S                   7.367113
---------------
uint Div fast:
---------------
Verify_IntMod: Time                      0.206099s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 1520697344
Verify_IntMod: GIOp                     1.520697
Verify_IntMod: GIOP/S                   7.378480

---------------
uint2 Div:
---------------
Verify_IntMod: Time                      0.462457s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 3041394688
Verify_IntMod: GIOp                     3.041395
Verify_IntMod: GIOP/S                   6.576600
---------------
uint2 Div fast:
---------------
Verify_IntMod: Time                      0.458995s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 3041394688
Verify_IntMod: GIOp                     3.041395
Verify_IntMod: GIOP/S                   6.626204

---------------
uint3 Div:
---------------
Verify_IntMod: Time                      0.692772s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 4562092032
Verify_IntMod: GIOp                     4.562092
Verify_IntMod: GIOP/S                   6.585272
---------------
uint3 Div fast:
---------------
Verify_IntMod: Time                      0.690712s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 4562092032
Verify_IntMod: GIOp                     4.562092
Verify_IntMod: GIOP/S                   6.604912

---------------
uint4 Div:
---------------
Verify_IntMod: Time                      0.938431s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 6082789376
Verify_IntMod: GIOp                     6.082789
Verify_IntMod: GIOP/S                   6.481872
---------------
uint4 Div fast:
---------------
Verify_IntMod: Time                      0.949093s
Verify_IntMod: Iterations                400
Verify_IntMod: Instances                 65536
Verify_IntMod: Integer operations 6082789376
Verify_IntMod: GIOp                     6.082789
Verify_IntMod: GIOP/S                   6.409055

intmod --> passed
intmod: Test took 24.13 seconds to run:
intmod: Init 16.80s (69.61%)  Compute 4.97s (20.58%)Verify 2.37s (9.81)%
20******************************************************************************
Short Operations Kernels:
	Performs an online compilation of short add/mul/mad/div
	kernels which each perform a large number of one specific
	operation, and calculates the operations per second of the
	device. This test uses short{1,2,3,4} as its unit of computation.
********************************************************************************
Compiling Add short kernel...
Compiling Mul short kernel...
Compiling Mad short kernel...
Compiling Div short kernel...
Compiling Add short2 kernel...
Compiling Mul short2 kernel...
Compiling Mad short2 kernel...
Compiling Div short2 kernel...
Compiling Add short3 kernel...
Compiling Mul short3 kernel...
Compiling Mad short3 kernel...
Compiling Div short3 kernel...
Compiling Add short4 kernel...
Compiling Mul short4 kernel...
Compiling Mad short4 kernel...
Compiling Div short4 kernel...
Compute_Short: Online compilation test with short using 65536 instances, each with 400 iterations
Running Add kernel for short...
Running Mul kernel for short...
Running Mad kernel for short...
Running Div kernel for short...
Compute_Short: Online compilation test with short2 using 65536 instances, each with 400 iterations
Running Add kernel for short2...
Running Mul kernel for short2...
Running Mad kernel for short2...
Running Div kernel for short2...
Compute_Short: Online compilation test with short3 using 65536 instances, each with 400 iterations
Running Add kernel for short3...
Running Mul kernel for short3...
Running Mad kernel for short3...
Running Div kernel for short3...
Compute_Short: Online compilation test with short4 using 65536 instances, each with 400 iterations
Running Add kernel for short4...
Running Mul kernel for short4...
Running Mad kernel for short4...
Running Div kernel for short4...
---------------
short Add:
---------------
Verify_Short: Time                      0.048165s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 996540416
Verify_Short: GIOp                     0.996540
Verify_Short: GIOP/S                   20.690136
---------------
short Mul:
---------------
Verify_Short: Time                      0.044276s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 996409344
Verify_Short: GIOp                     0.996409
Verify_Short: GIOP/S                   22.504502
---------------
short Mad:
---------------
Verify_Short: Time                      0.056554s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 2097414144
Verify_Short: GIOp                     2.097414
Verify_Short: GIOP/S                   37.086928
---------------
short Div:
---------------
Verify_Short: Time                      0.261235s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 996409344
Verify_Short: GIOp                     0.996409
Verify_Short: GIOP/S                   3.814226

---------------
short2 Add:
---------------
Verify_Short: Time                      0.095012s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 1993080832
Verify_Short: GIOp                     1.993081
Verify_Short: GIOP/S                   20.977148
---------------
short2 Mul:
---------------
Verify_Short: Time                      0.086786s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 1992818688
Verify_Short: GIOp                     1.992819
Verify_Short: GIOP/S                   22.962444
---------------
short2 Mad:
---------------
Verify_Short: Time                      0.111290s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 4194828288
Verify_Short: GIOp                     4.194828
Verify_Short: GIOP/S                   37.692769
---------------
short2 Div:
---------------
Verify_Short: Time                      0.418234s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 1992818688
Verify_Short: GIOp                     1.992819
Verify_Short: GIOP/S                   4.764841

---------------
short3 Add:
---------------
Verify_Short: Time                      0.141091s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 2989621248
Verify_Short: GIOp                     2.989621
Verify_Short: GIOP/S                   21.189312
---------------
short3 Mul:
---------------
Verify_Short: Time                      0.129238s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 2989228032
Verify_Short: GIOp                     2.989228
Verify_Short: GIOP/S                   23.129637
---------------
short3 Mad:
---------------
Verify_Short: Time                      0.165615s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 6292242432
Verify_Short: GIOp                     6.292242
Verify_Short: GIOP/S                   37.993192
---------------
short3 Div:
---------------
Verify_Short: Time                      0.626321s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 2989228032
Verify_Short: GIOp                     2.989228
Verify_Short: GIOP/S                   4.772677

---------------
short4 Add:
---------------
Verify_Short: Time                      0.188247s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 3986161664
Verify_Short: GIOp                     3.986162
Verify_Short: GIOP/S                   21.175167
---------------
short4 Mul:
---------------
Verify_Short: Time                      0.171203s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 3985637376
Verify_Short: GIOp                     3.985637
Verify_Short: GIOP/S                   23.280184
---------------
short4 Mad:
---------------
Verify_Short: Time                      0.220319s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 8389656576
Verify_Short: GIOp                     8.389657
Verify_Short: GIOP/S                   38.079587
---------------
short4 Div:
---------------
Verify_Short: Time                      0.834570s
Verify_Short: Iterations                400
Verify_Short: Instances                 65536
Verify_Short: Short operations 3985637376
Verify_Short: GIOp                     3.985637
Verify_Short: GIOP/S                   4.775678

short --> passed
short: Test took 19.14 seconds to run:
short: Init 9.86s (51.52%)  Compute 4.28s (22.38%)Verify 5.00s (26.11)%
21******************************************************************************
Short Operations Kernels:
	Performs an online compilation of short modulo
	kernels which each perform a large number of one specific
	operation, and calculates the operations per second of the
	device. This test uses short{1,2,3,4} as its unit of computation.
********************************************************************************
Compiling Div ushort kernel...
Compiling Div Fast ushort kernel...
Compiling Div ushort2 kernel...
Compiling Div Fast ushort2 kernel...
Compiling Div ushort3 kernel...
Compiling Div Fast ushort3 kernel...
Compiling Div ushort4 kernel...
Compiling Div Fast ushort4 kernel...
Compute_ShortMod: Online compilation test with ushort using 65536 instances, each with 400 iterations
Running Div kernel for ushort...
Running Div kernel Fast for ushort...
Compute_ShortMod: Online compilation test with ushort2 using 65536 instances, each with 400 iterations
Running Div kernel for ushort2...
Running Div kernel Fast for ushort2...
Compute_ShortMod: Online compilation test with ushort3 using 65536 instances, each with 400 iterations
Running Div kernel for ushort3...
Running Div kernel Fast for ushort3...
Compute_ShortMod: Online compilation test with ushort4 using 65536 instances, each with 400 iterations
Running Div kernel for ushort4...
Running Div kernel Fast for ushort4...
---------------
ushort Div:
---------------
Verify_ShortMod: Time                      0.482448s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 1520697344
Verify_ShortMod: GIOp                     1.520697
Verify_ShortMod: GIOP/S                   3.152044
---------------
ushort Div fast:
---------------
Verify_ShortMod: Time                      0.242209s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 1520697344
Verify_ShortMod: GIOp                     1.520697
Verify_ShortMod: GIOP/S                   6.278451

---------------
ushort2 Div:
---------------
Verify_ShortMod: Time                      0.936468s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 3041394688
Verify_ShortMod: GIOp                     3.041395
Verify_ShortMod: GIOP/S                   3.247729
---------------
ushort2 Div fast:
---------------
Verify_ShortMod: Time                      0.481168s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 3041394688
Verify_ShortMod: GIOp                     3.041395
Verify_ShortMod: GIOP/S                   6.320858

---------------
ushort3 Div:
---------------
Verify_ShortMod: Time                      1.518535s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 4562092032
Verify_ShortMod: GIOp                     4.562092
Verify_ShortMod: GIOP/S                   3.004272
---------------
ushort3 Div fast:
---------------
Verify_ShortMod: Time                      0.731963s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 4562092032
Verify_ShortMod: GIOp                     4.562092
Verify_ShortMod: GIOP/S                   6.232681

---------------
ushort4 Div:
---------------
Verify_ShortMod: Time                      2.163825s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 6082789376
Verify_ShortMod: GIOp                     6.082789
Verify_ShortMod: GIOP/S                   2.811128
---------------
ushort4 Div fast:
---------------
Verify_ShortMod: Time                      1.011028s
Verify_ShortMod: Iterations                400
Verify_ShortMod: Instances                 65536
Verify_ShortMod: Integer operations 6082789376
Verify_ShortMod: GIOp                     6.082789
Verify_ShortMod: GIOP/S                   6.016440

shortmod --> passed
shortmod: Test took 26.41 seconds to run:
shortmod: Init 15.94s (60.36%)  Compute 7.95s (30.11%)Verify 2.52s (9.53)%
22******************************************************************************
Char Operations Kernels:
	Performs an online compilation of char add/mul/mad/div
	kernels which each perform a large number of one specific
	operation, and calculates the operations per second of the
	device. This test uses char{1,2,3,4} as its unit of computation.
********************************************************************************
Compiling Add char kernel...
Compiling Mul char kernel...
Compiling Mad char kernel...
Compiling Div char kernel...
Compiling Add char2 kernel...
Compiling Mul char2 kernel...
Compiling Mad char2 kernel...
Compiling Div char2 kernel...
Compiling Add char3 kernel...
Compiling Mul char3 kernel...
Compiling Mad char3 kernel...
Compiling Div char3 kernel...
Compiling Add char4 kernel...
Compiling Mul char4 kernel...
Compiling Mad char4 kernel...
Compiling Div char4 kernel...
Compute_Char: Online compilation test with char using 65536 instances, each with 400 iterations
Running Add kernel for char...
Running Mul kernel for char...
Running Mad kernel for char...
Running Div kernel for char...
Compute_Char: Online compilation test with char2 using 65536 instances, each with 400 iterations
Running Add kernel for char2...
Running Mul kernel for char2...
Running Mad kernel for char2...
Running Div kernel for char2...
Compute_Char: Online compilation test with char3 using 65536 instances, each with 400 iterations
Running Add kernel for char3...
Running Mul kernel for char3...
Running Mad kernel for char3...
Running Div kernel for char3...
Compute_Char: Online compilation test with char4 using 65536 instances, each with 400 iterations
Running Add kernel for char4...
Running Mul kernel for char4...
Running Mad kernel for char4...
Running Div kernel for char4...
---------------
char Add:
---------------
Verify_Char: Time                      0.048132s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 996540416
Verify_Char: GIOp                     0.996540
Verify_Char: GIOP/S                   20.704322
---------------
char Mul:
---------------
Verify_Char: Time                      0.031758s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 681836544
Verify_Char: GIOp                     0.681837
Verify_Char: GIOP/S                   21.469757
---------------
char Mad:
---------------
Verify_Char: Time                      0.056299s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 2097414144
Verify_Char: GIOp                     2.097414
Verify_Char: GIOP/S                   37.254909
---------------
char Div:
---------------
Verify_Char: Time                      0.261175s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 996409344
Verify_Char: GIOp                     0.996409
Verify_Char: GIOP/S                   3.815102

---------------
char2 Add:
---------------
Verify_Char: Time                      0.095022s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 1993080832
Verify_Char: GIOp                     1.993081
Verify_Char: GIOP/S                   20.974941
---------------
char2 Mul:
---------------
Verify_Char: Time                      0.061869s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 1363673088
Verify_Char: GIOp                     1.363673
Verify_Char: GIOP/S                   22.041298
---------------
char2 Mad:
---------------
Verify_Char: Time                      0.110911s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 4194828288
Verify_Char: GIOp                     4.194828
Verify_Char: GIOP/S                   37.821571
---------------
char2 Div:
---------------
Verify_Char: Time                      0.519456s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 1992818688
Verify_Char: GIOp                     1.992819
Verify_Char: GIOP/S                   3.836357

---------------
char3 Add:
---------------
Verify_Char: Time                      0.141104s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 2989621248
Verify_Char: GIOp                     2.989621
Verify_Char: GIOP/S                   21.187360
---------------
char3 Mul:
---------------
Verify_Char: Time                      0.091806s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 2045509632
Verify_Char: GIOp                     2.045510
Verify_Char: GIOP/S                   22.280784
---------------
char3 Mad:
---------------
Verify_Char: Time                      0.165578s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 6292242432
Verify_Char: GIOp                     6.292242
Verify_Char: GIOP/S                   38.001682
---------------
char3 Div:
---------------
Verify_Char: Time                      0.627681s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 2989228032
Verify_Char: GIOp                     2.989228
Verify_Char: GIOP/S                   4.762336

---------------
char4 Add:
---------------
Verify_Char: Time                      0.188343s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 3986161664
Verify_Char: GIOp                     3.986162
Verify_Char: GIOP/S                   21.164374
---------------
char4 Mul:
---------------
Verify_Char: Time                      0.121852s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 2727346176
Verify_Char: GIOp                     2.727346
Verify_Char: GIOP/S                   22.382449
---------------
char4 Mad:
---------------
Verify_Char: Time                      0.220987s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 8389656576
Verify_Char: GIOp                     8.389657
Verify_Char: GIOP/S                   37.964480
---------------
char4 Div:
---------------
Verify_Char: Time                      0.835537s
Verify_Char: Iterations                400
Verify_Char: Instances                 65536
Verify_Char: Char operations 3985637376
Verify_Char: GIOp                     3.985637
Verify_Char: GIOP/S                   4.770151

char --> passed
char: Test took 19.13 seconds to run:
char: Init 9.78s (51.11%)  Compute 4.30s (22.45%)Verify 5.06s (26.43)%
23******************************************************************************
Device Transfer Test:
	Performs purely transfer operations using an OpenCL buffer from:
	 host   --> device (Upload)
	 device --> device (Copy)
	 device --> host   (Download)
It reports the average speed of each operation on the device.
********************************************************************************
Init_Transfer: Random seed set to 1507279508 
Compute_Transfer: Starting transfer operations ... , 4MB, 4MB, 4MB, 4MB, 5MB, 5MB, 6MB, 7MB, 8MB, 9MB, 10MB, 12MB, 16MB, 21MB, 32MB, 64MB
Compute_Transfer: Transfer tests complete.
Actual speed
Verify_Transfer: Transfer Type | Size (MB) |  Offset | Time (s) | MB/s     
Verify_Transfer: Upload        |         4 | 02c0a3d | 0.004000 | 1000.00 
Verify_Transfer: Copy          |         4 | 01b1eb8 | 0.004000 | 1000.00 
Verify_Transfer: Download      |         4 | 01b4ccc | 0.031000 | 129.03 
Verify_Transfer: Upload        |         4 | 0273fb0 | 0.004000 | 1000.00 
Verify_Transfer: Copy          |         4 | 01154a4 | 0.004000 | 1000.00 
Verify_Transfer: Download      |         4 | 03502e7 | 0.035000 | 114.29 
Verify_Transfer: Upload        |         4 | 02d2a6c | 0.005000 | 800.00 
Verify_Transfer: Copy          |         4 | 026fc32 | 0.004000 | 1000.00 
Verify_Transfer: Download      |         4 | 011ba08 | 0.036000 | 111.11 
Verify_Transfer: Upload        |         4 | 03e5001 | 0.005000 | 800.00 
Verify_Transfer: Copy          |         4 | 03e3d64 | 0.004000 | 1000.00 
Verify_Transfer: Download      |         4 | 01eaf03 | 0.038000 | 105.26 
Verify_Transfer: Upload        |         5 | 00d6041 | 0.005000 | 1000.00 
Verify_Transfer: Copy          |         5 | 03b22d0 | 0.005000 | 1000.00 
Verify_Transfer: Download      |         5 | 015fafe | 0.044000 | 113.64 
Verify_Transfer: Upload        |         5 | 03f7897 | 0.006000 | 833.33 
Verify_Transfer: Copy          |         5 | 01b2a66 | 0.005000 | 1000.00 
Verify_Transfer: Download      |         5 | 00f7ca8 | 0.048000 | 104.17 
Verify_Transfer: Upload        |         6 | 03f7f8c | 0.006000 | 1000.00 
Verify_Transfer: Copy          |         6 | 04ba9d1 | 0.005000 | 1200.00 
Verify_Transfer: Download      |         6 | 0368f08 | 0.050000 | 120.00 
Verify_Transfer: Upload        |         7 | 022fe34 | 0.007000 | 1000.00 
Verify_Transfer: Copy          |         7 | 01dd5c6 | 0.006000 | 1166.67 
Verify_Transfer: Download      |         7 | 03f01f2 | 0.056000 | 125.00 
Verify_Transfer: Upload        |         8 | 03acccc | 0.008000 | 1000.00 
Verify_Transfer: Copy          |         8 | 018a3d7 | 0.007000 | 1142.86 
Verify_Transfer: Download      |         8 | 01a6e97 | 0.062000 | 129.03 
Verify_Transfer: Upload        |         9 | 015f15f | 0.008000 | 1125.00 
Verify_Transfer: Copy          |         9 | 069d3c7 | 0.008000 | 1125.00 
Verify_Transfer: Download      |         9 | 02f6585 | 0.072000 | 125.00 
Verify_Transfer: Upload        |        10 | 06f5c29 | 0.010000 | 1000.00 
Verify_Transfer: Copy          |        10 | 08468ad | 0.009000 | 1111.11 
Verify_Transfer: Download      |        10 | 019e26a | 0.086000 | 116.28 
Verify_Transfer: Upload        |        12 | 026fe71 | 0.011000 | 1090.91 
Verify_Transfer: Copy          |        12 | 0599ed7 | 0.011000 | 1090.91 
Verify_Transfer: Download      |        12 | 04a6223 | 0.109000 | 110.09 
Verify_Transfer: Upload        |        16 | 0510625 | 0.014000 | 1142.86 
Verify_Transfer: Copy          |        16 | 028b439 | 0.013000 | 1230.77 
Verify_Transfer: Download      |        16 | 048f5c2 | 0.129000 | 124.03 
Verify_Transfer: Upload        |        21 | 09ba5e3 | 0.019000 | 1105.26 
Verify_Transfer: Copy          |        21 | 00fb38a | 0.017000 | 1235.29 
Verify_Transfer: Download      |        21 | 08607bc | 0.172000 | 122.09 
Verify_Transfer: Upload        |        32 | 0753f7d | 0.029000 | 1103.45 
Verify_Transfer: Copy          |        32 | 097ced9 | 0.026000 | 1230.77 
Verify_Transfer: Download      |        32 | 05851eb | 0.257000 | 124.51 
Verify_Transfer: Upload        |        64 | 0000000 | 0.057000 | 1122.81 
Verify_Transfer: Copy          |        64 | 0000000 | 0.049000 | 1306.12 
Verify_Transfer: Download      |        64 | 0000000 | 0.314000 | 203.82 
Verify_Transfer: Average upload speed 1007.73 MB/s
Verify_Transfer: Average copy speed 1114.97 MB/s
Verify_Transfer: Average download speed 123.58 MB/s

Verify_Transfer: Verifying data integrity
Verify_Transfer: Data integrity OK
transfer --> passed
transfer: Test took 9.31 seconds to run:
transfer: Init 4.12s (44.33%)  Compute 2.59s (27.87%)Verify 2.59s (27.80)%
24******************************************************************************
Work-group Memory Copy Kernel:
	Performs an online compilation of a kernel which copies input
	to output using work-groups, verifying the results.

********************************************************************************
Verify_MemcpyWorkgroup: Instances:       6528, Copied   0.024902 MBs, Time   0.000325s  76.622596MB/s
Verify_MemcpyWorkgroup: Instances:     691200, Copied   2.636719 MBs, Time   0.004165s 633.065726MB/s
Verify_MemcpyWorkgroup: Instances:      31360, Copied   0.119629 MBs, Time   0.000377s 317.318054MB/s
Verify_MemcpyWorkgroup: Instances:    1344000, Copied   5.126953 MBs, Time   0.007608s 673.889738MB/s
memcpy_workgroups --> passed
memcpy_workgroups: Test took 0.97 seconds to run:
memcpy_workgroups: Init 0.14s (14.70%)  Verify 0.82s (85.30)%
25******************************************************************************
Global-offset Memory Copy Kernel:
	Performs an online compilation of a kernel which copies input
	to output using global offsets, verifying the results.

********************************************************************************
memcpy_global_offsets --> passed
memcpy_global_offsets: Test took 0.39 seconds to run:
memcpy_global_offsets: Init 0.15s (38.68%)  Verify 0.24s (61.32)%
26******************************************************************************
Image Convolution Test:
	Runs a number of image convolution kernels on an image

********************************************************************************
Failed to open source.bmp for reading
Running image copy, frame 0
Running Sharpen kernel, frame 1
Running Blur kernel, frame 2
Running Gaussian kernel, frame 3
Running Laplace Edge detection kernel, frame 4
convolution --> passed
convolution: Test took 2.70 seconds to run:
convolution: Init 1.96s (72.52%)  Compute 0.29s (10.81%)Verify 0.45s (16.67)%
27******************************************************************************
Conversions Test:
	Tests conversions between various data types and
	verifies the results.

********************************************************************************
conversions --> passed
conversions: Test took 20.44 seconds to run:
conversions: Init 0.00s (0.04%)  Verify 20.43s (99.95)%
28******************************************************************************
Events Test:
	Runs a mixture of kernels using events and not using events
	all of which perform a mem copy and verify the result

********************************************************************************
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000298s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000160s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000150s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000151s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000166s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000145s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000130s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000140s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000143s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000153s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000144s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000175s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000156s   0.000000MB/s
Verify_MulEvtKernel: Instances:         32, Copied    0 MBs, Time   0.000148s   0.000000MB/s
events --> passed
events: Test took 0.78 seconds to run:
events: Init 0.01s (2.42%)  Verify 0.76s (97.57)%
29******************************************************************************
Atomics Test:
	Runs all atomic functions

********************************************************************************
atomics --> passed
atomics: Test took 1.57 seconds to run:
atomics: Init 0.00s (0.56%)  Verify 1.57s (99.43)%
30******************************************************************************
NOP Test:
	Runs a kernel with no body effectively making it a NOP kernel

********************************************************************************
Compute_NOP: Online compilation test with 4096 instances running source:
<source>
__kernel void NOPKernel()
{
}
</source>
nop --> passed
nop: Test took 0.39 seconds to run:
nop: Init 0.10s (25.76%)  Compute 0.04s (10.45%)Verify 0.25s (63.77)%
Finished 31 tests in 291.6 seconds: 31 passed, 0 failed (100.00%)
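The summary figures of the Device Transfer Test can be cross-checked against its table. The Python sketch below uses the Size and Time columns copied from the transfer log. It assumes that each "Average ... speed" line is the unweighted arithmetic mean of the per-row MB/s values. That assumption is inferred from the numbers, because the test's source is not shown here, but it reproduces all three logged averages. For comparison, the sketch also computes total MB moved divided by total time.

```python
"""Cross-check the Verify_Transfer averages from the transfer test log.

Assumption (inferred from the logged numbers, not from the test's source):
each reported "Average ... speed" is the unweighted arithmetic mean of the
per-row MB/s values, where a row's MB/s is Size (MB) / Time (s).
"""
from statistics import mean

# Size (MB) column, identical for all three transfer types, in logged order.
SIZES_MB = [4, 4, 4, 4, 5, 5, 6, 7, 8, 9, 10, 12, 16, 21, 32, 64]

# Time (s) column for each transfer type, in logged order.
TIMES_S = {
    "upload": [0.004, 0.004, 0.005, 0.005, 0.005, 0.006, 0.006, 0.007,
               0.008, 0.008, 0.010, 0.011, 0.014, 0.019, 0.029, 0.057],
    "copy": [0.004, 0.004, 0.004, 0.004, 0.005, 0.005, 0.005, 0.006,
             0.007, 0.008, 0.009, 0.011, 0.013, 0.017, 0.026, 0.049],
    "download": [0.031, 0.035, 0.036, 0.038, 0.044, 0.048, 0.050, 0.056,
                 0.062, 0.072, 0.086, 0.109, 0.129, 0.172, 0.257, 0.314],
}


def per_row_rates(times):
    """MB/s for each row, matching the last column of the table."""
    return [size / t for size, t in zip(SIZES_MB, times)]


def unweighted_average(times):
    """Mean of the per-row rates (reproduces the logged averages)."""
    return mean(per_row_rates(times))


def aggregate_throughput(times):
    """Total MB transferred divided by total time spent."""
    return sum(SIZES_MB) / sum(times)


if __name__ == "__main__":
    for kind, times in TIMES_S.items():
        print(f"{kind:8s} mean of rows {unweighted_average(times):8.2f} MB/s, "
              f"aggregate {aggregate_throughput(times):8.2f} MB/s")
```

For the download column, this gives about 123.58 MB/s as the unweighted mean, matching the log, and about 137.10 MB/s as the aggregate. The logged times appear to be quantized to 1 ms. An unweighted mean therefore gives the coarsely timed 4 MB transfers the same weight as the 64 MB one.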