Jetson/Graphics Performance

From eLinux.org

Latest revision as of 21:41, 3 August 2014

The Tegra K1 SOC

The Tegra K1 SOC in the Jetson TK1 is targeted at embedded GPGPU applications as well as general-purpose use in power-constrained devices such as superphones, tablets, laptops, set-top boxes, and low-power desktop computers.

GPGPU Capabilities

NVIDIA's Tegra K1 has leapt ahead of the mobile GPU competition in desktop-to-mobile software compatibility, total performance, and performance per watt. NVIDIA spent many years designing a significantly lower-power version of the same Kepler GPU architecture that has powered the world's fastest desktop GPUs and supercomputers for years. As a result, existing, well-tested desktop tools, libraries, and frameworks run on Tegra K1 with minimal modification. Supported APIs include OpenGL 4.4, OpenGL ES 3.0, DirectX 11.2, and CUDA 6.0.

The Tegra K1 SOC provides excellent GPGPU performance per watt, likely better than any CPU or GPU to date, whether mobile, desktop, or supercomputer. Tegra K1 can achieve more than 326 GFLOPS (peak), whereas its closest upcoming competitor, the Snapdragon 805, was expected to reach about 200 GFLOPS; recent benchmarks, however, suggest that it falls short of that expectation.
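As a rough check on the 326 GFLOPS figure, theoretical peak single-precision throughput can be estimated as CUDA cores × FLOPs per core per cycle × clock. The sketch below assumes 192 CUDA cores in the single Kepler SMX, 2 FLOPs per core per cycle (one fused multiply-add), and a GPU clock of about 850 MHz; the clock value in particular is an assumption, not a figure from this page. The result is a ceiling, not a measurement.

```python
def peak_gflops(cuda_cores: int, flops_per_cycle: int, clock_mhz: float) -> float:
    """Theoretical peak throughput in GFLOPS: cores x FLOPs per cycle x clock."""
    return cuda_cores * flops_per_cycle * clock_mhz * 1e6 / 1e9

# Assumed inputs: one Kepler SMX = 192 CUDA cores, an FMA counts as 2 FLOPs,
# and a GPU clock of ~850 MHz (an assumption, not stated on this page).
tk1_peak = peak_gflops(cuda_cores=192, flops_per_cycle=2, clock_mhz=850.0)
print(f"Estimated Tegra K1 peak: {tk1_peak:.1f} GFLOPS")  # prints 326.4
```

Sustained GPGPU throughput in real applications will be lower, since memory bandwidth and instruction mix rarely allow every core to issue an FMA on every cycle.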

Shortly after NVIDIA announced Tegra K1, Imagination Technologies announced the PowerVR GX6650 design, targeted for late 2015, which Imagination claims could challenge Tegra K1's performance. However, the design favors FP16 operations, which may limit its usefulness for GPGPU tasks that typically need 32-bit and sometimes 64-bit floating-point precision. Also, by the time Imagination's new GPU architecture reaches consumers, NVIDIA's Maxwell-based mobile GPU should also be available, with better features and performance than Tegra K1.
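The FP16 concern can be illustrated on any PC. This minimal sketch uses Python's standard `struct` module (format `'e'` is IEEE 754 half precision, `'f'` single, `'d'` double) to round a running sum after every addition, emulating an accumulator of that width. It says nothing about how the GX6650 hardware itself handles FP16; it only shows how quickly a half-precision accumulator loses accuracy.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a 64-bit Python float to the IEEE 754 format named by a struct code."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def accumulate(fmt: str, value: float, count: int) -> float:
    """Add `value` to a running sum `count` times, rounding the sum to `fmt` after each add."""
    step = round_to(fmt, value)
    total = 0.0
    for _ in range(count):
        total = round_to(fmt, total + step)
    return total

N = 10_000  # exact answer: 0.1 * 10,000 = 1000
fp16 = accumulate("<e", 0.1, N)  # half: stalls at 256, where FP16 values are 0.25 apart
fp32 = accumulate("<f", 0.1, N)  # single precision
fp64 = accumulate("<d", 0.1, N)  # double precision (no extra rounding)
print(f"FP16: {fp16}  FP32: {fp32:.4f}  FP64: {fp64:.4f}")
```

Once the running sum reaches 256, adding roughly 0.1 is less than half the gap to the next representable FP16 value, so every further addition rounds back to 256; the FP32 and FP64 sums stay close to 1000.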

Frames Per Second (FPS)

The graphics performance of the Jetson TK1 has been roughly comparable to Intel HD 4600 graphics, but with superior OpenGL and GPGPU support. We hope to add more benchmarks and comparisons with competing systems below. However, the unusual combination of hardware and software on the board makes such comparisons challenging.

The Jetson TK1 CPU uses ARM A15r2 cores, the GPU is a single NVIDIA Kepler SMX modified for mobile, and the OS is Ubuntu Linux 14.04. While this is a great combination of technologies, it is also unusual, so it has proved more difficult to compare graphics performance with traditional PC configurations. Common PC graphics benchmarks such as 3DMark and GFXBench are not available for ARM/Linux, even though they are available for ARM/Android.[1] Compiling applications from source can also be a challenge, as many Linux games and graphics utilities assume an x86 architecture. x86 extensions such as SSE cannot be used on ARM, and code that relies on them may not be easy to port to a similar ARM extension such as NEON.

The Xonotic build tested below was compiled directly from source. The results come from "the big benchmark", which is provided with the source. This appears to be the same method used by Phoronix, so comparisons with results produced by the Phoronix Test Suite at the same resolution should be meaningful. Interestingly, lowering the resolution below 1080p had little effect on frame rates, which implies that fillrate is not a limiting factor at 1080p and below.

Xonotic 0.7.0 @ 1920x1080

Effects Level   Min FPS   Avg FPS   Max FPS
Low                  43        83       140
Medium               35        75       131
Normal               34        71       120
High                 17        42        60
Ultra                 6        29        47
Ultimate              4        19        32
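To put the fillrate remark in perspective, the average frame rates in the Xonotic table can be converted into the minimum number of pixels written per second at 1920x1080. This sketch counts each screen pixel once per frame and ignores overdraw, multisampling, and off-screen passes, so real fill work is higher; it is arithmetic on the table's numbers, not a fillrate measurement.

```python
WIDTH, HEIGHT = 1920, 1080
PIXELS_PER_FRAME = WIDTH * HEIGHT  # 2,073,600 pixels at 1080p

# Average FPS per effects level, copied from the Xonotic 0.7.0 table.
AVG_FPS = {"Low": 83, "Medium": 75, "Normal": 71, "High": 42, "Ultra": 29, "Ultimate": 19}

def mpixels_per_second(fps: float, pixels_per_frame: int = PIXELS_PER_FRAME) -> float:
    """Lower bound on pixel writes per second (one write per pixel per frame), in millions."""
    return fps * pixels_per_frame / 1e6

for level, fps in AVG_FPS.items():
    print(f"{level:<9} {fps:>3} FPS -> {mpixels_per_second(fps):6.1f} Mpixel/s")
```

If fillrate were the bottleneck, cutting the pixel count would raise frame rates roughly in proportion; the observation that lower resolutions barely changed FPS points to a limit elsewhere in the pipeline.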

Power Use - Overview

Graphics-intensive applications, including demanding OpenGL games, have shown surprisingly low power requirements, generally below 5W on average for the SOC and RAM. This may be due to the OpenGL interface defaulting to lower-power FP16 operations. GPGPU applications that keep all of the CUDA cores busy, however, have drawn an average of 8.63W for the SOC and RAM, with the board as a whole peaking at 11.06W.

Test System

  • Standard Jetson TK1 developer board
  • Audio out active
  • Attached GbE
  • One NFS mount to external NAS active
  • Four port USB3 hub attached
  • Logitech K310 USB Keyboard attached via USB hub
  • Logitech Marble Mouse attached via USB hub
  • Logitech C615 HD video cam attached via USB hub
  • HDMI out @1920x1080
  • Standard Cooling Fan
  • Installed 64GB SD card with one ext4 mount active
  • Kubuntu standard desktop, compositing disabled

Test Methodology

The Jetson TK1 was tested in response to a forum discussion. I measured power with a multimeter patched into the DC line between the AC power adapter and the board.

Observations

  • The power adapter was measured to provide a consistent 12.15 volts.
  • The fan's power draw (0.85W) was determined by unplugging it briefly while the board was idle and noting the difference in power draw.
  • The Jetson TK1 board as configured has yet to exceed 12.0W total draw under any workload tested.
  • NVIDIA's figures in its Jetson platform brief (page 13) appear accurate, if anything conservative.
  • NVIDIA's caveat about comparing the board with mobile devices appears valid. The board drives a number of ports that either have low-power alternatives or aren't normally available on mobile devices. Examples include GbE, desktop RAM, the SATA port, and mini-PCIe.
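Because the adapter held a steady 12.15 V, each current reading converts directly to power (P = V × I), and averaging amps is equivalent to averaging watts. The Minimum, Maximum, and Average rows in the tables below can be produced from a series of multimeter current readings along the lines of this sketch; the readings in the example are hypothetical, not logged data from these tests.

```python
from statistics import mean

SUPPLY_VOLTS = 12.15  # adapter voltage measured on this board

def summarize(amps: list, volts: float = SUPPLY_VOLTS) -> dict:
    """Return {row name: (amps, watts)} for the min, max, and mean of the readings."""
    if not amps:
        raise ValueError("need at least one current reading")
    stats = {"Minimum": min(amps), "Maximum": max(amps), "Average": mean(amps)}
    # With a constant supply voltage, watts are simply amps scaled by volts.
    return {name: (a, a * volts) for name, a in stats.items()}

# Hypothetical current readings in amps, for illustration only.
readings = [0.22, 0.31, 0.35, 0.62, 0.33]
for name, (a, w) in summarize(readings).items():
    print(f"{name:<8} {SUPPLY_VOLTS:.2f} V  {a:.2f} A  {w:.2f} W")
```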

Base Measurements

Measurement          Volts   Amps   Watts
Idle KDE Desktop     12.15   0.22    2.67
Less Fan             12.15   0.15    1.82
Less System          12.15   0.05    0.61
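The "Less Fan" and "Less System" rows appear to be simple subtractions from the measured total: the fan's 0.85W, and the idle overhead of everything other than the SOC and RAM (2.67W idle total minus 0.61W idle SOC and RAM, about 2.06W). This reading of the tables is an inference, not a method stated by the author, but it reproduces the 2.19W "Average less System" figure for glmark2 and Chromium from their 4.25W averages.

```python
IDLE_TOTAL_W = 2.67    # idle KDE desktop, whole board
FAN_W = 0.85           # fan draw, found by unplugging it at idle
IDLE_SOC_RAM_W = 0.61  # idle "Less System" figure (SOC and RAM)

# Inferred: everything except the SOC and RAM is treated as a constant overhead.
SYSTEM_OVERHEAD_W = IDLE_TOTAL_W - IDLE_SOC_RAM_W  # about 2.06 W

def less_fan(total_w: float) -> float:
    """Board power with the fan's draw removed."""
    return total_w - FAN_W

def less_system(total_w: float) -> float:
    """Approximate SOC + RAM power: total minus the idle non-SOC overhead."""
    return total_w - SYSTEM_OVERHEAD_W

# glmark2 average: 4.25 W for the whole board.
print(f"less fan: {less_fan(4.25):.2f} W, less system: {less_system(4.25):.2f} W")
```

The constant-overhead assumption is a simplification: peripherals such as GbE or the USB hub may draw more under load than at idle.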

Power Use - Graphics

glmark2 -s 1920x1080 --off-screen

Measurement          Volts   Amps   Watts
Minimum              12.15   0.22    2.67
Maximum              12.15   0.62    7.53
Average              12.15   0.35    4.25
Average less System  12.15   0.18    2.19
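To repeat the glmark2 run and capture its score programmatically, a small wrapper along these lines could be used. It is a sketch that assumes `glmark2` is on the PATH and that its summary line has the form "glmark2 Score: N"; the `parse_score` and `run_glmark2` helpers are hypothetical and were not part of the original test.

```python
import re
import subprocess
from typing import Optional

# Assumed format of glmark2's final summary line, e.g. "glmark2 Score: <N>".
SCORE_RE = re.compile(r"glmark2 Score:\s*(\d+)")

def parse_score(output: str) -> Optional[int]:
    """Return the overall glmark2 score found in `output`, or None if absent."""
    match = SCORE_RE.search(output)
    return int(match.group(1)) if match else None

def run_glmark2(width: int = 1920, height: int = 1080) -> Optional[int]:
    """Run glmark2 off-screen at width x height (same flags as the test above)."""
    result = subprocess.run(
        ["glmark2", "-s", f"{width}x{height}", "--off-screen"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_score(result.stdout)
```

Keeping the parsing separate from the subprocess call lets `parse_score` be checked against saved output without running the benchmark; on the board, `run_glmark2()` can be called while the multimeter readings are taken.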

VLC streaming 720p video from NAS GbE

Measurement          Volts   Amps   Watts
Minimum              12.15   0.29    3.52
Maximum              12.15   0.41    4.98
Average              12.15   0.34    4.13
Average less System  12.15   0.17    2.01

Xonotic v0.7.0 normal @ 1920x1080

Measurement          Volts   Amps   Watts
Average              12.15   0.56    6.80
Average less System  12.15   0.39    4.74

Web Browsing, Chromium

Measurement          Volts   Amps   Watts
Average              12.15   0.35    4.25
Average less System  12.15   0.18    2.19

Power Use - GPGPU

CUDA Smoke particle demo

Measurement          Volts   Amps   Watts
Minimum              12.15   0.62    7.53
Maximum              12.15   0.91   11.06
Average              12.15   0.88   10.69
Average less System  12.15   0.71    8.63
  1. NVIDIA holds commercial licenses for graphics benchmarks and has therefore been able to publish results for the Jetson TK1.