Jetson/Computer Vision Performance
Hardware Acceleration of OpenCV
OpenCV is the de facto standard computer vision library, containing more than 2500 computer vision, image processing and machine learning algorithms. See Installing OpenCV on Jetson TK1 if you haven't done so yet. NVIDIA has significantly optimized OpenCV in two ways:
- OpenCV4Tegra: A free library provided by NVIDIA containing optimizations for NVIDIA's Tegra CPUs (ARM NEON SIMD optimizations, multi-core CPU optimizations and some GLSL GPU optimizations). OpenCV4Tegra is a binary replacement for the public OpenCV, so the programmer just writes regular OpenCV code, which automatically takes advantage of the OpenCV4Tegra optimizations without the developer or user necessarily knowing about it. It has been supported on Android since Tegra 2 and is also supported on Linux4Tegra, Vibrante, etc. It typically provides a 2x to 5x speedup on Tegra K1 compared to regular OpenCV.
- OpenCV 'gpu' module: The 'gpu' module in the public OpenCV library is designed purely for CUDA GPGPU acceleration with NVIDIA's mobile & desktop GPUs. The developer must make minor changes to their code to explicitly call functions from the OpenCV 'gpu' module in order for their OpenCV code to take advantage of the GPU. This lets the developer control memory allocations on the GPU, choose when data is transferred between CPU & GPU, choose which functions run on the GPU vs the CPU, and control streaming or multi-GPU behaviour, etc. It has been an important part of OpenCV on desktop since 2010/2011, and is supported by most NVIDIA GPUs available today. Some functions (such as Haar Cascade Classifiers) are not well suited to GPUs, so they get only minor speedups or have no GPU version at all, while other functions (such as LBP Cascade Classifiers, HOG, stereo vision, warping, etc) are much better suited to GPUs and can thus get 5x to 20x speedups on Tegra K1 compared to regular OpenCV.
Presentation videos about the OpenCV4Tegra module
A free online webinar (on NVIDIA's GTC Express page), presented by the OpenCV4Tegra development team itself, introduces the OpenCV4Tegra module:
- Introduction to OpenCV for Tegra (March 2013) describes OpenCV4Tegra including the installation steps for Android. Video and Slides.
Presentation videos about the OpenCV 'gpu' module
Two free online webinars (on NVIDIA's GTC Express page), presented by the OpenCV development team itself, introduce OpenCV's GPU module:
- OpenCV - Accelerated Computer Vision using GPUs (June 2013) gives a non-technical overview of OpenCV and the GPU module, showing what is available and why you would want to use it. Video and Slides.
- Getting Started with GPU-accelerated Computer Vision using OpenCV and CUDA (July 2013) is more technical: it shows how to install OpenCV's GPU module, explains the memory model of the GPU module, and shows how to combine OpenCV's GPU module with your own custom CUDA kernels. Video and Slides.
Power draw during computer vision tasks
The page Typical power draw of Jetson TK1 shows that the total power draw for Jetson TK1 is around 1.6W when idle (including the 0.4W fan) and is typically under 4W even in moderate use. However, computer vision tasks can often push hardware to its limits, so this section gives detailed power measurements for various computer vision programs.
This table covers various OpenCV sample CPU programs (in the "opencv-2.4.9/samples/cpp" folder), OpenCV sample GPU programs (in the "opencv-2.4.9/samples/gpu" folder), some VisionWorks sample CPU+GPU programs (freely available from NVIDIA), and some CUDA sample computer vision programs (in the "NVIDIA_CUDA6.0_Samples/3_Imaging" folder).
These are total power measurements for the whole Jetson TK1 board when running from 12V with the default fan attached (using 0.4W), without any customizations or power savings applied. The OpenCV samples were executed remotely over Ethernet (where the Jetson TK1 was using 1.6W in between each of the OpenCV tests), while the VisionWorks and CUDA samples required a GPU-accelerated display and thus were executed through the Ubuntu Unity graphical desktop with an HDMI monitor and USB hub + keyboard & mouse attached (hence the Jetson TK1 was using 3.4W in between each of the VisionWorks & CUDA tests).
|Sample code||Library||Processor||Approximate power (Watts) for whole Jetson TK1 board||Performance|
|bgfg_segm||OpenCV||CPU||2.8||~7 FPS (MOG2 algorithm)|
|bgfg_segm||OpenCV||GPU||2.4||~34 FPS (MOG2 algorithm)|
|stereo_match||OpenCV||CPU||2.4||~3 FPS (BM algorithm)|
|stereo_match||OpenCV||GPU||3.4||~24 FPS (BM algorithm)|
|feature_tracker||VisionWorks||GPU||6||~40 FPS @ 720p (performing Color Conversion, Gaussian Pyramid, Pyramidal Optical Flow, plus Feature Tracking)|
|hough_lines||VisionWorks||GPU||7||~30 FPS @ 720p (performing Color Conversion, Canny, plus Probabilistic Hough Lines)|
|motion_estimation||VisionWorks||GPU||6||~20 FPS @ 720p (performing Color Conversion, plus IME Motion Estimation)|
|object_detector||VisionWorks||GPU||7||~5 FPS @ 720p (performing Color Conversion, plus HOG Pedestrian Detection)|
|object_tracker||VisionWorks||GPU||6||~60 FPS @ 720p (performing Color Conversion, Gaussian Pyramid, Forward Optical Flow, Backward Optical Flow, plus Median Flow)|
|pedestrian detector||VisionWorks||GPU||10||~2 FPS @ 720p (performing Soft Cascade Classifier)|
|car detector||VisionWorks||GPU||10||~5 FPS @ 720p (performing Soft Cascade Classifier)|
|SLAM||VisionWorks||GPU||7||~25 FPS @ 480p (performing SLAM)|
|bilateralFilter||CUDA||GPU||11.4||~34 FPS @ 640x480|
|boxFilter||CUDA||GPU||7.0||~23 FPS @ 1024x1024|
|imageDenoising||CUDA||GPU||6.0||~150 FPS @ 320x408|
|SobelFilter||CUDA||GPU||4.6||~150 FPS @ 512x512|
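The power and FPS columns can be combined into a rough energy cost per frame (watts divided by frames per second gives joules per frame), which makes the CPU and GPU rows easier to compare. The following is a minimal sketch using only the four OpenCV rows of the table and the 1.6W idle figure quoted above for the remote Ethernet setup. The function and variable names are illustrative, not part of any library, and because the inputs are approximate whole-board measurements the results are only indicative.

```python
# Rough comparison of the OpenCV CPU vs GPU rows from the table above.
# All numbers are the approximate whole-board figures quoted on this page.

# Idle draw between the OpenCV tests (remote session over Ethernet).
IDLE_WATTS = 1.6

# (sample, processor, whole-board power in watts, approximate FPS)
MEASUREMENTS = [
    ("bgfg_segm", "CPU", 2.8, 7),
    ("bgfg_segm", "GPU", 2.4, 34),
    ("stereo_match", "CPU", 2.4, 3),
    ("stereo_match", "GPU", 3.4, 24),
]


def joules_per_frame(power_watts, fps):
    """Board-level energy spent per processed frame (W / FPS = J/frame)."""
    return power_watts / fps


def summarize(rows, idle_watts=IDLE_WATTS):
    """Group rows by sample and derive speedup, energy and net power."""
    by_sample = {}
    for sample, processor, watts, fps in rows:
        by_sample.setdefault(sample, {})[processor] = (watts, fps)

    summary = {}
    for sample, procs in by_sample.items():
        cpu_watts, cpu_fps = procs["CPU"]
        gpu_watts, gpu_fps = procs["GPU"]
        summary[sample] = {
            # How many times faster the GPU version ran.
            "speedup": gpu_fps / cpu_fps,
            "cpu_j_per_frame": joules_per_frame(cpu_watts, cpu_fps),
            "gpu_j_per_frame": joules_per_frame(gpu_watts, gpu_fps),
            # Extra power above the idle baseline while the sample runs.
            "cpu_net_watts": cpu_watts - idle_watts,
            "gpu_net_watts": gpu_watts - idle_watts,
        }
    return summary


if __name__ == "__main__":
    for sample, s in summarize(MEASUREMENTS).items():
        print(
            f"{sample}: {s['speedup']:.1f}x faster on GPU, "
            f"{s['cpu_j_per_frame']:.2f} J/frame (CPU) vs "
            f"{s['gpu_j_per_frame']:.2f} J/frame (GPU)"
        )
```

By these figures, the GPU version of bgfg_segm spends roughly 0.07 J per frame against 0.4 J on the CPU, and stereo_match roughly 0.14 J against 0.8 J. The GPU rows therefore come out ahead on energy per frame even where the instantaneous power draw is higher. The VisionWorks and CUDA rows were measured against the 3.4W desktop baseline instead, so their net power should not be compared directly with the OpenCV rows.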