Jetson/L4T/TRT Customized Example

This page collects information on deploying customized models with TensorRT, along with answers to some common questions for Jetson.

OpenCV with ONNX model
Below is an example of deploying an ONNX model with TensorRT, using OpenCV images as input.

Verified environment:
 * JetPack5.1 + Orin
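Before the image reaches TensorRT, the OpenCV frame has to be converted into the engine's input layout. Below is a minimal sketch of that preprocessing step, assuming a ResNet-style NCHW Float32 input; the 224x224 size and the normalization constants are illustrative, not taken from this page:

```python
import numpy as np

def preprocess(bgr_image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Convert an OpenCV-style HWC/BGR uint8 image into an
    NCHW float32 buffer for TensorRT (illustrative constants)."""
    h, w = size
    # In a real pipeline this is cv2.resize(cv2.imread(path), size);
    # a plain crop stands in here to keep the sketch self-contained.
    img = bgr_image[:h, :w]
    img = img[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB, scale to [0,1]
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std                            # per-channel normalization
    chw = np.transpose(img, (2, 0, 1))                  # HWC -> CHW
    return np.ascontiguousarray(chw[None])              # add batch dim -> NCHW

fake = np.zeros((224, 224, 3), dtype=np.uint8)
print(preprocess(fake).shape)  # (1, 3, 224, 224)
```

The resulting contiguous float32 array can be copied into the engine's input binding.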

OpenCV with PLAN model
Below is an example of deploying a TensorRT PLAN (serialized engine) file, using OpenCV images as input.

Verified environment:
 * JetPack5.1 + Orin

Generate a PLAN file with trtexec first:
$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --saveEngine=sample.engine
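Once trtexec has produced sample.engine, the PLAN file can be deserialized at runtime. A sketch with the TensorRT Python API (the engine path comes from the command above; this only runs on a device with TensorRT installed):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine saved by trtexec above
with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Create an execution context; preprocessed OpenCV buffers are then
# copied into the input binding and inference is launched from it.
context = engine.create_execution_context()
```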

Multi-threading
Below is an example of running TensorRT with multiple threads.

Verified environment:
 * JetPack4.5.1 + Xavier

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --saveEngine=mnist.trt
$ cd /usr/src/tensorrt/data/mnist/
$ sudo pip3 install pillow
$ python3 download_pgms.py
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/my_tensorrt_code.py -O my_tensorrt_code.py
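The usual multi-threaded TensorRT pattern is one shared engine plus a per-thread execution context, with shared state protected by a lock. Below is a sketch of that structure with the engine call stubbed out; in the real script, run_inference would be context.execute_v2 issued from the thread's own IExecutionContext:

```python
import threading
from queue import Queue

def run_inference(batch):
    # Stub standing in for context.execute_v2(bindings) on this
    # thread's own IExecutionContext.
    return sum(batch)

def worker(jobs: Queue, results: list, lock: threading.Lock):
    # Each thread would create its own IExecutionContext here;
    # the ICudaEngine itself can be shared across threads.
    while True:
        batch = jobs.get()
        if batch is None:          # sentinel: no more work
            break
        out = run_inference(batch)
        with lock:                 # protect the shared result list
            results.append(out)

jobs, results, lock = Queue(), [], threading.Lock()
threads = [threading.Thread(target=worker, args=(jobs, results, lock))
           for _ in range(2)]
for t in threads:
    t.start()
for batch in ([1, 2], [3, 4], [5, 6]):
    jobs.put(batch)
for _ in threads:
    jobs.put(None)                 # one sentinel per thread
for t in threads:
    t.join()
print(sorted(results))             # [3, 7, 11]
```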

YoloV4 Tiny
Verified environment:
 * JetPack4.5.1 + Xavier

DeepStream can reach 60 fps with four video streams on Xavier:
$ cd /opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_Yolo
$ wget https://raw.githubusercontent.com/AastaNV/eLinux_data/main/deepstream/yolov4-tiny/yolov4_tiny.patch
$ git apply yolov4_tiny.patch
$ export CUDA_VER=10.2
$ make -C nvdsinfer_custom_impl_Yolo

$ wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg -q --show-progress
$ wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights -q --show-progress
$ wget https://raw.githubusercontent.com/AastaNV/eLinux_data/main/deepstream/yolov4-tiny/deepstream_app_config_yoloV4_tiny.txt
$ wget https://raw.githubusercontent.com/AastaNV/eLinux_data/main/deepstream/yolov4-tiny/config_infer_primary_yoloV4_tiny.txt

$ deepstream-app -c deepstream_app_config_yoloV4_tiny.txt

Custom Parser for SSD-MobileNet Trained by Jetson-inference
Verified environment:
 * JetPack4.5.1 + Xavier

$ cd /opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_SSD/
$ sudo wget https://raw.githubusercontent.com/AastaNV/eLinux_data/main/deepstream/ssd-jetson_inference/ssd-jetson_inference.patch
$ sudo git apply ssd-jetson_inference.patch
$ sudo CUDA_VER=10.2 make -C nvdsinfer_custom_impl_ssd/

Update config_infer_primary_ssd.txt to use the custom parser library.
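As a hedged example (the key names follow the stock objectDetector_SSD sample; verify the parser function name and library path that your patched build actually produces), the parser-related lines typically look like:

```
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=nvdsinfer_custom_impl_ssd/libnvdsinfer_custom_impl_ssd.so
```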

$ deepstream-app -c deepstream_app_config_ssd.txt

VPI with Jetson-utils
Below is an example of using VPI with jetson-utils.

Verified environment:
 * JetPack4.6 + XavierNX

VPI with Deepstream
Please find the following link for the example:

https://forums.developer.nvidia.com/t/deepstream-sdk-vpi-on-jetson-tx2/166834/20

VPI with Argus Camera
Please find the following link for the example:

https://forums.developer.nvidia.com/t/how-do-i-get-image-from-cudabayerdemosaic-and-connect-to-vpi/213529/18

VPI with nvivafilter
Please find the following link for the example:

https://forums.developer.nvidia.com/t/using-vpi-in-gstreamer/223334/21

Stress Test for Orin
Below we describe testing tools that can stress the Jetson AGX Orin to its full workload.

The expected power consumption with these steps is listed in the table below:

Maximize the Device Performance
$ sudo nvpmodel -m 0
$ sudo jetson_clocks

CPU Stress Test
Using the Linux stress tool:
$ sudo apt-get install stress
$ stress --cpu $(nproc)

GPU Stress Test
Running the cuBLAS sample with the half data type:

 1. Find the matrixMulCUBLAS sample under the CUDA samples folder.

 2. Apply the following change to switch the data type from float to half.

 3. Run the sample:
$ ./matrixMulCUBLAS
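The actual step-2 diff is not included on this page. As a sketch of the idea (line placement varies across CUDA sample versions, and half precision needs the cuda_fp16.h header): switch the device buffers and GEMM call from float/cublasSgemm to half/cublasHgemm. The fragment below is illustrative, not a complete file:

```cpp
// Illustrative fragment for matrixMulCUBLAS.cpp (not a full file)
#include <cuda_fp16.h>

// was: float *d_A, *d_B, *d_C;
half *d_A, *d_B, *d_C;
const half alpha = __float2half(1.0f), beta = __float2half(0.0f);

// was: cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, ...);
cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
            matrix_size.uiWB, matrix_size.uiHA, matrix_size.uiWA,
            &alpha, d_B, matrix_size.uiWB,
            d_A, matrix_size.uiWA,
            &beta, d_C, matrix_size.uiWB);
```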

Darknet with cuDNN-8 Support
Below are the steps to build darknet with cuDNN-8 support.

Verified environment:
 * JetPack4.5.1 + Xavier

1. Get source
$ git clone https://github.com/pjreddie/darknet.git
$ cd darknet/
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/script/topics/0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/opencv-darknet.patch -O opencv-darknet.patch
$ git am 0001-fix-for-cudnn_v8-limited-memory-to-default-darknet-s.patch
$ git am opencv-darknet.patch

2. Update Makefile based on your device

 * Xavier & XavierNX:

 * TX2:

 * Nano:
3. Build and Test
$ make -j8
$ wget https://pjreddie.com/media/files/yolov3-tiny.weights
$ ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights [video]

TensorRT Python Bindings
Below are the steps to build TensorRT Python 3.9 bindings.

Verified environment:
 * JetPack4.6 + Xavier

1. Build Python 3.9
$ sudo apt install zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev libbz2-dev
$ wget https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tar.xz
$ tar xvf Python-3.9.1.tar.xz Python-3.9.1/

$ mkdir build-python-3.9.1
$ cd build-python-3.9.1/
$ ../Python-3.9.1/configure --enable-optimizations
$ make -j $(nproc)
$ sudo -H make altinstall
$ cd ../

2. Build cmake 3.13.5
$ sudo apt-get install -y protobuf-compiler libprotobuf-dev openssl libssl-dev libcurl4-openssl-dev
$ wget https://github.com/Kitware/CMake/releases/download/v3.13.5/cmake-3.13.5.tar.gz
$ tar xvf cmake-3.13.5.tar.gz
$ rm cmake-3.13.5.tar.gz

$ cd cmake-3.13.5/
$ ./bootstrap --system-curl
$ make -j$(nproc)

$ echo 'export PATH='${PWD}'/bin/:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ cd ../

3. Prepare header
$ mkdir python3.9
$ mkdir python3.9/include
$ wget http://ftp.us.debian.org/debian/pool/main/p/python3.9/libpython3.9-dev_3.9.9-2_arm64.deb
$ ar x libpython3.9-dev_3.9.9-2_arm64.deb
$ tar -xvf data.tar.xz
$ cp ./usr/include/aarch64-linux-gnu/python3.9/pyconfig.h python3.9/include/
$ cp -r Python-3.9.1/Include/* python3.9/include/

4. Build TensorRT pybinding
$ git clone https://github.com/pybind/pybind11.git
$ git clone -b release/8.0 https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT
$ git submodule update --init --recursive

$ cd python/
$ TRT_OSSPATH=${PWD}/.. EXT_PATH=${PWD}/../.. TARGET=aarch64 PYTHON_MINOR_VERSION=9 ./build.sh
$ python3.9 -m pip install build/dist/tensorrt-8.0.1.6-cp39-none-linux_aarch64.whl

Caffe
Below are the steps to build the Caffe library.

Verified environment:
 * JetPack4.6 + Xavier

$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/install_caffe_jp46.sh -O install_caffe_jp46.sh
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/0001-patch-for-jp4.6.patch -O 0001-patch-for-jp4.6.patch
$ ./install_caffe_jp46.sh
$ source ~/.bashrc

MXNet
Below are the steps to build the MXNet 1.8.0 library.

Verified environment:
 * JetPack4.5.1 + Xavier

$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/mxnet_v1.8.x.patch -O mxnet_v1.8.x.patch
$ wget https://raw.githubusercontent.com/AastaNV/JEP/master/elinux/autobuild_mxnet.sh -O autobuild_mxnet.sh
$ sudo chmod +x autobuild_mxnet.sh
$ ./autobuild_mxnet.sh Xavier
$ cd mxnet/build/
$ pip3 install mxnet-1.8.0-py3-none-any.whl

PyInstaller with OpenCV
Currently, the OpenCV version installed by default with JetPack is not consistent with the version PyInstaller expects.

To solve this issue, you can either upgrade the python-opencv version or downgrade the PyInstaller version.

 * Upgrade python-opencv:
$ pip3 install opencv-python

 * Downgrade pyinstaller and pyinstaller-hooks-contrib:
$ sudo pip3 install pyinstaller==4.2
$ sudo pip3 install pyinstaller-hooks-contrib==2021.2

$ pyinstaller --onefile --paths="/usr/lib/python3.6/dist-packages/cv2/python-3.6" myfile.py

"Unsupported ONNX data type: UINT8 (2)"
This error comes from TensorRT. The root cause is that the ONNX model declares its input as UINT8, while TensorRT only accepts Float32 input.

To solve this issue, you can modify the input data format of the ONNX model with the ONNX GraphSurgeon API.

$ sudo apt-get install python3-pip libprotobuf-dev protobuf-compiler
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install
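After installation, the input type can be rewritten in a few lines. A sketch (the model file names are placeholders; this requires the onnx and onnx_graphsurgeon packages installed above):

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Load the model and retype its inputs from UINT8 to Float32
graph = gs.import_onnx(onnx.load("model.onnx"))
for inp in graph.inputs:
    inp.dtype = np.float32   # was np.uint8

onnx.save(gs.export_onnx(graph), "model_float32.onnx")
```

The rewritten model_float32.onnx can then be passed to trtexec or the TensorRT ONNX parser.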

"Illegal instruction (core dumped)"
This is a known issue in NumPy v1.19.5.

To solve this issue, you can either downgrade NumPy to 1.19.4 or manually set an environment variable.

 * Downgrade NumPy:
$ sudo apt-get install python3-pip
$ pip3 install Cython
$ pip3 install numpy==1.19.4

 * Update the environment variable:
$ export OPENBLAS_CORETYPE=ARMV8

Long delays when submitting several cudaMemcpy
Please try increasing the number of compute channels:
$ export CUDA_DEVICE_MAX_CONNECTIONS=32

Documentation can be found here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars