From eLinux.org
Revision as of 19:04, 17 January 2022 by Zerollzeng

This page shares guidance on running inference with ONNX models in TensorRT, converting models to ONNX and ONNX models to TensorRT engines, and some common FAQs about parsing ONNX models.

TRT Compatibility

ONNX Operators: https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md (e.g. TensorRT 7.2 supports operators up to Opset 11)
cuDNN/TF/PyTorch/ONNX: see the "Compatibility" section in the TensorRT release notes - https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html
Protobuf: https://github.com/onnx/onnx-tensorrt#dependencies (e.g. Protobuf >= 3.0.x)

TRT Inference with explicit batch onnx model

Since TensorRT 6.0, the ONNX parser only supports networks with an explicit batch dimension. This part introduces how to run inference with an ONNX model that has either a fixed shape or a dynamic shape.

1. Fixed shape model

If your explicit batch network has a fixed shape (N, C, H, W >= 1), you should be able to simply specify the explicit batch flag and call executeV2(), similar to how you used execute() in previous TensorRT versions.

If you get the following warning when running inference with an ONNX model:

[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.

then, as the log says, your ONNX model is an explicit batch network and you need to specify the EXPLICIT_BATCH flag when creating the network, as in this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126

"Use execute without batch size instead" means that a batch size passed through the TRT API is ignored: TRT always runs inference with the explicit batch size defined in the network.
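The linked line is C++. In the TensorRT Python API, the same flag is passed to builder.create_network() as a bitmask. The sketch below builds that mask; network_creation_flags is a hypothetical helper, and when the tensorrt package is not installed it falls back to bit 0, which assumes kEXPLICIT_BATCH is the first enum value as in the TensorRT 6/7 headers.

```python
# Sketch: compute the network-creation flags for an explicit batch network.
try:
    import tensorrt as trt

    # TensorRT Python API: the enum value is the flag's bit position.
    EXPLICIT_BATCH_BIT = int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
except ImportError:
    # Assumption: kEXPLICIT_BATCH is bit 0, as in the TensorRT 6/7 NvInfer.h.
    EXPLICIT_BATCH_BIT = 0


def network_creation_flags(explicit_batch: bool = True) -> int:
    """Return the bitmask to pass to builder.create_network()."""
    flags = 0
    if explicit_batch:
        flags |= 1 << EXPLICIT_BATCH_BIT
    return flags


# With tensorrt installed (not executed here):
#   builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
#   network = builder.create_network(network_creation_flags())
print("network creation flags:", network_creation_flags())
```

Inference on such a network then goes through execute_v2(), the Python counterpart of executeV2(), without a batch size argument.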

2. Dynamic shape model

If your explicit batch network has a dynamic shape (one or more dims == -1), you need to create an optimization profile for it and set that profile on your execution context. Before doing inference, you also need to specify the actual input shape for that run, based on the input (for example with context->setBindingDimensions()).

See a sample here: https://github.com/lynettez/SampleONNX
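The rules an optimization profile has to satisfy can be sketched in plain Python. This is an illustrative check, not a TensorRT API: check_profile and shape_in_profile are hypothetical helpers encoding the constraints that kMIN <= kOPT <= kMAX in every dim, that a static (non -1) network dim has to be matched exactly by the profile, and that the runtime shape you set on the context has to fall inside [kMIN, kMAX].

```python
from typing import Sequence


def check_profile(network_shape: Sequence[int], min_shape: Sequence[int],
                  opt_shape: Sequence[int], max_shape: Sequence[int]) -> None:
    """Raise ValueError if (min, opt, max) is not a valid profile for the input.

    network_shape uses -1 for dynamic dims, e.g. (-1, 3, 224, 224).
    """
    profile = (min_shape, opt_shape, max_shape)
    if any(len(s) != len(network_shape) for s in profile):
        raise ValueError("profile rank does not match the network input rank")
    for i, (net, lo, opt, hi) in enumerate(zip(network_shape, *profile)):
        # Every dim must satisfy kMIN <= kOPT <= kMAX.
        if not 0 <= lo <= opt <= hi:
            raise ValueError(f"dim {i}: expected min <= opt <= max, got {lo}, {opt}, {hi}")
        # A static network dim leaves nothing to optimize over.
        if net != -1 and not (lo == opt == hi == net):
            raise ValueError(f"dim {i} is static ({net}); min/opt/max must all equal it")


def shape_in_profile(shape: Sequence[int], min_shape: Sequence[int],
                     max_shape: Sequence[int]) -> bool:
    """True if a runtime input shape lies inside [min, max] in every dim."""
    if not len(shape) == len(min_shape) == len(max_shape):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Usage: a batch-dynamic MobileNet-style input with batch sizes 1..16.
check_profile((-1, 3, 224, 224), (1, 3, 224, 224), (8, 3, 224, 224), (16, 3, 224, 224))
print(shape_in_profile((8, 3, 224, 224), (1, 3, 224, 224), (16, 3, 224, 224)))  # True
```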

How to convert your model to onnx?

1. To convert a PyTorch model, use the torch.onnx API (torch.onnx.export).

2. To convert a TensorFlow model, use the tf2onnx tool.

3. To convert a Caffe model, use the caffe2onnx tool; note that it supports fewer operators than the other converters.


4. How to deal with conditional and loop statements like if/for/while when exporting to ONNX

Avoid using Python's if/else/for/while directly in the forward pass. For TensorFlow, use tf.cond or tf.while_loop instead; for PyTorch, refer to the documentation on tracing vs. scripting. Using Python statements directly may cause unexpected behavior when exporting to ONNX: for example, a loop may be unrolled into a very large ONNX graph, or a statement may not be exported as intended.

Hint: Netron cannot visualize the graph inside an If/Loop operator; you can use onnx-graphsurgeon to print the operators inside the subgraph.
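To see why a Python loop gets unrolled, here is a toy tracer in plain Python. It is not the PyTorch or TensorFlow API (TracedValue, forward and trace are hypothetical names); it only mimics what tracing-based export does, which is to record each operation that Python actually executes.

```python
class TracedValue:
    """Toy stand-in for a traced tensor: every `+` is recorded as a graph node."""

    def __init__(self, name: str, graph: list):
        self.name = name
        self.graph = graph

    def __add__(self, other):
        out = TracedValue(f"t{len(self.graph)}", self.graph)
        self.graph.append(("Add", self.name, str(other), out.name))
        return out


def forward(x, n_iters: int, use_bias: bool):
    for _ in range(n_iters):  # Python loop: each iteration becomes its own node
        x = x + 1
    if use_bias:  # Python if: only the branch taken at trace time is recorded
        x = x + 10
    return x


def trace(n_iters: int, use_bias: bool) -> list:
    """Run forward() once on a traced input and return the recorded nodes."""
    graph = []
    forward(TracedValue("input", graph), n_iters, use_bias)
    return graph


print(len(trace(100, use_bias=True)))  # 101 Add nodes, no Loop or If
```

With 100 iterations the recorded graph holds 101 Add nodes, and changing use_bias at run time would not change a graph that has already been recorded. tf.while_loop / tf.cond, or TorchScript scripting, keep the control flow as Loop/If operators instead.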

How to modify the model to replace batch dimension with dynamic dim?

If you don't want to regenerate the ONNX model, you can set the input dimensions after parsing:

   auto input = network->getInput(0);
   input->setDimensions(Dims4{-1, 3, 224, 224});

Or use the onnx Python API:

   import onnx
   model = onnx.load('alexnet_fixed.onnx')
   model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'
   onnx.save(model, 'dynamic_alexnet.onnx')

If you still get the error below after modifying the dimensions, you have to regenerate your ONNX model with a dynamic input shape. The error happens when a dynamic shape is used with a model that contains static tensors. Since TRT 7.1 this case can be handled without regenerating the model; refer to the introduction of ONNX GraphSurgeon below.

INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size 
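If you do regenerate the model and it comes from PyTorch, the batch dimension is declared dynamic through the dynamic_axes argument of torch.onnx.export. The helper below only builds that dictionary (batch_dynamic_axes is a hypothetical name, and "input", "output" and "batch" are placeholder names); the export call itself is left as a comment so that the snippet runs without torch installed.

```python
from typing import Dict, Iterable


def batch_dynamic_axes(input_names: Iterable[str], output_names: Iterable[str],
                       batch_dim_name: str = "batch") -> Dict[str, Dict[int, str]]:
    """Mark dim 0 of every named input and output as a dynamic (symbolic) dim."""
    names = list(input_names) + list(output_names)
    return {name: {0: batch_dim_name} for name in names}


axes = batch_dynamic_axes(["input"], ["output"])
print(axes)

# Hypothetical export call (requires torch; `model` and `dummy_input` are yours):
# torch.onnx.export(model, dummy_input, "model_dynamic.onnx",
#                   input_names=["input"], output_names=["output"],
#                   dynamic_axes=axes, opset_version=11)
```

The exported input then has a symbolic first dim, which the parser sees as -1, so an optimization profile is needed as described above.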

How to use trtexec to run inference with dynamic shape?

trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \
        --minShapes=data:1x3x224x224 \
        --optShapes=data:3x3x224x224 \
        --maxShapes=data:5x3x224x224 \
        --shapes=data:3x3x224x224

Here --minShapes, --optShapes and --maxShapes set the kMIN, kOPT and kMAX shapes of the optimization profile, and --shapes sets the shape used for inference, which is like calling context->setBindingDimensions().

If your ONNX model was exported from TF and has an input named “x:0”, you can run with:

trtexec … --shapes=\'x:0\':5x3x224x224 …
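When these commands are generated from a script, the shape flags can be built from Python tuples. The sketch below (shape_spec and trtexec_dynamic_args are hypothetical helpers) produces the name:AxBxC syntax used above, joins multiple inputs with commas, and wraps names containing ':' in single quotes.

```python
import shlex
from typing import Dict, List, Sequence

Shapes = Dict[str, Sequence[int]]


def shape_spec(shapes: Shapes) -> str:
    """Format {input_name: dims} in trtexec's name:AxBxC syntax, comma-separated."""
    parts = []
    for name, dims in shapes.items():
        # Names containing ':' (e.g. TF-style "x:0") are wrapped in single quotes.
        quoted = f"'{name}'" if ":" in name else name
        parts.append(f"{quoted}:{'x'.join(str(d) for d in dims)}")
    return ",".join(parts)


def trtexec_dynamic_args(onnx_path: str, min_shapes: Shapes, opt_shapes: Shapes,
                         max_shapes: Shapes, shapes: Shapes) -> List[str]:
    """Build an argv list equivalent to the dynamic-shape trtexec command."""
    return [
        "trtexec",
        "--explicitBatch",
        f"--onnx={onnx_path}",
        f"--minShapes={shape_spec(min_shapes)}",
        f"--optShapes={shape_spec(opt_shapes)}",
        f"--maxShapes={shape_spec(max_shapes)}",
        f"--shapes={shape_spec(shapes)}",
    ]


args = trtexec_dynamic_args(
    "mobilenet_dynamic.onnx",
    min_shapes={"data": (1, 3, 224, 224)},
    opt_shapes={"data": (3, 3, 224, 224)},
    max_shapes={"data": (5, 3, 224, 224)},
    shapes={"data": (3, 3, 224, 224)},
)
print(shlex.join(args))  # shell-quoted, ready to paste into a terminal
```

Passing the list to subprocess.run(args, check=True) starts trtexec without a shell, so the single quotes reach trtexec as-is and no backslash escaping is needed; the backslashes in the command above are only there for the shell.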

How to convert onnx model to a tensorrt engine?

Use OnnxParser to parse the ONNX model and then build the engine as usual. If you're not familiar with OnnxParser and engine building, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp

You can also use trtexec to do the same thing with the command below (--saveEngine writes the serialized engine to a file):

trtexec --explicitBatch --onnx=your_model.onnx --saveEngine=your_model.engine

If you get errors when converting ONNX to an engine

If you get errors during parsing, add “--verbose” to the trtexec command line to see whether parsing fails on a particular node, and check the following:
1. Check the ONNX model with the checker function and see whether it passes:

import onnx
model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # raises an exception if the model is invalid

2. If (1) passes, try running onnx-simplifier on the model: https://github.com/daquexian/onnx-simplifier
3. If (2) doesn't help, an unsupported operator may be causing the parsing error. Please check the OnnxParser supported operators list here: https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. You can also inspect the failing nodes in Netron to see whether anything looks off, or try a newer ONNX opset when converting the model to ONNX.
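For step 3, a quick way to spot candidates is to compare the op types in the model against the supported-operator list. The helper below is a hypothetical sketch: you supply the op types (with the onnx package, [node.op_type for node in model.graph.node]) and a set of supported op names copied by hand from operators.md, and it counts the ops that are missing. It does not look inside If/Loop subgraphs, and an op being listed does not guarantee that every attribute combination is supported.

```python
from collections import Counter
from typing import Iterable, Set


def unsupported_ops(op_types: Iterable[str], supported: Set[str]) -> Counter:
    """Count the op types that do not appear in the supported set."""
    return Counter(op for op in op_types if op not in supported)


# Hypothetical inputs: with the onnx package installed you would use
#   op_types = [node.op_type for node in onnx.load("model.onnx").graph.node]
# and fill `supported` by hand from the operators.md list.
op_types = ["Conv", "Relu", "DCNv2", "Conv", "DCNv2"]
supported = {"Conv", "Relu"}
print(unsupported_ops(op_types, supported))
```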

Tip: If you're converting a TF model to ONNX, you can also try:

onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)

This can help avoid some conversion errors.

Some performance tests of dynamic shapes with ONNX models

Test environment
TensorRT: 7.0
CUDA: 10.2

| MobilenetV2 | OptimizationProfile | Engine size (bytes) |
|---|---|---|
| Fixed shape [1, 3, 224, 224] | - | 14487087 |
| Dynamic shape [-1, 3, 224, 224] | Not set, defaults to [1, 3, 224, 224] | 14487537 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 | 14595945 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 | 14601941 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 | 14604501 |

| MobilenetV2 | OptimizationProfile | Inference batch | Execution time (ms) |
|---|---|---|---|
| Fixed shape [1, 3, 224, 224] | - | 1 | 1.01 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 | 1 | 1.36 |
| | | 8 | 4.47 |
| | | 16 | 8.76 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 | 1 | 1.44 |
| | | 8 | 4.56 |
| | | 16 | 8.23 |
| | | 32 | 16.21 |

As the test results show:
1. The engine size increases when the engine is built with a dynamic shape and an optimization profile, and the larger the profile shapes, the larger the engine. The increase is small, though (under 1% compared with the fixed-shape engine in these tests).
2. Performance is best when the inference shape matches the optShape you set.
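Both conclusions can be checked directly against the measurements. The snippet below only re-enters the numbers from the tables (profiles written as min/opt/max batch) and computes the engine-size overhead relative to the fixed-shape engine and the faster profile at each batch size; it does not run TensorRT, and size_overhead_pct and fastest_profile are illustrative names.

```python
# Measurements copied from the tables; profiles are written as min/opt/max batch.
FIXED_SIZE = 14487087
DYNAMIC_SIZES = {
    "1/5/10": 14595945,
    "10/15/20": 14601941,
    "1/16/32": 14604501,
}
TIMES_MS = {  # profile -> {inference batch: execution time in ms}
    "1/8/16": {1: 1.36, 8: 4.47, 16: 8.76},
    "1/16/32": {1: 1.44, 8: 4.56, 16: 8.23, 32: 16.21},
}


def size_overhead_pct(size: int, baseline: int = FIXED_SIZE) -> float:
    """Engine-size increase relative to the fixed-shape engine, in percent."""
    return 100.0 * (size - baseline) / baseline


def fastest_profile(batch: int) -> str:
    """Profile with the lowest measured time for this batch size."""
    candidates = {p: t[batch] for p, t in TIMES_MS.items() if batch in t}
    return min(candidates, key=candidates.get)


for profile, size in DYNAMIC_SIZES.items():
    print(f"profile {profile}: +{size_overhead_pct(size):.2f}% engine size")
for batch in (1, 8, 16):
    print(f"batch {batch}: fastest with profile {fastest_profile(batch)}")
```

For batch 8 the opt=8 profile is faster and for batch 16 the opt=16 profile is faster, while every dynamic engine stays within 1% of the fixed-shape engine size.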

Introduce some use cases of polygraphy

Polygraphy is a toolkit designed to assist in running and debugging deep learning models in various frameworks. It includes a Python API and a command-line interface (CLI) built using this API.

Install the prebuilt wheels with:

   python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com

For further information, you can refer to https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy

1. Extract To Isolate A Subgraph

Extract the subgraph:

   polygraphy surgeon extract model.onnx \
   --inputs x1:auto:auto \
   --outputs add_out:auto \
   -o subgraph.onnx

If we knew the shapes and/or data types, we could instead write, for example:

   polygraphy surgeon extract model.onnx \
   --inputs x1:[1,3,224,224]:float32 \
   --outputs add_out:float32 \
   -o subgraph.onnx

[Optional] At this point, the model is ready for use. You can use inspect model to confirm whether it looks correct:

   polygraphy inspect model subgraph.onnx --mode=basic

2. Compare Accuracy Across Frameworks

You can use the run subtool to compare a model between different frameworks. In the simplest case, you can supply a model, and one or more framework flags. By default, run will generate synthetic input data, run inference using the specified frameworks, and finally compare outputs.

Compare an ONNX model between TensorRT and ONNX Runtime:

   polygraphy run dynamic_identity.onnx --trt --onnxrt

If our model uses dynamic input shapes, we can specify the shapes to use at runtime with the --input-shapes option:

   polygraphy run dynamic_identity.onnx --trt --onnxrt \
   --input-shapes X:[1,2,4,4]

[Optional] Compare per-layer outputs between TensorRT and ONNX-Runtime:

When network outputs do not match, it can be useful to compare per-layer outputs to see where the error is introduced. To do so, you can use the --trt-outputs and --onnx-outputs options respectively. These options accept one or more output names as their arguments. The special value mark all indicates that all tensors in the model should be compared:

   polygraphy run dynamic_identity.onnx --trt --onnxrt \
   --trt-outputs mark all \
   --onnx-outputs mark all

TIP: To find the first mismatched output more easily, you can use the --fail-fast option which will cause the tool to exit after the first mismatch between outputs.

TIP: By default, Polygraphy uses 1e-5 for both the absolute (--atol) and relative (--rtol) tolerances, which may be too strict; I recommend starting from 1e-2 when debugging accuracy issues.
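To see why 1e-5 can be too strict, here is a hedged sketch of an element-wise tolerance check in the numpy.isclose style, |actual - expected| <= atol + rtol * |expected|. Polygraphy's exact comparison rule may differ between versions, and outputs_match is a hypothetical helper for illustration only. On the command line the tolerances are set with --atol and --rtol, for example: polygraphy run dynamic_identity.onnx --trt --onnxrt --atol 1e-2 --rtol 1e-2

```python
from typing import Sequence


def outputs_match(expected: Sequence[float], actual: Sequence[float],
                  atol: float = 1e-5, rtol: float = 1e-5) -> bool:
    """numpy.isclose-style check applied to every element pair."""
    if len(expected) != len(actual):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for e, a in zip(expected, actual))


reference = [1.0, 2.0, -0.5]          # e.g. ONNX Runtime output
candidate = [1.001, 2.002, -0.5005]   # e.g. TensorRT output, about 0.1% off
print(outputs_match(reference, candidate))                        # default 1e-5
print(outputs_match(reference, candidate, atol=1e-2, rtol=1e-2))  # looser
```

A 0.1% relative difference fails the default tolerances but passes at 1e-2, which is why a looser starting point makes it easier to find the layers that are actually wrong.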

Run the model with TensorRT and ONNX-Runtime using custom input data:

   polygraphy run dynamic_identity.onnx --trt --onnxrt \
   --data-loader-script data_loader.py \
   --trt-min-shapes X:[1,2,28,28] --trt-opt-shapes X:[1,2,28,28] --trt-max-shapes X:[1,2,28,28]

Refer to https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/run/05_comparing_with_custom_data for more details.
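The data_loader.py script used above is a Python file that defines a load_data() function yielding feed dicts (input name to NumPy array), which is what Polygraphy's --data-loader-script option looks for by default. A minimal sketch, assuming a single input named X with the fixed shape [1,2,28,28] from the command above, and random data standing in for real samples:

```python
# data_loader.py: sketch of a custom data loader for `polygraphy run`.
import numpy as np

# Assumption: one input named "X"; the shape must lie within the
# --trt-min-shapes / --trt-max-shapes range passed on the command line.
INPUT_SHAPE = (1, 2, 28, 28)


def load_data():
    """Yield one feed dict per inference iteration."""
    rng = np.random.default_rng(seed=0)  # fixed seed so runs are reproducible
    for _ in range(5):  # number of iterations is arbitrary here
        yield {"X": rng.random(INPUT_SHAPE, dtype=np.float32)}
```

Replace the random arrays with real preprocessed samples when you want to check accuracy on representative data rather than synthetic input.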

3. Inspect Model

Inspect the ONNX model:

   polygraphy inspect model identity.onnx --mode=basic

This will display something like:

   [I] ==== ONNX Model ====
   Name: test_identity | Opset: 8
   ---- 1 Graph Input(s) ----
   {x [dtype=float32, shape=(1, 1, 2, 2)]}
   ---- 1 Graph Output(s) ----
   {y [dtype=float32, shape=(1, 1, 2, 2)]}
   ---- 0 Initializer(s) ----
   ---- 1 Node(s) ----
   Node 0    |  [Op: Identity]
       {x [dtype=float32, shape=(1, 1, 2, 2)]}
        -> {y [dtype=float32, shape=(1, 1, 2, 2)]}

It is also possible to show detailed layer information, including layer attributes, using --mode=full.

Display the TensorRT network after parsing an ONNX model:

   polygraphy inspect model identity.onnx \
   --mode=basic --display-as=trt

This will display something like:

   [I] ==== TensorRT Network ====
   Name: Unnamed Network 0 | Explicit Batch Network
   ---- 1 Network Input(s) ----
   {x [dtype=float32, shape=(1, 1, 2, 2)]}
   ---- 1 Network Output(s) ----
   {y [dtype=float32, shape=(1, 1, 2, 2)]}
   ---- 1 Layer(s) ----
   Layer 0    | node_of_y [Op: LayerType.IDENTITY]
       {x [dtype=float32, shape=(1, 1, 2, 2)]}
        -> {y [dtype=float32, shape=(1, 1, 2, 2)]}

It is also possible to show detailed layer information, including layer attributes, using --mode=full.

Inspect the TRT engine:

   polygraphy inspect model dynamic_identity.engine

This will display something like:

   [I] ==== TensorRT Engine ====
   Name: Unnamed Network 0 | Explicit Batch Engine (2 layers)
   ---- 1 Engine Input(s) ----
   {X [dtype=float32, shape=(1, 2, -1, -1)]}
   ---- 1 Engine Output(s) ----
   {Y [dtype=float32, shape=(1, 2, -1, -1)]}
   ---- Memory ----
   Device Memory: 0 bytes
   ---- 2 Profile(s) (2 Binding(s) Each) ----
   - Profile: 0
       Binding Index: 0 (Input)  [Name: X]             | Shapes: min=(1, 2, 1, 1), opt=(1, 2, 3, 3), max=(1, 2, 5, 5)
       Binding Index: 1 (Output) [Name: Y]             | Shape: (1, 2, -1, -1)
   - Profile: 1
       Binding Index: 2 (Input)  [Name: X [profile 1]] | Shapes: min=(1, 2, 2, 2), opt=(1, 2, 4, 4), max=(1, 2, 6, 6)
       Binding Index: 3 (Output) [Name: Y [profile 1]] | Shape: (1, 2, -1, -1)

Introduce some use cases of onnx-graphsurgeon

ONNX GraphSurgeon (Onnx-GS) is a tool that allows you to easily generate new ONNX graphs or modify existing ones. It is released with TensorRT OSS; follow the README to install it. This section introduces some use cases of modifying an ONNX model with Onnx-GS.

1. Make dynamic

Since TRT 6.0, the ONNX parser only supports explicit batch ONNX models, so some users need to make the model input dynamic. The sample below shows how to use Onnx-GS to modify the input without regenerating the ONNX model in the training framework.

Here is the sample to make the model input dynamic:

import onnx_graphsurgeon as gs
import onnx
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))
tensors = graph1.tensors()
tensors["input"].shape[0] = gs.Tensor.DYNAMIC
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")

If you still get an error while building the engine, look for the tensor that has static dims. Say it is the constant named "OC2_DUMMY_1"; then change its first dim like this (before exporting and saving the graph):

import numpy as np

# The values array is not writeable, so copy it first.
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC

2. Change tensor names

Here is a sample that changes the input and output tensor names:

tensors["input"].name = "data"
tensors["Layer7_cov_Y"].name = "Layer7_cov"
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"

3. Prune the model with a certain input or output layer

Sometimes we need to prune the model to narrow down an issue. The script below sets the output tensor to the one you specify:

import sys

import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Usage (script and file names are illustrative):
#   python cut_model.py model.onnx <tensor_name> model_cut.onnx
# Cut the model so that it ends at the given tensor.
model_path, tensor_name, new_onnx_model_name = sys.argv[1], sys.argv[2], sys.argv[3]
print("cut model:", model_path, "to end with tensor", tensor_name)
graph = gs.import_onnx(onnx.load(model_path))
tensors = graph.tensors()
# To cut the input too: graph.inputs = [tensors[<input_name>].to_variable(dtype=np.float32)]
graph.outputs = [tensors[tensor_name].to_variable(dtype=np.float32)]

# Remove any unnecessary nodes or tensors, so that only the subgraph is left.
graph.cleanup()

onnx.save(gs.export_onnx(graph), new_onnx_model_name)

4. Add your Plugin

Onnx-GS also can be used for modifying the model with the custom plugin.
Download this sample.
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node.
The figures below show what the sample does: it replaces the "DCNv2" node with the plugin node named "DCNv2_SS":

Figure: original graph

Figure: new graph