# Difference between revisions of "TensorRT/ONNX"

## Revision as of 19:41, 14 April 2020

**This page shares guidance on running inference with ONNX models in TensorRT, converting models to ONNX and ONNX models to TensorRT engines, and answers to common questions about parsing ONNX models.**

## Contents

- 1 TRT Inference with explicit batch onnx model
- 2 How to convert your model to onnx?
- 3 How to modify the model to replace batch dimension with dynamic dim?
- 4 How to use trtexec to run inference with dynamic shape?
- 5 How to convert onnx model to a tensorrt engine?
- 6 If you met some error during converting onnx to engine
- 7 Some performance tests about dynamic shape with onnx model

### TRT Inference with explicit batch onnx model

Since the TensorRT 6.0 release, the ONNX parser supports only networks with an explicit batch dimension. This section describes how to run inference with an ONNX model that has either a fixed shape or a dynamic shape.

1. **Fixed shape model**

If your explicit batch network has a fixed shape (N, C, H, W >= 1), you only need to specify the explicit batch flag and call executeV2(), much as you called execute() in previous TensorRT versions.

You may see the warning below when running inference with an ONNX model:

    [W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.

As the log says, your ONNX model is an explicit batch network, so you need to specify the EXPLICIT_BATCH flag as in this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126

"Use execute without batch size instead" means that a batch size set through the TensorRT API is ignored: TensorRT always runs inference with the explicit batch size of the network.

2. **Dynamic shape model**

If your explicit batch network has a dynamic shape (at least one dimension is -1), create an optimization profile for it, then set that profile on your execution context. Before running inference, you also need to specify the actual input shape for that run.

See a sample here: https://github.com/lynettez/SampleONNX
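The constraint behind an optimization profile can be illustrated without TensorRT: the shape you set at inference time has to lie between the profile's minimum and maximum shapes in every dimension. The helper below is a minimal sketch of that rule only; `shape_fits_profile` is a hypothetical name and not part of the TensorRT API.

```python
from typing import Sequence


def shape_fits_profile(shape: Sequence[int],
                       min_shape: Sequence[int],
                       max_shape: Sequence[int]) -> bool:
    """Return True if every dimension of `shape` lies within [min, max]."""
    # All three shapes must have the same rank to be comparable.
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= dim <= hi
               for dim, lo, hi in zip(shape, min_shape, max_shape))


if __name__ == "__main__":
    # Profile with batch sizes 1 to 5, as in the trtexec example on this page.
    min_shape, max_shape = (1, 3, 224, 224), (5, 3, 224, 224)
    print(shape_fits_profile((3, 3, 224, 224), min_shape, max_shape))
    print(shape_fits_profile((8, 3, 224, 224), min_shape, max_shape))
```

A check like this can run before setting the shape on the execution context, so an out-of-range batch is reported by your own code rather than by the runtime.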

### How to convert your model to onnx?

1. To convert a PyTorch model, use the torch.onnx API. Sample code:

https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930

2. To convert a TensorFlow model, use the tf2onnx tool:

https://github.com/onnx/tensorflow-onnx

3. To convert a Caffe model, use the caffe2onnx tool. It supports fewer operators than the other converters:

https://github.com/htshinichi/caffe-onnx
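For the PyTorch route, a minimal sketch of an export with a dynamic batch dimension might look like the following. It assumes a PyTorch version whose `torch.onnx.export` accepts `dynamic_axes`; the tensor names `data` and `output` and both helper names are illustrative assumptions, not taken from the linked gist.

```python
def batch_dynamic_axes(tensor_names, axis_label="batch"):
    """Build a dynamic_axes mapping that marks dimension 0 of each tensor as dynamic."""
    return {name: {0: axis_label} for name in tensor_names}


def export_with_dynamic_batch(model, dummy_input, onnx_path,
                              input_name="data", output_name="output"):
    """Export `model` to ONNX with a dynamic batch dimension (hypothetical helper)."""
    import torch  # imported lazily so batch_dynamic_axes works without PyTorch

    model.eval()  # export in inference mode
    torch.onnx.export(
        model,
        dummy_input,  # example input tensor used to trace the model
        onnx_path,
        input_names=[input_name],
        output_names=[output_name],
        dynamic_axes=batch_dynamic_axes([input_name, output_name]),
    )


if __name__ == "__main__":
    print(batch_dynamic_axes(["data", "output"]))
```

Exporting with a dynamic batch axis up front avoids having to patch the batch dimension of the ONNX file afterwards.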

### How to modify the model to replace batch dimension with dynamic dim?

If you do not want to regenerate the ONNX model, you can set the input dimensions after parsing:

    auto input = network->getInput(0);
    input->setDimensions(Dims4{-1, 3, 224, 224});

Or use the ONNX Python API:

    import onnx
    model = onnx.load('alexnet_fixed.onnx')
    # Replace the fixed batch dimension of the first input with a symbolic one.
    model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'
    onnx.save(model, 'dynamic_alexnet.onnx')
    onnx.checker.check_model(model)

If you still get the error below after modifying the dimensions, you have to regenerate your ONNX model with a dynamic input shape. The error occurs when a dynamic shape is used with a model that contains static tensors. TensorRT 7.1 will support this case, so with that version you will not need to regenerate the model.

    INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size
    ../builder/cudnnBuilderBlockChooser.cpp:136
    Aborting…

### How to use trtexec to run inference with dynamic shape?

    # --minShapes, --optShapes and --maxShapes set the kMIN, kOPT and kMAX shapes of the optimization profile.
    # --shapes sets the inference shape, like context->setBindingDimensions(0, Dims4{3, 3, 224, 224}).
    trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \
            --minShapes=data:1x3x224x224 \
            --optShapes=data:3x3x224x224 \
            --maxShapes=data:5x3x224x224 \
            --shapes=data:3x3x224x224 \
            --saveEngine=mobilenet_dynamic.engine

If your ONNX model was exported from TensorFlow and its input is named “x:0”, you can pass the name in escaped quotes:

    trtexec … --shapes=\'x:0\':5x3x224x224 …
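If you launch trtexec from a script, the shape arguments can be generated rather than typed by hand. The sketch below only formats the flags already shown on this page; `shape_spec` and `trtexec_args` are hypothetical helper names, and quoting input names that contain a colon follows the “x:0” example above as an assumption about how trtexec parses the name.

```python
def shape_spec(name, dims):
    """Format one input as name:d0xd1x..., e.g. data:1x3x224x224."""
    # Names containing ':' (such as TensorFlow's "x:0") are wrapped in single quotes.
    quoted = f"'{name}'" if ":" in name else name
    return f"{quoted}:" + "x".join(str(d) for d in dims)


def trtexec_args(onnx_path, input_name, min_dims, opt_dims, max_dims,
                 run_dims, engine_path=None):
    """Return an argument list suitable for subprocess.run()."""
    args = [
        "trtexec",
        "--explicitBatch",
        f"--onnx={onnx_path}",
        f"--minShapes={shape_spec(input_name, min_dims)}",
        f"--optShapes={shape_spec(input_name, opt_dims)}",
        f"--maxShapes={shape_spec(input_name, max_dims)}",
        f"--shapes={shape_spec(input_name, run_dims)}",
    ]
    if engine_path:
        args.append(f"--saveEngine={engine_path}")
    return args


if __name__ == "__main__":
    print(" ".join(trtexec_args(
        "mobilenet_dynamic.onnx", "data",
        (1, 3, 224, 224), (3, 3, 224, 224), (5, 3, 224, 224),
        (3, 3, 224, 224), "mobilenet_dynamic.engine")))
```

Because the list is passed to the process without a shell, the single quotes reach trtexec literally, which is what the escaped form on the command line achieves.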

### How to convert onnx model to a tensorrt engine?

Use OnnxParser to parse the ONNX model, then build the engine as usual. If you are not familiar with OnnxParser or with building an engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp

You can also use trtexec to do the same thing with the command below:

    trtexec --explicitBatch --onnx=your_model.onnx
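The same parse-then-build flow can be sketched with the TensorRT Python bindings. This is a rough outline that assumes a TensorRT 7-era Python API (`create_network` with the explicit batch flag, `OnnxParser.parse`, `build_engine`); later releases changed parts of this API, and the helper name `build_engine_from_onnx` and the 1 GiB workspace default are assumptions made for illustration.

```python
def build_engine_from_onnx(onnx_path, workspace_bytes=1 << 30):
    """Parse an ONNX file and build an engine; return None if parsing fails."""
    import tensorrt as trt  # lazy import: needs a TensorRT installation

    logger = trt.Logger(trt.Logger.WARNING)
    # The ONNX parser requires an explicit batch network.
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(logger)
    network = builder.create_network(explicit_batch)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            # Print every parser error so the failing node can be identified.
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes
    # A network with dynamic input shapes additionally needs an optimization
    # profile added to `config` before building.
    return builder.build_engine(network, config)
```

Returning `None` on a parse failure keeps the error messages from the parser visible instead of failing later inside the builder.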

### If you met some error during converting onnx to engine

If you hit an error during parsing, add “--verbose” to the trtexec command line to see which node fails to parse, then work through the checks below:

1. Check the ONNX model with the checker function and see whether it passes:

    import onnx
    model = onnx.load("model.onnx")
    onnx.checker.check_model(model)

2. If (1) passes, try running onnx-simplifier on the model: https://github.com/daquexian/onnx-simplifier

3. If (2) doesn’t work, an unsupported operator may be causing the parsing error. Check the list of operators supported by OnnxParser here: https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. You can also open the model in Netron and see whether anything looks off around the failing nodes.
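The operator check in step 3 can be partly automated by listing the operator types used in the model and comparing them with the supported list. The sketch below is an illustration: `find_unsupported_ops` and `model_op_types` are hypothetical helpers, and the small supported set in the demo is a placeholder that you would replace with the operators listed in operators.md.

```python
def find_unsupported_ops(op_types, supported_ops):
    """Return the sorted operator types that are not in the supported set."""
    return sorted(set(op_types) - set(supported_ops))


def model_op_types(onnx_path):
    """Collect the operator type of every node in an ONNX model."""
    import onnx  # lazy import so find_unsupported_ops works without onnx

    model = onnx.load(onnx_path)
    return [node.op_type for node in model.graph.node]


if __name__ == "__main__":
    # Placeholder subset for demonstration only, not the real supported list.
    supported_subset = {"Conv", "Relu", "Add", "GlobalAveragePool"}
    used = ["Conv", "Relu", "Conv", "MyCustomOp"]
    print(find_unsupported_ops(used, supported_subset))
```

For a real model, pass `model_op_types("model.onnx")` as the first argument instead of the hand-written list.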

### Some performance tests about dynamic shape with onnx model

Test environment

GPU: T4

TensorRT: 7.0

CUDA: 10.2

| MobilenetV2 | OptimizationProfile | Engine size (bytes) |
|---|---|---|
| Fixed shape [1, 3, 224, 224] | - | 14487087 |
| Dynamic shape [-1, 3, 224, 224] | Not set, defaults to [1, 3, 224, 224] | 14487537 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 | 14595945 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 | 14601941 |
| Dynamic shape [-1, 3, 224, 224] | --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 | 14604501 |

| MobilenetV2 | Inference batch | Execution time (ms) |
|---|---|---|
| Fixed shape [1, 3, 224, 224] | 1 | 1.01 |
| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 | 1 | 1.36 |
| (same profile) | 8 | 4.47 |
| (same profile) | 16 | 8.76 |
| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 | 1 | 1.44 |
| (same profile) | 8 | 4.56 |
| (same profile) | 16 | 8.23 |
| (same profile) | 32 | 16.21 |

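The test results show that:

1. The engine size increases when the engine is built with a dynamic shape and an OptimizationProfile, and it grows with the shapes in the profile. The increase is small: less than 1% over the fixed shape engine in every case tested.

2. For a given inference batch, performance is best with the profile whose optShape matches it: batch 8 takes 4.47 ms with optShapes of 8 versus 4.56 ms with optShapes of 16, and batch 16 takes 8.23 ms with optShapes of 16 versus 8.76 ms with optShapes of 8.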
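The comparison between the two dynamic shape profiles can be reproduced from the execution time table with a few lines of Python. The numbers below are copied from that table; the profile labels `opt8_max16` and `opt16_max32` and the function names are shorthand introduced here for illustration.

```python
# Execution times in ms, keyed by profile and then by inference batch size.
TIMES_MS = {
    "opt8_max16": {1: 1.36, 8: 4.47, 16: 8.76},
    "opt16_max32": {1: 1.44, 8: 4.56, 16: 8.23, 32: 16.21},
}


def fastest_profile(batch):
    """Return the profile with the lowest measured time for this batch size."""
    candidates = {name: times[batch]
                  for name, times in TIMES_MS.items() if batch in times}
    return min(candidates, key=candidates.get)


def ms_per_image(profile, batch):
    """Average time per image for one measurement."""
    return TIMES_MS[profile][batch] / batch


if __name__ == "__main__":
    for batch in (8, 16):
        print(batch, fastest_profile(batch))
    for profile, times in TIMES_MS.items():
        for batch in times:
            print(profile, batch, round(ms_per_image(profile, batch), 3))
```

Dividing by the batch size also shows how much larger batches lower the time per image within each profile.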