TensorRT/How2Debug

From eLinux.org
< TensorRT
Revision as of 22:22, 13 October 2019 by Mchi (talk | contribs)
Jump to: navigation, search

Layer Dump and Analyze

Refer to the old page or new page


How to dump the output of certain layer?

TensorRT doesn’t store the intermediate result of your network, so you have to use the following API to mark the intended layer as output layer, and then interference again and save its result for further analysis,

network->markOutput(“layer_name”)

NOTE:

  • You can set multiplier layers as the output at the same time, but setting the layer as output may break the network optimization and impact the inference performance, as TensorRT always runs output layer in FP32 mode, no matter which mode you have configured.
  • Don’t forget to adjust the dimension or output buffer size after you change the output layer.

How to debug ONNX model with setting extra output layer?

Sometimes we need to debug our model with dumping output of middle layer, this FAQ will show you a way to set middle layer as output for debugging ONNX model.
The below steps are setting one middle layer of mnist.onnx model as output using the patch shown at the bottom.

  1. Download onnx-tensorrt and mnist.onnx
  2. Get all nodes info: Apply the first section "dump all nodes' output" change and build onx2trt.then run the command to get all nodes:
    $ ./onnx2trt mnist.onnx -o mnist.engine
  3. Set one layer as output: Pick up the node name from the output of step2, and set it as output with the 2nd section "set one layer as output" change, rebuild onx2trt, and run below command to regenerate the engine.
    $ ./onnx2trt mnist.onnx -o mnist.engine
  4. Dump output with the engine file:
    $ ./trtexec  --engine=mnist.engine --input=Input3 --output=Plus214_Output_0 --output=Convolution110_Output_0 --dumpOutput

Here is the patch based on onnx-tensorrt

diff --git a/ModelImporter.cpp b/ModelImporter.cpp
index ac4749c..8638add 100644
--- a/ModelImporter.cpp
+++ b/ModelImporter.cpp
@@ -524,6 +524,19 @@ ModelImporter::importModel(::ONNX_NAMESPACE::ModelProto const &model,
     output_names.push_back(model.graph().output(i).name());
   }
+  // ======= dump all nodes' output ============
+  int node_size = graph.node_size();
+  cout << "ModelImporter::importModel : graph.node_size() = " << node_size << " *******" << endl;
+  for (int i = 0; i < graph.node_size(); i++) {
+         ::ONNX_NAMESPACE::NodeProto const& node = graph.node(i);
+         if( node.output().size() > 0 ) {
+                 cout << "node[" << i << "] = "
+                        << node.output(0) << ":"
+                        << node.op_type() << endl;
+         }
+  }
+  // =========================================
+
   string_map<TensorOrWeights> tensors;
   TRT_CHECK(importInputs(&_importer_ctx, graph, &tensors, weight_count,
                          weight_descriptors));
@@ -559,10 +572,17 @@ ModelImporter::importModel(::ONNX_NAMESPACE::ModelProto const &model,
     }
   }
   _current_node = -1;
+
+  // =========== set one layer as output, below "Convolution110_Output_0" is from abobe dump ==
+  nvinfer1::ITensor* new_output_tensor_ptr = &tensors.at("Convolution110_Output_0").tensor();
+  _importer_ctx.network()->markOutput(*new_output_tensor_ptr);
+  // ==========================================================================================
+

How to analyze network performance?

First of all, we should be aware of the profiling command tool that TensorRT provides - trtexec.
If all your network layer has been supported by TensorRT through either native way or plugin way, you can always utilize this tool to profile your network very quickly.
Second, you can add profiling metrics for your application manually from CPU side (link) or GPU side (link).
NOTE:

  • Time collection should only contain the network enqueue() or execute() and any context set-up or memory initialization or refill operation should be excluded.
  • Add more iterations for the time collection, in order to avagage the GPU warm-up effect.

Third, if you would like to scope the time consumption of each layer, you can implement IProfiler to achieve that, or utilize SimpleProfiler TensorRT already provides (refer to below patch for sampleSSD),

--- sampleSSD.cpp.orig	2019-05-27 12:39:14.193521455 +0800
+++ sampleSSD.cpp	2019-05-27 12:38:59.393358775 +0800
@@ -428,8 +428,11 @@
     float* detectionOut = new float[N * kKEEP_TOPK * 7];
     int* keepCount = new int[N];
 
+    SimpleProfiler profiler (" layer time");
+    context->setProfiler(&profiler);
     // Run inference
     doInference(*context, data, detectionOut, keepCount, N);
+    std::cout << profiler;
 
     bool pass = true;