Difference between revisions of "TensorRT/LayerDumpAndAnalyze"

From eLinux.org

Revision as of 17:59, 19 August 2019

This page is a step-by-step guide that illustrates how to dump the activation results of the intermediate layers in TensorRT and analyze them to understand what is happening.

It can be useful in the following cases:

  • The model works well in your training framework, but not when deployed through TensorRT.
  • The model works well in TensorRT FP32 mode, but not in TensorRT FP16 mode.


  • "Not working well" means the produced inference result is inaccurate or completely incorrect.
  • Dumping and comparing the value distribution of FP32 and INT8 is not a convincing way to question INT8.

Here we take sampleMNIST as an example.

Set the target layer as output

Why can't we retrieve the intermediate activation result directly from TensorRT?

This is a side effect of TensorRT's memory optimization. To reduce memory consumption, TensorRT allocates memory only for a few estimated cases (mostly the largest buffers needed across all layers) and assigns these buffers to layers at runtime. Once a layer has finished executing, its memory is recycled for the execution of subsequent layers. Hence, if we want to save the activation result of an intermediate layer for analysis, we have to mark it as an output layer, in which case the user allocates the space that stores its result.
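The effect of buffer reuse can be illustrated with a toy model. The sketch below is not TensorRT's allocator; it only assumes two scratch slots that layers take turns overwriting, and a hypothetical `keep` argument that plays the role of marking a tensor as an output.

```python
def run_with_shared_buffers(layers, x, keep=()):
    """Toy model: two scratch slots are ping-ponged between layers.

    `keep` lists layer names whose results are copied into caller-owned
    storage, the analogue of marking a tensor as a network output.
    """
    scratch = [None, None]   # reused "workspace" slots
    kept = {}                # caller-owned output buffers
    current = x
    for i, (name, fn) in enumerate(layers):
        slot = i % 2
        scratch[slot] = fn(current)   # overwrites the older result in this slot
        current = scratch[slot]
        if name in keep:
            kept[name] = list(current)  # copy survives later overwrites
    return current, kept, scratch


# Hypothetical three-layer "network" made of simple element-wise functions.
layers = [
    ("double", lambda v: [2 * e for e in v]),
    ("plus1", lambda v: [e + 1 for e in v]),
    ("square", lambda v: [e * e for e in v]),
]

final, kept, scratch = run_with_shared_buffers(layers, [1, 2, 3], keep=("double",))
print("final:", final)
print("kept:", kept)
print("scratch after run:", scratch)
```

After the run, the result of "double" is no longer present in either scratch slot because "square" overwrote it; it is only available because it was explicitly kept.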

Here is the C++ API call that marks a layer as an output for a Caffe model:

for (auto& s : mParams.outputTensorNames)
    network->markOutput(*blobNameToTensor->find(s.c_str())); // as in sampleMNIST

Allocate buffer for the output layers

You can utilize samplesCommon::BufferManager to allocate and manage your input and output buffers.

samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

Dump the activation result of the output layer after execution

BufferManager has a convenient method to dump the output buffer in the intended format:

for (auto& s: mParams.outputTensorNames)
    buffers.dumpBuffer(fileName, s);

Here is the output of the prob layer of sampleMNIST:

 [1, 10, 1, 1]

The first line is the shape of the current feature maps, and the following lines are the activation values.
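A dump in this layout can be read back with a few lines of Python. This is a minimal sketch, assuming the values after the shape line are separated by whitespace or commas; the function name `parse_dump` and the sample values are made up for illustration and are not taken from a real dump.

```python
import ast


def parse_dump(text):
    """Return (shape, values) from a dump whose first line is the shape."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    shape = ast.literal_eval(lines[0])  # "[1, 10, 1, 1]" -> [1, 10, 1, 1]
    values = [
        float(tok)
        for ln in lines[1:]
        for tok in ln.replace(",", " ").split()
    ]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"shape {shape} needs {expected} values, got {len(values)}")
    return shape, values


# Invented sample content, only to exercise the parser.
sample = """[1, 10, 1, 1]
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0
"""

shape, values = parse_dump(sample)
print(shape, len(values))
```

Checking the value count against the shape catches truncated or mismatched dump files before any comparison is made.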

Analyze the result

There are many metrics for measuring the similarity of two vectors, such as those available in OpenCV.

Here we use a script to estimate the similarity of two output tensors (from FP16 and FP32) for a given layer.

It supports three metrics: Euclidean distance, cosine similarity, and relative difference.
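For reference, the three metrics can be sketched in plain Python as follows. This is not the code of layer_analyzer.py: the function names are hypothetical, the relative difference is taken here as the norm of the difference divided by the norm of the reference (one common definition), and how the script turns a metric into its Similarity% column is not shown on this page.

```python
import math


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")


def euclidean_distance(a, b):
    """L2 norm of the element-wise difference (0 means identical)."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the two flattened tensors."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: two all-zero tensors count as equal.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def relative_difference(reference, test):
    """||reference - test|| / ||reference|| (assumed definition)."""
    diff = euclidean_distance(reference, test)
    norm_ref = math.sqrt(sum(x * x for x in reference))
    if norm_ref == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return diff / norm_ref


fp32 = [3.0, 4.0]   # stand-in for a flattened FP32 activation
fp16 = [3.0, 4.5]   # stand-in for the same layer in FP16
print(euclidean_distance(fp32, fp16))
print(cosine_similarity(fp32, fp16))
print(relative_difference(fp32, fp16))
```

Euclidean distance and relative difference are sensitive to magnitude, while cosine similarity only compares direction, so the metrics can disagree on layers where FP16 mainly rescales the values.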

python layer_analyzer.py -d ./results/ -m 0
  • '-d ./results/' denotes the directory where all activation result files are located.
  • '-m 0' denotes Euclidean distance.


Here is the whole zip package used to demonstrate all steps above,


The output looks like this:

                                      LayerName|              LayerShape|      Similarity%|
                                          conv2|           [1, 50, 8, 8]|         76.3468%|
                                           prob|           [1, 10, 1, 1]|         99.9995%|
                                          scale|          [1, 1, 28, 28]|         99.2844%|
                                          conv1|         [1, 20, 24, 24]|         90.8855%|
                                          pool1|         [1, 20, 12, 12]|         94.9611%|
                                            ip2|           [1, 10, 1, 1]|         97.9592%|
                                            ip1|          [1, 500, 1, 1]|         95.0792%|
                                          pool2|           [1, 50, 4, 4]|         88.9701%|
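The report lists layers in no particular order, so it can help to re-sort it by execution order and locate the layer with the lowest similarity. The sketch below parses the table above; the `EXECUTION_ORDER` list is an assumption based on the LeNet topology of sampleMNIST (with the activation layer left out, since it is not dumped), and the helper name is hypothetical.

```python
REPORT = """\
LayerName| LayerShape| Similarity%|
conv2| [1, 50, 8, 8]| 76.3468%|
prob| [1, 10, 1, 1]| 99.9995%|
scale| [1, 1, 28, 28]| 99.2844%|
conv1| [1, 20, 24, 24]| 90.8855%|
pool1| [1, 20, 12, 12]| 94.9611%|
ip2| [1, 10, 1, 1]| 97.9592%|
ip1| [1, 500, 1, 1]| 95.0792%|
pool2| [1, 50, 4, 4]| 88.9701%|
"""

# Assumed layer order of the sampleMNIST network (activation layer omitted).
EXECUTION_ORDER = ["scale", "conv1", "pool1", "conv2", "pool2", "ip1", "ip2", "prob"]


def parse_report(text):
    """Return a list of (layer_name, similarity_percent) tuples."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) < 3 or not cells[0] or cells[0] == "LayerName":
            continue  # skip blank lines and the header row
        rows.append((cells[0], float(cells[2].rstrip("%"))))
    return rows


rows = parse_report(REPORT)
ordered = sorted(rows, key=lambda r: EXECUTION_ORDER.index(r[0]))
worst = min(rows, key=lambda r: r[1])

for name, sim in ordered:
    print(f"{name:>6}  {sim:8.4f}%")
print("lowest similarity:", worst)
```

Read in execution order, the similarity drops through conv1 and reaches its minimum at conv2, which makes conv2 the first layer worth inspecting more closely.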