TensorRT/LayerDumpAndAnalyze

This page is a step-by-step guide that illustrates how to dump the activation results of intermediate layers in TensorRT and analyze them to understand what is happening.

It can be useful in the following cases:
 * The model works well in your training framework, but not when deployed through TensorRT.
 * The model works well in TensorRT FP32 mode, but not in TensorRT FP16 mode.

NOTE:
 * "Not working well" means the inference result is inaccurate or completely incorrect.
 * Dumping and comparing the value distributions of FP32 and INT8 is not a convincing way to question INT8 accuracy.

Here we take sampleMNIST as an example.

Set the target layer as output
Why can't we retrieve the intermediate activation result directly from TensorRT?

This is a side effect of TensorRT's memory optimization. To reduce memory consumption, TensorRT allocates only a few memory regions (sized mostly for the largest requirements among all layers) and assigns them to particular layers at runtime. Once those layers finish executing, the memory is recycled for the layers that follow. Hence, to save the activation result of an intermediate layer for analysis, we have to mark it as an output layer, in which case the user allocates the space that stores its result.
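As a rough illustration of the reuse described above, here is a toy Python model in which every layer writes into one of two shared scratch buffers. It is not TensorRT's actual allocator: `run_layers`, the two-buffer scheme, and the `keep` parameter are hypothetical names that exist only to show why an intermediate result disappears unless it is copied to caller-owned storage.

```python
# Toy model of activation-buffer reuse. This is NOT how TensorRT allocates
# memory internally; it only mimics the effect described in the text.

def run_layers(layers, x, keep=()):
    """Run (name, fn) layers in order, ping-ponging between two scratch buffers.

    Outputs of layers named in `keep` are copied to caller-owned storage,
    which plays the role of a tensor marked as a network output.
    Returns (final_output, kept_outputs, scratch_buffers).
    """
    scratch = [list(x), None]           # two buffers shared by all layers
    kept = {}
    for i, (name, fn) in enumerate(layers):
        src = scratch[i % 2]
        dst = (i + 1) % 2
        # Writing here overwrites whatever older activation lived in this buffer.
        scratch[dst] = [fn(v) for v in src]
        if name in keep:
            kept[name] = list(scratch[dst])
    final = scratch[len(layers) % 2]
    return final, kept, scratch


layers = [
    ("a", lambda v: v + 1),
    ("b", lambda v: v * 2),
    ("c", lambda v: v - 3),
]
final, kept, scratch = run_layers(layers, [1, 2], keep=("a",))
```

After the run, the output of layer "a" survives only in `kept`; neither scratch buffer still holds it, because layer "c" reused the buffer that "a" had written to.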

Here is the C++ API call that marks layers as outputs for a Caffe model:

 for (auto& s : mParams.outputTensorNames)
 {
     network->markOutput(*blobNameToTensor->find(s.c_str()));
 }
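If the network is built through the TensorRT Python API instead, the same idea can be sketched as follows. This is a minimal sketch, not the patch used on this page: the helper name `mark_layer_outputs` and its `skip_layer_names` parameter are hypothetical, and it assumes a network object that exposes `num_layers`, `get_layer()`, and `mark_output()`, with layers that expose `name`, `num_outputs`, and `get_output()`. Note that in a parsed Caffe model the layer names and blob names may differ, so the returned tensor names are what you would later use to look up buffers.

```python
def mark_layer_outputs(network, skip_layer_names=()):
    """Mark every output tensor of every layer as a network output.

    `network` is assumed to behave like a TensorRT INetworkDefinition.
    Layers whose names are in `skip_layer_names` are left untouched
    (for example, activation layers you do not want to dump).
    Returns the names of the tensors that were marked.
    """
    marked = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name in skip_layer_names:
            continue
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor is None:
                continue
            network.mark_output(tensor)
            marked.append(tensor.name)
    return marked
```

Marking many tensors as outputs may change which optimizations the builder can apply, so treat the resulting engine as a debugging aid rather than a deployment artifact.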

Allocate buffer for the output layers
You can use samplesCommon::BufferManager to allocate and manage your input and output buffers:

 samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

Dump the activation results of the output layers after execution
BufferManager has a convenient method to dump an output buffer in a readable format:

 for (auto& s : mParams.outputTensorNames)
 {
     buffers.dumpBuffer(fileName, s);
 }

Here is the output of the prob layer of sampleMNIST:

 [1, 10, 1, 1]
 3.08735e-13
 2.62257e-09
 1.1331e-10
 2.88912e-06
 1.28611e-07
 1.19931e-09
 1.70497e-13
 1.09948e-09
 0.999995
 1.76669e-06

The first line is the shape of the feature map, and the following lines are the activation values.
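To work with such a dump in a script, the format described above can be read back with a few lines of Python. This is a minimal sketch under the assumption that a file holds exactly one tensor: a bracketed shape line followed by whitespace-separated values (one per line or several per line). `parse_dump` is a hypothetical helper, not part of TensorRT or of the script used on this page.

```python
def parse_dump(text):
    """Parse one dumped tensor: a shape line such as "[1, 10, 1, 1]",
    followed by whitespace-separated activation values.

    Returns (shape, values) as (list of int, list of float).
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty dump")
    shape = [int(d) for d in lines[0].strip("[]").split(",")]
    values = [float(tok) for ln in lines[1:] for tok in ln.split()]

    # Sanity check: the element count must match the product of the shape.
    expected = 1
    for d in shape:
        expected *= d
    if len(values) != expected:
        raise ValueError(
            "shape %s implies %d values, found %d" % (shape, expected, len(values))
        )
    return shape, values


shape, values = parse_dump("[1, 2, 1, 1]\n0.5\n1e-3\n")
```

The element-count check catches truncated files early, before any comparison between two dumps is attempted.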

Analyze the result
There are many metrics for measuring the distance between two static value distributions, such as the norm functions in OpenCV.

Here we use a script to calculate the percentage of elements whose relative difference (similar to a norm in OpenCV) exceeds a specified threshold:

python layer_analyzer.py -d ./results/ -t 0.1


 * ./results/ is the folder that holds all the layer result files
 * 0.1 is the threshold that decides whether the values at the same position in the two dumped feature maps match
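The contents of layer_analyzer.py are not shown here, so the following is only a sketch of the kind of metric described above, not the script itself. It assumes the relative difference is |ref - cand| / |ref| computed element by element, with a small hypothetical `eps` guarding against division by zero; `inaccurate_ratio` is an illustrative name.

```python
def inaccurate_ratio(reference, candidate, threshold, eps=1e-12):
    """Count elements whose relative difference exceeds `threshold`.

    `reference` and `candidate` are equal-length sequences of floats,
    for example the FP32 and FP16 activations of the same layer.
    Returns (total, inaccurate, ratio_percent).
    """
    if len(reference) != len(candidate):
        raise ValueError("tensors have different element counts")
    inaccurate = 0
    for ref, cand in zip(reference, candidate):
        # eps is an assumption of this sketch; it avoids dividing by zero.
        rel = abs(ref - cand) / max(abs(ref), eps)
        if rel > threshold:
            inaccurate += 1
    total = len(reference)
    ratio = 100.0 * inaccurate / total if total else 0.0
    return total, inaccurate, ratio


total, bad, ratio = inaccurate_ratio(
    [1.0, 2.0, 0.0, 4.0], [1.05, 2.5, 0.0, 4.0], threshold=0.1
)
```

With this definition, values very close to zero can produce a large relative difference even when the absolute difference is negligible (1e-09 against 2e-09 is a 100% relative difference), which is worth keeping in mind when reading the ratio for layers whose outputs are mostly near zero.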


Here is the complete patch we used to dump all layers (except activation layers) of sampleMNIST and to analyze the accuracy of the FP16 activations against FP32:



The output looks like this:

 LayerName|             LayerShape| Threshold%|  TotalNum| InaccurateNum| InaccurateRatio%|
     conv2|          [1, 50, 8, 8]|   10.0000%|      3200|            12|          0.3750%|
      prob|          [1, 10, 1, 1]|   10.0000%|        10|             6|         60.0000%|
     scale|         [1, 1, 28, 28]|   10.0000%|       784|             0|          0.0000%|
     conv1|        [1, 20, 24, 24]|   10.0000%|     11520|            32|          0.2778%|
     pool1|        [1, 20, 12, 12]|   10.0000%|      2880|            10|          0.3472%|
       ip2|          [1, 10, 1, 1]|   10.0000%|        10|             0|          0.0000%|
       ip1|         [1, 500, 1, 1]|   10.0000%|       500|             3|          0.6000%|
     pool2|          [1, 50, 4, 4]|   10.0000%|       800|             3|          0.3750%|