TensorRT/LayerDumpAndAnalyze

This page is a step-by-step guide that illustrates how to dump the activation results of intermediate layers in TensorRT and analyze them to understand what is happening.

It may be useful in the following cases:
 * The model works well in your training framework, but not when deployed through TensorRT.
 * The model works well in TensorRT FP32 mode, but not in TensorRT FP16 mode.

NOTE:
 * "Not working well" means that the inference results are inaccurate or completely incorrect.
 * Dumping and comparing the value distributions of FP32 and INT8 may not be a convincing way to judge INT8 accuracy.

The example below is based on the [TensorRT_5.1_OSS] release; the sample outputs shown later come from sampleMNIST.

Set the target layer as output
Why can't we retrieve the intermediate activation result directly from TensorRT?

This is a side effect of TensorRT's memory optimization. To reduce memory consumption, TensorRT allocates activation memory only for a few estimated cases (mostly the largest sizes required among all layers) and assigns these regions to particular layers at runtime. Once a layer has finished executing, its memory is recycled for the execution of subsequent layers. Hence, to save the activation result of an intermediate layer for analysis, we have to mark it as an output layer; in that case the user allocates the space that stores its result.
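The effect of this recycling can be pictured with a toy model. The sketch below is not TensorRT's allocator; it assumes a hypothetical pool of two scratch buffers that layers overwrite in turn, and a hypothetical `marked_outputs` set standing in for layers marked as outputs.

```python
# Toy model of activation-memory reuse (an illustration only, not TensorRT's
# actual allocator). Two scratch buffers are reused in ping-pong fashion.
def run_network(layer_names, marked_outputs=()):
    shared = [None, None]  # scratch memory owned by the (toy) runtime
    kept = {}              # user-owned buffers for layers marked as outputs
    for i, name in enumerate(layer_names):
        activation = f"activation of {name}"
        if name in marked_outputs:
            kept[name] = activation   # lives in memory the user allocated
        shared[i % 2] = activation    # overwrites an older layer's result
    return shared, kept


layers = ["conv1", "pool1", "conv2", "pool2", "ip1", "ip2", "prob"]
scratch, kept = run_network(layers, marked_outputs={"conv2"})
print(scratch)  # only the two most recently written activations remain
print(kept)     # conv2 survives because it was marked as an output
```

In this toy run the result of conv2 is gone from the scratch buffers by the end of execution, and is only available because it was copied into user-owned storage.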

Here is the C++ API to mark layers as outputs for a Caffe model:

    // Mark every tensor named in outputTensorNames as a network output.
    for (auto& s : mParams.outputTensorNames)
    {
        network->markOutput(*blobNameToTensor->find(s.c_str()));
    }

Allocate buffer for the output layers
You can use samplesCommon::BufferManager to allocate and manage your input and output buffers:

    samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

Dump the activation results of the output layers after execution
BufferManager has a convenient method for dumping an output buffer in a readable format:

    for (auto& s : mParams.outputTensorNames)
    {
        buffers.dumpBuffer(fileName, s);
    }

Here is the output of the prob layer of sampleMNIST:

    [1, 10, 1, 1]
    3.08735e-13
    2.62257e-09
    1.1331e-10
    2.88912e-06
    1.28611e-07
    1.19931e-09
    1.70497e-13
    1.09948e-09
    0.999995
    1.76669e-06

The first line is the shape of the feature map, and the following lines are the activation values.
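A dump in this format can be loaded for analysis with a few lines of Python. The following is a minimal sketch, not part of the TensorRT samples: `parse_dump` is a hypothetical helper that reads the bracketed shape first and treats every remaining whitespace-separated token as a value.

```python
import re
from math import prod


def parse_dump(text):
    """Return (shape, values) from a dump: a bracketed shape, then the values."""
    match = re.match(r"\s*\[([^\]]*)\]", text)
    if match is None:
        raise ValueError("dump does not start with a shape such as [1, 10, 1, 1]")
    shape = [int(dim) for dim in match.group(1).split(",")]
    # Everything after the shape is a value, regardless of line breaks.
    values = [float(token) for token in text[match.end():].split()]
    if len(values) != prod(shape):
        raise ValueError(f"expected {prod(shape)} values, found {len(values)}")
    return shape, values


# The prob-layer output shown above.
sample = """[1, 10, 1, 1]
3.08735e-13 2.62257e-09 1.1331e-10 2.88912e-06 1.28611e-07
1.19931e-09 1.70497e-13 1.09948e-09 0.999995 1.76669e-06
"""
shape, values = parse_dump(sample)
print(shape, values.index(max(values)))
```

Checking the value count against the product of the shape catches truncated or mismatched dump files early.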

Analyze the result
There are many metrics for measuring the similarity of two vectors, such as those provided by OpenCV.

Here we use a script to estimate the similarity of two output tensors (one from FP16, one from FP32) for a given layer.

It supports three metrics: Euclidean distance, cosine similarity, and relative difference.
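For reference, the three metrics can be sketched in plain Python as below. These are the textbook definitions of Euclidean distance and cosine similarity; the relative-difference formula (mean element-wise difference scaled by the larger magnitude) is an assumption, since the script's exact definition is not given here, and so is the way the script turns a metric into a percentage.

```python
import math


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")


def euclidean_distance(a, b):
    """L2 distance between two flattened tensors (0 means identical)."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened tensors (1 means same direction)."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def relative_difference(a, b, eps=1e-12):
    """Assumed definition: mean of |x - y| / max(|x|, |y|, eps)."""
    _check(a, b)
    return sum(abs(x - y) / max(abs(x), abs(y), eps) for x, y in zip(a, b)) / len(a)


fp32 = [1.0, 2.0, 3.0]   # made-up values, for illustration only
fp16 = [1.0, 2.0, 3.5]
print(euclidean_distance(fp32, fp16))
print(cosine_similarity(fp32, fp16))
print(relative_difference(fp32, fp16))
```

Euclidean distance depends on the scale of the activations, while cosine similarity does not, so the two can rank layers differently.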

python layer_analyzer.py -d ./results/ -m 0


 * '-d ./results/' specifies the directory that contains all the activation result files.
 * '-m 0' selects Euclidean distance as the metric.

The output looks like this:

LayerName|             LayerShape|      Similarity%|
    conv2|          [1, 50, 8, 8]|         76.3468%|
     prob|          [1, 10, 1, 1]|         99.9995%|
    scale|         [1, 1, 28, 28]|         99.2844%|
    conv1|        [1, 20, 24, 24]|         90.8855%|
    pool1|        [1, 20, 12, 12]|         94.9611%|
      ip2|          [1, 10, 1, 1]|         97.9592%|
      ip1|         [1, 500, 1, 1]|         95.0792%|
    pool2|          [1, 50, 4, 4]|         88.9701%|
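Because the rows are not printed in any particular order, it can help to sort them by similarity to see which layers deviate most. The sketch below assumes the report is available as '|'-separated text, one layer per line; `parse_report` is a hypothetical helper and not part of layer_analyzer.py.

```python
# The report rows shown above, one layer per line.
report = """\
LayerName| LayerShape| Similarity%|
conv2| [1, 50, 8, 8]| 76.3468%|
prob| [1, 10, 1, 1]| 99.9995%|
scale| [1, 1, 28, 28]| 99.2844%|
conv1| [1, 20, 24, 24]| 90.8855%|
pool1| [1, 20, 12, 12]| 94.9611%|
ip2| [1, 10, 1, 1]| 97.9592%|
ip1| [1, 500, 1, 1]| 95.0792%|
pool2| [1, 50, 4, 4]| 88.9701%|
"""


def parse_report(text):
    """Return a list of (layer_name, shape_text, similarity_percent) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, shape, similarity = [field.strip() for field in line.split("|")[:3]]
        rows.append((name, shape, float(similarity.rstrip("%"))))
    return rows


# Lowest similarity first: these layers are the first candidates to inspect.
ranked = sorted(parse_report(report), key=lambda row: row[2])
for name, shape, similarity in ranked:
    print(f"{name:>6} {shape:>16} {similarity:8.4f}%")
```

A low score marks a layer worth inspecting, but it does not by itself prove that the layer is where the error originates, since deviations can be inherited from earlier layers.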