TensorRT/LayerDumpAndAnalyze


Revision as of 23:12, 15 August 2019

This page is a step-by-step guide illustrating how to dump the activation results of intermediate layers in TensorRT and analyze them to understand what's happening.

It might be useful when you face the following cases,

  • The model works well in your training framework, but not when deployed through TensorRT.
  • The model works well in TensorRT FP32 mode, but not in TensorRT FP16 mode.

NOTE:

  • "Not working well" means the produced inference result is inaccurate or totally incorrect.
  • Dumping and comparing the value distributions of FP32 and INT8 is not a convincing way to question INT8.

Here we take sampleMNIST as an example.

Set the target layer as output

Why can't we retrieve the intermediate activation results directly from TensorRT?

This is a side effect of TensorRT's memory optimization. To reduce memory consumption, TensorRT allocates memory space only for a few estimated cases (mostly the largest spaces needed among all layers), and these memory spaces are assigned to layers at runtime. Once a layer finishes executing, its memory is recycled for subsequent layer execution. Hence, if we want to save the activation result of an intermediate layer for analysis, we have to mark it as an output layer (in which case the user allocates space to store its result).

Here is the C++ API to set a layer as output for a Caffe model,

// Mark each requested tensor as a network output so its buffer survives execution
for (auto& s : mParams.outputTensorNames)
{
    network->markOutput(*blobNameToTensor->find(s.c_str()));
}

Allocate buffer for the output layers

You can utilize samplesCommon::BufferManager to allocate and manage your input and output buffers.

samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

Dump the activation results of the output layers after execution completes

BufferManager provides a convenient method to dump an output buffer in the intended format,

for (auto& s: mParams.outputTensorNames)
{
    buffers.dumpBuffer(fileName, s);
}

Here is the output of the prob layer of sampleMNIST,

 [1, 10, 1, 1]
 3.08735e-13
 2.62257e-09
 1.1331e-10
 2.88912e-06
 1.28611e-07
 1.19931e-09
 1.70497e-13
 1.09948e-09
 0.999995
 1.76669e-06

The first line is the shape of the current feature map and the following lines are the activation values.
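The dump format above is simple enough to parse with a few lines of Python. The sketch below is illustrative, not part of TensorRT or the sample code; `parse_dump` is a hypothetical helper name, and it assumes exactly the layout shown (shape on the first line, one value per subsequent line).

```python
from io import StringIO

def parse_dump(fp):
    """Return (shape, values) from a dumped layer file.

    Assumes the first line is the shape, e.g. "[1, 10, 1, 1]",
    and each following non-empty line holds one activation value.
    """
    shape = [int(d) for d in fp.readline().strip("[] \n").split(",")]
    values = [float(line) for line in fp if line.strip()]
    return shape, values

# A shortened stand-in for a real dump file such as the prob output above
sample = """\
[1, 10, 1, 1]
3.08735e-13
2.62257e-09
0.999995
"""
shape, values = parse_dump(StringIO(sample))
print(shape)        # [1, 10, 1, 1]
print(len(values))  # 3
```

In practice you would open the file written by dumpBuffer instead of a StringIO.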

Analyze the result

There are many metrics to measure the distance between two value distributions, such as the histogram comparison methods in OpenCV.

Here we use a script to calculate the similarity of the two output tensors for a given layer. It supports three metrics: Euclidean distance, cosine similarity, and relative difference.

python layer_analyzer.py -d ./results/
  • ./results/ is the folder that saves all the layer result files
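For reference, the three metrics the analyzer supports can be sketched in pure Python as below. These are the textbook definitions; layer_analyzer.py's actual implementation (normalization, averaging, percentage conversion) may differ, and the `eps` guard in `relative_difference` is an assumption to avoid division by zero.

```python
import math

def euclidean_distance(a, b):
    # L2 norm of the element-wise difference
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def relative_difference(a, b, eps=1e-12):
    # element-wise |a - b| / max(|a|, eps), averaged over the tensor
    return sum(abs(x - y) / max(abs(x), eps) for x, y in zip(a, b)) / len(a)

# e.g. comparing an FP32 dump against an FP16 dump of the same layer
fp32 = [0.1, 0.2, 0.7]
fp16 = [0.1, 0.21, 0.69]
print(cosine_similarity(fp32, fp16))
```

A cosine similarity near 1.0 (close to 100% in the table below) indicates the two dumps are nearly identical.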



Here is the whole patch we used to dump all layers (except activation layers) of sampleMNIST and analyze the accuracy of the FP32 and FP16 activations,

File:0001-Dump-all-layer-and-analyze-the-value-distribution.patch

The output looks like,

                                      LayerName|              LayerShape|      Similarity%|
                                          conv2|           [1, 50, 8, 8]|         76.3468%|
                                           prob|           [1, 10, 1, 1]|         99.9995%|
                                          scale|          [1, 1, 28, 28]|         99.2844%|
                                          conv1|         [1, 20, 24, 24]|         90.8855%|
                                          pool1|         [1, 20, 12, 12]|         94.9611%|
                                            ip2|           [1, 10, 1, 1]|         97.9592%|
                                            ip1|          [1, 500, 1, 1]|         95.0792%|
                                          pool2|           [1, 50, 4, 4]|         88.9701%|