TensorRT/LayerDumpAndAnalyze

From eLinux.org
Revision as of 19:04, 19 August 2019

This page is a step-by-step guide to dumping the activation results of intermediate layers in TensorRT and analyzing them to understand what is happening.

It might be useful when you are facing the following cases,

  • Your model works well in your training framework, but not when deployed through TensorRT.
  • Your model works well in TensorRT FP32 mode, but not in TensorRT FP16 mode.

NOTE:

  • "Not working well" means the produced inference result is inaccurate or totally incorrect.
  • Dumping and comparing the value distributions of FP32 and INT8 may not be a convincing way to question INT8.

Here we take File:SampleMNIST.zip as an example (which is based on the [TensorRT_5.1_OSS] release),

Set the target layer as output

Why can't we retrieve the intermediate activation results directly from TensorRT?

This is a side effect of TensorRT's memory optimization. To reduce memory consumption, TensorRT allocates only a few memory spaces sized for its estimated cases (mostly the biggest requirements among all layers), and these spaces are assigned to layers during runtime. Once a layer finishes executing, its memory is recycled for subsequent layer execution. Hence, if we want to save the activation result of an intermediate layer for analysis, we have to mark it as an output layer (in that case, the user allocates the space that stores its result).

Here is the C++ API call to mark a layer as output for a Caffe model,

for (auto& s : mParams.outputTensorNames)
{
    network->markOutput(*blobNameToTensor->find(s.c_str()));
}

Allocate buffer for the output layers

You can use samplesCommon::BufferManager to allocate and manage your input and output buffers.

samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

Dump the activation result of the output layer after execution completes

BufferManager has a convenient method to dump an output buffer in the intended format,

for (auto& s: mParams.outputTensorNames)
{
    buffers.dumpBuffer(fileName, s);
}

Here is the output of the prob layer of sampleMNIST,

 [1, 10, 1, 1]
 3.08735e-13
 2.62257e-09
 1.1331e-10
 2.88912e-06
 1.28611e-07
 1.19931e-09
 1.70497e-13
 1.09948e-09
 0.999995
 1.76669e-06

The first line is the shape of the current feature map and the following lines are the activation values.
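As a sketch, a dump file in this format can be loaded back for offline analysis with a few lines of Python (the file layout is assumed from the sample above; the helper name is hypothetical):

```python
# Load a dumped activation file: the first line holds the tensor shape,
# e.g. "[1, 10, 1, 1]", and each following line holds one activation value.
def load_activation(path):
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    shape = [int(dim) for dim in lines[0].strip("[]").split(",")]
    values = [float(v) for v in lines[1:]]
    return shape, values
```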

Analyze the result

There are many metrics for analyzing the similarity of two static vectors, such as the histogram comparison methods in OpenCV.

Here we utilize a script to estimate the similarity of two output tensors (one from FP16 and one from FP32) for a given layer.

It supports three metrics: Euclidean distance, cosine similarity, and relative difference.
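For reference, the three metrics can be sketched in plain Python; these formulas are illustrative assumptions, not necessarily the exact ones layer_analyzer.py implements:

```python
import math

# Illustrative implementations of the three metrics; the exact formulas
# (and any normalization) used by layer_analyzer.py may differ.
def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_relative_difference(a, b, eps=1e-12):
    # Element-wise |x - y| / max(|x|, |y|), averaged over the tensor.
    return sum(abs(x - y) / max(abs(x), abs(y), eps)
               for x, y in zip(a, b)) / len(a)
```

Identical FP32/FP16 outputs give a Euclidean distance of 0, a cosine similarity of 1, and a relative difference of 0; the further a layer's values drift, the larger the distance and relative difference become.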

python layer_analyzer.py -d ./results/ -m 0
  • '-d ./results/' specifies the directory where all activation result files are located.
  • '-m 0' selects Euclidean distance.

Here is the whole patch we used to dump all layers (except activation layers) of sampleMNIST and analyze the accuracy of FP32 and FP16 activations,

File:0001-Dump-all-layers-and-analyze-the-value-distribution.patch

The output looks like,

                                      LayerName|              LayerShape|      Similarity%|
                                          conv2|           [1, 50, 8, 8]|         76.3468%|
                                           prob|           [1, 10, 1, 1]|         99.9995%|
                                          scale|          [1, 1, 28, 28]|         99.2844%|
                                          conv1|         [1, 20, 24, 24]|         90.8855%|
                                          pool1|         [1, 20, 12, 12]|         94.9611%|
                                            ip2|           [1, 10, 1, 1]|         97.9592%|
                                            ip1|          [1, 500, 1, 1]|         95.0792%|
                                          pool2|           [1, 50, 4, 4]|         88.9701%|