TensorRT/LayerDumpAndAnalyzeForINT8

Revision as of 23:53, 10 September 2019

This page illustrates how to narrow down which layer causes significant accuracy loss in your network when it is deployed in INT8 mode.

Here is the approach,

  1. Find an image for which INT8 accuracy is noticeably worse than FP32.
  2. Use that image to perform FP32 inference and dump the output activation values.
  3. Iterate over all layers and run the following experiment for each target layer (see the code sketch after the process diagram below),
    1. Set the target layer and the layers before it to run in INT8 mode
    2. Set the layers after the target layer to run in FP32 mode
    3. Perform INT8 inference and save the output activation values
    4. Compare the output activation values with the FP32 ones. If the loss is big, the target layer is likely the problematic one (there is no fixed threshold for what counts as a 'big' loss, but you can get a sense from the loss trend across recent iterations)

The process is something like below,

layer1_int8 --> layer2_fp32 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
layer1_int8 --> layer2_int8 --> layer3_int8 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output

If we observe a big accuracy loss when layer3 runs in INT8, we set layer3 to run in a higher-precision mode and continue the experiments.

layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_int8 --> … --> layerN_fp32 --> layer_output
…
layer1_int8 --> layer2_int8 --> layer3_fp32 --> … --> layerN_int8 --> layer_output
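
Putting the diagram into code, the sweep over target layers can be driven by a loop like the sketch below. This is only an illustration under a few assumptions, not the sample's exact code: the network is assumed to be already parsed, the builder is assumed to already be configured for INT8 with calibration in place, and buildRunAndDump() is a hypothetical helper that builds the engine, runs inference on the chosen image, and dumps the output activations (steps 1-3 later on this page). Unlike the snippet in step 1, which only forces the later layers to FP32, this sketch sets every layer's precision explicitly so the same network object can be reconfigured between iterations.

#include <set>
#include <string>
#include <NvInfer.h>

// Hypothetical helper (not part of TensorRT): build the engine from the configured network,
// run inference on the chosen image, and dump the output activations tagged with 'tag'.
void buildRunAndDump(nvinfer1::IBuilder* builder, nvinfer1::INetworkDefinition* network,
                     const std::string& tag);

void runScopedExperiments(nvinfer1::IBuilder* builder, nvinfer1::INetworkDefinition* network)
{
    // Layers already identified as problematic; they stay in FP32 for later experiments.
    std::set<int> pinnedFp32;

    // Strict types make TensorRT honor the per-layer precision we request (see the NOTE in step 1).
    builder->setStrictTypeConstraints(true);

    for (int target = 0; target < network->getNbLayers(); ++target)
    {
        for (int i = 0; i < network->getNbLayers(); ++i)
        {
            // The target layer and those before it run INT8; everything after it runs FP32,
            // as do layers that were previously pinned to FP32.
            bool fp32 = (i > target) || pinnedFp32.count(i) > 0;
            network->getLayer(i)->setPrecision(fp32 ? nvinfer1::DataType::kFLOAT
                                                    : nvinfer1::DataType::kINT8);
        }

        buildRunAndDump(builder, network, network->getLayer(target)->getName());

        // After comparing this dump with the FP32 reference (step 3), a target layer whose
        // INT8 run causes a big output loss would be added to pinnedFp32 before continuing.
    }
}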


Why can't we compare the layer-by-layer activations directly?

Because we don't actually care what an intermediate activation looks like compared to FP32's. Sometimes, even if the accuracy loss at some middle layer is very big, the final result may not be affected (perhaps because your network has a large tolerance for the current task). Hence, the loss at the network output is the only useful factor for evaluating network accuracy.

Why can't we dump all layers' INT8 results in one pass?

For example, if layer3 introduces a big loss at the final output, then the layers after layer3 might also show a big accuracy loss. So before we continue to look for other potentially problematic layers, we should first rule out layer3 (by running it in FP32 mode) to remove its interfering influence.


Here we take Samplemnist_accuracy_int8 (https://elinux.org/File:Samplemnist_accuracy_int8.zip) as an example, which is based on the TensorRT 5.1 OSS release (https://github.com/NVIDIA/TensorRT/tree/release/5.1/samples/opensource/sampleMNIST).



1. Set precision for the layers after the target layer

// Strict type constraints are required so that the requested per-layer precision is honored.
builder->setStrictTypeConstraints(true);
// Force every layer after the target (scoped) layer to run in FP32, so that only the layers
// up to and including the target layer run in INT8.
for (int i = mParams.scopedLayerIndex + 1; i < network->getNbLayers(); i++)
{
    auto layer = network->getLayer(i);
    layer->setPrecision(nvinfer1::DataType::kFLOAT);
}
mParams.scopedLayerName = network->getLayer(mParams.scopedLayerIndex)->getName();

NOTE:

  1. We have to enable strict type constraints so that the per-layer precision takes effect; otherwise, the setting may be overridden during network optimization.
  2. 'scopedLayerIndex' is the index of the target layer we scope.
  3. 'scopedLayerName' is used to store the target layer's name.

2. Dump the output result

buffers.dumpBuffer(file, s);
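
dumpBuffer() here comes from the common BufferManager code shipped with the TensorRT OSS samples. As a rough illustration of what this step amounts to (an assumption about the idea only, not the sample's actual on-disk format), dumping an output tensor is essentially writing the host-side copy of its values to a file:

#include <fstream>
#include <string>
#include <vector>

// Minimal standalone sketch: write one activation value per line so the values can be
// compared offline against the FP32 reference dump. The sample's own dumpBuffer() may
// use a different file layout.
void dumpOutputToFile(const std::string& path, const std::vector<float>& hostOutput)
{
    std::ofstream file(path);
    for (float v : hostOutput)
    {
        file << v << "\n";
    }
}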

3. Analyze the accuracy loss

python layer_analyzer_int8.py

For example, here is the output for MNIST (the layer name 'prob_ip2' means it is the network output when ip2 is the target layer being scoped),

                                      LayerName|              LayerShape|      Similarity%|
                                       prob_ip2|           [1, 10, 1, 1]|         99.9997%|
                                     prob_pool1|           [1, 10, 1, 1]|         99.9996%|
                                     prob_scale|           [1, 10, 1, 1]|         99.9994%|
                                     prob_conv2|           [1, 10, 1, 1]|         99.9995%|
                                      prob_prob|           [1, 10, 1, 1]|         99.9997%|
                                     prob_conv1|           [1, 10, 1, 1]|         99.9996%|
             prob_(Unnamed Layer* 9) [Constant]|           [1, 10, 1, 1]|         99.9997%|
                                     prob_relu1|           [1, 10, 1, 1]|         99.9997%|
                                       prob_ip1|           [1, 10, 1, 1]|         99.9997%|
                                     prob_pool2|           [1, 10, 1, 1]|         99.9995%|
         prob_(Unnamed Layer* 10) [ElementWise]|           [1, 10, 1, 1]|         99.9997%|
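
The exact formula behind the 'Similarity%' column is computed by layer_analyzer_int8.py and is not reproduced here. As one plausible illustration (an assumption, not necessarily the script's actual metric), a cosine similarity between the dumped FP32 reference output and an INT8 output could be computed like this:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between the FP32 reference output and an INT8 output, expressed as a
// percentage; values close to 100% mean the INT8 configuration barely changed the output.
double cosineSimilarityPercent(const std::vector<float>& fp32Ref, const std::vector<float>& int8Out)
{
    double dot = 0.0, normRef = 0.0, normInt8 = 0.0;
    std::size_t n = std::min(fp32Ref.size(), int8Out.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        dot      += static_cast<double>(fp32Ref[i]) * int8Out[i];
        normRef  += static_cast<double>(fp32Ref[i]) * fp32Ref[i];
        normInt8 += static_cast<double>(int8Out[i]) * int8Out[i];
    }
    if (normRef == 0.0 || normInt8 == 0.0)
    {
        return 0.0;
    }
    return 100.0 * dot / (std::sqrt(normRef) * std::sqrt(normInt8));
}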