TensorRT/LayerDumpAndAnalyzeForINT8

This page describes how to identify which layer causes significant accuracy loss in your network when deploying in INT8 mode.

'''Here is the approach:'''
 * 1) Find an image with poor INT8 accuracy.
 * 2) Use that image to perform FP32 inference and dump the output activation values.
 * 3) Iterate over all layers and, for each target layer, run the following experiment:
 * 4) Set the target layer and all layers before it to run in INT8 mode.
 * 5) Set the layers after the target layer to run in FP32 mode.
 * 6) Perform INT8 inference and save the output activation values.
 * 7) Compare the output activation values with the FP32 ones. If the loss is large, the target layer is a likely cause. There is no fixed threshold for what counts as a 'large' loss, but you can get a sense of it from the loss trend over recent iterations.

The process looks like this:

 layer1_int8 --> layer2_fp32 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
 layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
 layer1_int8 --> layer2_int8 --> layer3_int8 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
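The per-iteration precision assignment shown above can be sketched in a few lines of Python. This is only an illustration of the scheme, not part of the TensorRT sample; the helper names `precision_plan` and `format_plan` are hypothetical.

```python
def precision_plan(layer_names, target_index):
    """Return (name, precision) pairs for one experiment.

    Layers up to and including target_index run in INT8;
    all later layers run in FP32.
    """
    return [
        (name, "int8" if i <= target_index else "fp32")
        for i, name in enumerate(layer_names)
    ]


def format_plan(plan):
    """Render a plan in the same arrow notation used on this page."""
    return " --> ".join(f"{name}_{precision}" for name, precision in plan)


if __name__ == "__main__":
    layers = ["layer1", "layer2", "layer3", "layer4"]
    # One line per experiment: the INT8 region grows by one layer each time.
    for target in range(len(layers)):
        print(format_plan(precision_plan(layers, target)))
```

Each printed line corresponds to one build-and-infer run, with the INT8 region extended by one layer.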

If we observe a large accuracy loss when layer3 runs in INT8, we set layer3 to run in a higher-precision mode and continue the experiments.

 layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_fp32 --> … --> layerN_fp32 --> layer_output
 layer1_int8 --> layer2_int8 --> layer3_fp32 --> layer4_int8 --> … --> layerN_fp32 --> layer_output
 …
 layer1_int8 --> layer2_int8 --> layer3_fp32 --> … --> layerN_int8 --> layer_output
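The whole search, including the fallback of a problematic layer to higher precision, can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `output_loss` is a hypothetical callback that builds the network with the given layers in INT8 (all others in FP32), runs inference, and returns the loss of the final output against the FP32 reference. `max_jump` is an illustrative stand-in for judging the loss trend by eye, since there is no fixed threshold.

```python
def find_sensitive_layers(layer_names, output_loss, max_jump):
    """Extend the INT8 region one layer at a time.

    A layer whose conversion raises the output loss by more than
    max_jump over the last accepted loss is pinned to FP32 and the
    search continues with the remaining layers.
    """
    int8_layers = []   # layers accepted to run in INT8 so far
    fp32_pinned = []   # layers kept in higher precision
    last_loss = 0.0
    for name in layer_names:
        candidate = int8_layers + [name]
        loss = output_loss(candidate)
        if loss - last_loss > max_jump:
            fp32_pinned.append(name)   # keep this layer in FP32
        else:
            int8_layers = candidate    # accept INT8 for this layer
            last_loss = loss
    return fp32_pinned


def toy_loss(int8_layers):
    """Made-up loss model for demonstration: layer3 is very sensitive."""
    return sum(0.5 if name == "layer3" else 0.01 for name in int8_layers)


if __name__ == "__main__":
    layers = ["layer1", "layer2", "layer3", "layer4", "layer5"]
    print(find_sensitive_layers(layers, toy_loss, max_jump=0.1))
```

In a real setup the callback would wrap the build and inference steps of the sample, and the jump criterion would be replaced by whatever judgement fits your task.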

'''Why can't we compare activations layer by layer directly?'''

Because we don't actually care how the intermediate activations compare to their FP32 counterparts. Sometimes, even if the accuracy loss at some middle layer is very large, the final result may not be affected (perhaps because your network has a high tolerance for the current task). Hence, the loss at the output is the only useful measure for evaluating network accuracy.

'''Why can't we dump the INT8 results of all layers in one pass?'''

For example, if layer3 causes a large loss in the final output, the layers after layer3 may also show a large accuracy loss. So before looking for other potentially problematic layers, we should first rule out layer3 (by running it in FP32 mode) to eliminate this interaction.

The following example is based on the TensorRT_5.1_OSS release.

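1. Set the precision for the layers after the target layer
 // Strict type constraints are required for setPrecision() to be honored.
 builder->setStrictTypeConstraints(true);
 // Force every layer after the target layer to run in FP32.
 for (int i = mParams.scopedLayerIndex + 1; i < network->getNbLayers(); i++)
 {
     auto layer = network->getLayer(i);
     layer->setPrecision(nvinfer1::DataType::kFLOAT);
 }
 // Remember the target layer's name for the dump file.
 mParams.scopedLayerName = network->getLayer(mParams.scopedLayerIndex)->getName();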

NOTE:
 * 1) Strict type constraints must be enabled so that the per-layer precision setting takes effect; otherwise the builder may override it during network optimization.
 * 2) 'scopedLayerIndex' is the index of the target layer under investigation.
 * 3) 'scopedLayerName' stores the name of that layer.

2. Dump the output result
buffers.dumpBuffer(file, s);

3. Iterate over the experiments
 // Layer dump and debug
 if (sample.mParams.nbLayers != 0 && sample.mParams.int8)
 {
     // Rebuild and run the network once per target layer.
     for (int i = 0; i < sample.mParams.nbLayers; i++)
     {
         sample.mParams.scopedLayerIndex = i;
         if (!sample.build())
         {
             return gLogger.reportFail(sampleTest);
         }
         if (!sample.infer())
         {
             return gLogger.reportFail(sampleTest);
         }
     }
 }

4. Analyze the accuracy loss
python layer_analyzer_int8.py

For example, here is the output for mnist:

NOTE: the layer name prob_ip2 denotes the case where ip2 and all previous layers run in INT8 and the following layers run in FP32. The similarity is calculated between the FP32 prob output and the INT8 prob output.

 LayerName|                               LayerShape|      Similarity%|
 prob_ip2|                                [1, 10, 1, 1]|   99.9997%|
 prob_pool1|                              [1, 10, 1, 1]|   99.9996%|
 prob_scale|                              [1, 10, 1, 1]|   99.9994%|
 prob_conv2|                              [1, 10, 1, 1]|   99.9995%|
 prob_prob|                               [1, 10, 1, 1]|   99.9997%|
 prob_conv1|                              [1, 10, 1, 1]|   99.9996%|
 prob_(Unnamed Layer* 9) [Constant]|      [1, 10, 1, 1]|   99.9997%|
 prob_relu1|                              [1, 10, 1, 1]|   99.9997%|
 prob_ip1|                                [1, 10, 1, 1]|   99.9997%|
 prob_pool2|                              [1, 10, 1, 1]|   99.9995%|
 prob_(Unnamed Layer* 10) [ElementWise]|  [1, 10, 1, 1]|   99.9997%|
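This page does not show how layer_analyzer_int8.py computes the similarity column. As a rough illustration only, the sketch below assumes cosine similarity between the flattened FP32 and INT8 output activations, which is one common choice; the actual script may use a different metric. The function name `cosine_similarity_percent` is hypothetical.

```python
import math


def cosine_similarity_percent(reference, candidate):
    """Cosine similarity of two flat activation lists, as a percentage."""
    if len(reference) != len(candidate):
        raise ValueError("activation lists must have the same length")
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm = (math.sqrt(sum(r * r for r in reference))
            * math.sqrt(sum(c * c for c in candidate)))
    if norm == 0.0:
        # At least one vector is all zeros: identical only if both are.
        return 100.0 if not any(reference) and not any(candidate) else 0.0
    return 100.0 * dot / norm


if __name__ == "__main__":
    # Illustrative values only, not taken from a real dump.
    fp32_prob = [0.01, 0.02, 0.90, 0.07]
    int8_prob = [0.01, 0.03, 0.89, 0.07]
    print(f"{cosine_similarity_percent(fp32_prob, int8_prob):.4f}%")
```

A value close to 100% means the INT8 run reproduces the FP32 output direction closely; a noticeable drop between consecutive rows points at the layer that was newly switched to INT8.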