TensorRT/Int8CFAQ

How to do INT8 calibration without using BatchStream?
The BatchStream-based calibration flow is too complicated to adapt for practical use.

Here we provide a sample class BatchFactory, which utilizes OpenCV for calibration data pre-processing. It simplifies the calibration procedure and makes it easy to understand what is really needed for calibration.

NOTE: The pre-processing flow in BatchFactory should be adjusted according to the requirements of your network.

Then, when implementing the IInt8EntropyCalibrator, we can use the loadBatch API from the assistant class to load batch data directly:

```cpp
bool getBatch(void* bindings[], const char* names[], int nbBindings) override
{
    float mean[3]{102.9801f, 115.9465f, 122.7717f}; // also in BGR order
    float* batchBuf = mBF.loadBatch(mean, 1.0f);
    // A null pointer indicates calibration data feeding is done
    if (!batchBuf)
        return false;
    CHECK(cudaMemcpy(mDeviceInput, batchBuf, mInputCount * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[0], INPUT_BLOB_NAME0));
    bindings[0] = mDeviceInput;
    return true;
}
```
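The BatchFactory implementation itself is not shown here. As a rough illustration of the per-channel mean-subtraction and scaling that such a loader performs before handing a batch to getBatch(), here is a minimal self-contained sketch (the function name and signature are ours, not part of the sample class; CHW layout is assumed):

```cpp
#include <cstddef>
#include <vector>

// Sketch of typical calibration pre-processing: subtract a per-channel
// mean and apply a scale factor. `hw` is the number of pixels per
// channel (height * width); the image is assumed to be in CHW order.
std::vector<float> preprocessCHW(const std::vector<float>& img,
                                 std::size_t channels, std::size_t hw,
                                 const float* mean, float scale)
{
    std::vector<float> out(img.size());
    for (std::size_t c = 0; c < channels; ++c)
        for (std::size_t i = 0; i < hw; ++i)
            out[c * hw + i] = (img[c * hw + i] - mean[c]) * scale;
    return out;
}
```

Whatever pre-processing you use here must match the pre-processing used at inference time, otherwise the calibrated scaling factors will not reflect the real activation ranges.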

Can INT8 calibration table be compatible among different TRT versions or HW platforms?
The INT8 calibration table is NOT compatible across different TRT versions, because the optimized network graph may differ between versions. If you force TRT to use a mismatched table, it may not find the corresponding scaling factor for a given tensor. As long as the installed TensorRT version is identical across HW platforms, the INT8 calibration table is compatible. That means you can perform INT8 calibration on a faster compute platform, such as V100 or T4, and then deploy the calibration table to Tegra for INT8 inference, as long as these platforms have the same TensorRT version installed (at least the same major and minor version, e.g. 5.1.5 and 5.1.6).
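The calibration table reaches TensorRT through the calibrator's readCalibrationCache() and writeCalibrationCache() methods. The following is a minimal sketch of the file-backed logic those methods typically delegate to, written as free functions so it is self-contained (the helper names and file path are ours, not TensorRT API):

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Load a previously written calibration table from disk. An empty
// result means "no cache yet", in which case TensorRT runs full
// calibration instead of reusing a stored table.
std::vector<char> readCalibrationCacheFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Persist the calibration table produced by a calibration run so that
// later builds (on the same TRT version) can skip calibration.
void writeCalibrationCacheFile(const std::string& path,
                               const void* data, std::size_t length)
{
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data), length);
}
```

Because the table is just an opaque blob of per-tensor scaling factors keyed by the optimized graph, copying this file between machines only works under the version constraints described above.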

How to do INT8 calibration for networks with multiple inputs?
TensorRT uses bindings to denote the input and output buffer pointers, and they are arranged in order. Hence, if your network has multiple input nodes/layers, you can pass the input buffer pointers into bindings (void**) separately, as in the following network with two inputs:

```cpp
bool getBatch(void* bindings[], const char* names[], int nbBindings) override
{
    // Prepare the batch data (on GPU) for mDeviceInput and imInfoDev
    ...
    assert(!strcmp(names[0], INPUT_BLOB_NAME0));
    bindings[0] = mDeviceInput;
    assert(!strcmp(names[1], INPUT_BLOB_NAME1));
    bindings[1] = imInfoDev;
    return true;
}
```

NOTE: If your calibration batch size is 10, then for each calibration cycle you will need to fill each of your input buffers with 10 images accordingly.

How to understand the principle of INT8 calibration?
Refer to the slide to get the specification of INT8 quantization. It is a post-training quantization method:

 * symmetric and per-channel quantization for weights
 * symmetric and per-tensor quantization for activations
 * KL divergence is used to evaluate the quantization loss between the original and quantized activation distributions
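The two ingredients above can be sketched in a few lines. This is an illustration of the principle only, not TensorRT's internal implementation: symmetric quantization maps a calibrated threshold range [-T, T] onto [-127, 127], and KL divergence scores how much information a candidate threshold loses (the function names are ours):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor quantization: scale = T / 127, then round and
// clamp. T is the clipping threshold chosen during calibration.
int8_t quantize(float x, float threshold)
{
    float scale = threshold / 127.0f;
    float q = std::round(x / scale);
    if (q > 127.0f) q = 127.0f;
    if (q < -127.0f) q = -127.0f;
    return static_cast<int8_t>(q);
}

// KL divergence between a reference activation distribution P and its
// quantized counterpart Q (both as normalized histograms). Calibration
// picks the threshold whose Q minimizes this divergence.
float klDivergence(const std::vector<float>& p, const std::vector<float>& q)
{
    float kl = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (p[i] > 0.0f && q[i] > 0.0f)
            kl += p[i] * std::log(p[i] / q[i]);
    return kl;
}
```

The divergence is zero when the quantized distribution matches the reference exactly and grows as clipping and rounding distort it, which is why a smaller threshold (finer resolution, more clipping) must be traded off against a larger one (less clipping, coarser resolution).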