CalibrationTable and executable engine

Hi,
My environment is a Xeon® CPU E5-2620 v4 with an Nvidia T4.
I start the container with sudo nvidia-docker run … tensorrt:19.10-py3 bash
and attach to it with sudo docker attach container-id.

I have built and run ./sample_int8 mnist successfully, but several things confuse me:

  1. Why does it process only 12800 images rather than the 40000 images shown in /opt/tensorrt/samples/sampleINT8/README.md?
    (The input datasets are train-images-idx3-ubyte and train-labels-idx1-ubyte.)

  2. How should I interpret the value in each “layer name : value” pair in CalibrationTablemnist?

  3. How can I produce the executable engine after running sample_int8?

Hi,

You can serialize the generated engine in your code and reuse it for future inference.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#serial_model_c
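As a rough illustration of the serialize-and-reuse idea, here is a minimal Python sketch. It assumes the TensorRT Python bindings are installed and that you already hold a built engine object; the helper names save_engine and load_engine and the file name are hypothetical, and the linked guide describes the C++ route that the sample itself would use.

```python
def save_engine(engine, path):
    """Write a built engine to disk so later runs can skip the build step.

    `engine` is assumed to expose serialize(), as TensorRT's ICudaEngine does.
    """
    with open(path, "wb") as f:
        f.write(engine.serialize())


def load_engine(path):
    """Read a serialized engine back for inference.

    tensorrt is imported lazily so that save_engine can be used without it.
    The runtime is returned together with the engine so that the caller can
    keep it alive for as long as the engine is in use.
    """
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    return runtime, engine


# Hypothetical usage, given an already built `engine`:
#   save_engine(engine, "mnist_int8.engine")
#   runtime, engine = load_engine("mnist_int8.engine")
```

A serialized engine is generally tied to the TensorRT version and GPU it was built with, so it is safest to rebuild it when either changes.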

Regarding the “layer name : value” pairs within CalibrationTablemnist:
Each value corresponds to the floating point activation scale determined during calibration for each tensor in the network.

Please check that all the data sets have been downloaded and copied to the /samples/data/int8/mnist/ directory:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleINT8#batch-files-for-calibration

Thanks

Hi SunilJB,
Thanks for your reply!!
You wrote: “value corresponds to the floating point activation scales determined during calibration for each tensor in the network.”

  1. These values are huge, much greater than 127 (the max value of int8). How can they be calibrated for int8? Should it be x = abs(value) / 127, so that x is the real activation scale for int8?

  2. Why are all these values positive, with no negative ones?

  3. My input datasets train-images-idx3-ubyte and train-labels-idx1-ubyte were downloaded from the link you described, but sample_int8 parses 12800 images in total, not 40000. Is there another dataset that can be downloaded?

Hi,

  1. The dynamic range is the reciprocal of the scale, so it can be large. It depends on the distribution of the inputs.

  2. TRT maps [-maxRange, maxRange] to [-127, 127]. The range is symmetric, so maxRange should always be positive.

  3. Please check the values of “nbScoreBatches” and “batchSize” in your code:
    They might differ from the values on GitHub:
    https://github.com/NVIDIA/TensorRT/blob/release/6.0/samples/opensource/sampleINT8/sampleINT8.cpp#L463
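The symmetric mapping in point 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not TensorRT's actual implementation; the function name quantize_symmetric and the example max_range of 2.0 are made up for the sketch.

```python
def quantize_symmetric(x: float, max_range: float) -> int:
    """Map x from [-max_range, max_range] onto the integers [-127, 127].

    max_range is a magnitude, so it must be positive; the sign is carried
    by x itself, which is why a single positive value per tensor suffices.
    """
    if max_range <= 0:
        raise ValueError("max_range must be positive")
    # Values outside the calibrated range saturate at the boundary.
    clipped = max(-max_range, min(max_range, x))
    return int(round(clipped / max_range * 127))


# Illustrative values only (max_range = 2.0 is an assumption):
for value in (-5.0, -2.0, 0.0, 0.5, 2.0, 5.0):
    print(value, "->", quantize_symmetric(value, 2.0))
```

Inputs beyond the range on either side saturate at -127 or 127, and zero always maps to zero.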

Thanks

Hi SunilJB,

  1. If the dynamic range is the reciprocal of the scale, here is my CalibrationTableMnist:
TRT-6001-EntropyCalibration2
data: 3c008912
conv1: 3c88edfc
pool1: 3c88edfc

The dynamic ranges should then be:
data: -1/3c008912 ~ 1/3c008912
conv1: -1/3c88edfc ~ 1/3c88edfc
pool1: -1/3c88edfc ~ 1/3c88edfc

Thus, the ranges are too tiny. Am I right? If not, could you give me the correct ranges?

  2. My code is the same as https://github.com/NVIDIA/TensorRT/blob/release/6.0/samples/opensource/sampleINT8/sampleINT8.cpp
    and the input dataset is the one you mentioned, but the number of processed images is always 12800.

Hi,

  1. Considering the HEX value as big endian:
    tensor   hex        scaleFactor      maxRange
    data:    3c008912   0.00784518    -> 127.5
    conv1:   3c88edfc   0.0167150423  -> 59.8
    pool1:   3c88edfc   0.0167150423  -> 59.8

I think the value is reasonable.

  2. Could you please share the output logs for further analysis so we can better help?
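The hex-to-float conversion from point 1 can be reproduced with the Python standard library. The sketch below assumes, as stated in this thread, that each hex word is a big-endian IEEE-754 float32 and that maxRange is the reciprocal of that value; the function names are hypothetical, and the table text is copied from the earlier post.

```python
import struct


def decode_scale(hex_word: str) -> float:
    """Interpret an 8-digit hex string as a big-endian IEEE-754 float32."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]


def parse_calibration_table(text: str) -> dict:
    """Return {tensor name: scale} from the text of a calibration table.

    The first line is treated as the calibrator header and skipped.
    """
    scales = {}
    for line in text.strip().splitlines()[1:]:
        name, _, hex_word = line.rpartition(":")
        scales[name.strip()] = decode_scale(hex_word.strip())
    return scales


TABLE = """TRT-6001-EntropyCalibration2
data: 3c008912
conv1: 3c88edfc
pool1: 3c88edfc
"""

scales = parse_calibration_table(TABLE)
for name, scale in scales.items():
    # The reciprocal follows the interpretation given in this thread.
    print(f"{name}: scale={scale:.8f}  1/scale={1.0 / scale:.1f}")
```

Reading the values as floats rather than as hex integers is what resolves the impression that the entries are far larger than 127.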

Thanks

Hi SunilJB,
The output log:

root@76605d82cb37:/opt/tensorrt/bin# ./sample_int8 mnist
&&&& RUNNING TensorRT.sample_int8 # ./sample_int8 mnist
[11/10/2019-09:10:11] [I] Building and running a GPU inference engine for INT8 sample
[11/10/2019-09:10:11] [I] FP32 run:400 batches of size 32 starting at 100
[11/10/2019-09:10:14] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/10/2019-09:10:16] [I] Processing next set of max 100 batches
[11/10/2019-09:10:16] [I] Processing next set of max 100 batches
[11/10/2019-09:10:16] [I] Processing next set of max 100 batches
[11/10/2019-09:10:16] [I] Top1: 0.748672, Top5: 0.75
[11/10/2019-09:10:16] [I] Processing 12800 images averaged 0.00314623 ms/image and 0.100679 ms/batch.
[11/10/2019-09:10:16] [I] FP16 run:400 batches of size 32 starting at 100
[11/10/2019-09:10:18] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/10/2019-09:10:19] [I] Processing next set of max 100 batches
[11/10/2019-09:10:20] [I] Processing next set of max 100 batches
[11/10/2019-09:10:20] [I] Processing next set of max 100 batches
[11/10/2019-09:10:20] [I] Top1: 0.748672, Top5: 0.75
[11/10/2019-09:10:20] [I] Processing 12800 images averaged 0.00279581 ms/image and 0.089466 ms/batch.
[11/10/2019-09:10:20] [I] INT8 run:400 batches of size 32 starting at 100
[11/10/2019-09:10:21] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/10/2019-09:10:21] [I] [TRT] Starting Calibration with batch size 50.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 0 in 0.0555084 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 1 in 0.0510375 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 2 in 0.050417 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 3 in 0.0501982 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 4 in 0.0520392 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 5 in 0.0554661 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 6 in 0.0553647 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 7 in 0.0514464 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 8 in 0.0501971 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Calibrated batch 9 in 0.0502651 seconds.
[11/10/2019-09:10:22] [I] [TRT]   Post Processing Calibration data in 0.192569 seconds.
[11/10/2019-09:10:22] [I] [TRT] Calibration completed in 0.965675 seconds.
[11/10/2019-09:10:22] [I] [TRT] Writing Calibration Cache for calibrator: TRT-6001-EntropyCalibration2
[11/10/2019-09:10:24] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/10/2019-09:10:25] [I] Processing next set of max 100 batches
[11/10/2019-09:10:25] [I] Processing next set of max 100 batches
[11/10/2019-09:10:25] [I] Processing next set of max 100 batches
[11/10/2019-09:10:25] [I] Top1: 0.748359, Top5: 0.75
[11/10/2019-09:10:25] [I] Processing 12800 images averaged 0.0028091 ms/image and 0.0898911 ms/batch.
&&&& PASSED TensorRT.sample_int8 # ./sample_int8 mnist

Hi,

The difference in the processed image count seems to be due to the batch size.
The current code uses a batch size of 32, hence the processed image count is 12800 = 400 * 32.
The sample output in README.md used a batch size of 100, which resulted in
400 * 100 = 40000 images.
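The arithmetic can be checked with a couple of lines of Python. The parameter names mirror “nbScoreBatches” and “batchSize” from the sample, but the function itself is only an illustration.

```python
def processed_images(nb_score_batches: int, batch_size: int) -> int:
    """Total images scored: number of scoring batches times images per batch."""
    return nb_score_batches * batch_size


print(processed_images(400, 32))   # batch size in the current code
print(processed_images(400, 100))  # batch size behind the README.md output
```

Both runs score 400 batches; only the batch size differs.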

Thanks

Hi SunilJB,
Thank you very much, and Merry Christmas!!

If I change the batch size to 100, the number of processed images is 40000, but the accuracy is still Top1: 0.749275, Top5: 0.75, which is far lower than the value in README.md. Can it be improved?
Another difference is that my run has only 3 iterations, while README.md has 4. Does that mean the original input dataset was much larger than the current one?

[11/25/2019-06:41:10] [I] Building and running a GPU inference engine for INT8 sample
[11/25/2019-06:41:10] [I] FP32 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:13] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Top1: 0.74935, Top5: 0.75
[11/25/2019-06:41:15] [I] Processing 40000 images averaged 0.00191689 ms/image and 0.191689 ms/batch.
[11/25/2019-06:41:15] [I] FP16 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:17] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Top1: 0.74935, Top5: 0.75
[11/25/2019-06:41:19] [I] Processing 40000 images averaged 0.00109781 ms/image and 0.109781 ms/batch.
[11/25/2019-06:41:19] [I] INT8 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:21] [I] [TRT] Reading Calibration Cache for calibrator: EntropyCalibration2
[11/25/2019-06:41:21] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[11/25/2019-06:41:21] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[11/25/2019-06:41:22] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Top1: 0.749275, Top5: 0.75
[11/25/2019-06:41:24] [I] Processing 40000 images averaged 0.000912624 ms/image and 0.0912624 ms/batch.
&&&& PASSED TensorRT.sample_int8 # sample_int8 mnist

Hi,

The sample code was updated in TRT 6, but it seems the README.md output snapshot was not updated to match, which appears to be causing the confusion in your case (batch size, number of processed images).
TRT model optimization varies with the GPU type. The idea of this sample is just to showcase that INT8 optimization causes only a minor variation in accuracy compared to the FP32 model.

Please refer to the link below for best practices to optimize the performance:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-best-practices/index.html

Thanks

Hi SunilJB,

I understand that INT8 optimization causes only a minor variation in accuracy compared to the FP32 model, but it would be much better if the accuracy could approach 0.94.

Thanks again!!

Hi,

Could you please also check the performance of actual model on your GPU system before optimizing using TRT?

Thanks