Hi,
My environment is a Xeon(R) CPU E5-2620 v4 with an NVIDIA T4. I start the container with
sudo nvidia-docker run … tensorrt:19.10-py3 bash and then
sudo docker attach <container-id>.
I have built and run ./sample_int8 mnist successfully, but several issues confuse me:
Why does it process only 12800 images rather than the 40000 images shown in /opt/tensorrt/samples/sampleINT8/README.md?
(The input datasets are train-images-idx3-ubyte and train-labels-idx1-ubyte.)
How should the value in each "layer name : value" pair within CalibrationTablemnist be interpreted?
How do I produce the executable engine after running sample_int8?
Regarding the "layer name : value" pairs within CalibrationTablemnist:
The value corresponds to the floating-point activation scale determined during calibration for each tensor in the network.
Hi SunilJB,
Thanks for your reply!!
"The value corresponds to the floating-point activation scale determined during calibration for each tensor in the network."
These values are huge, much greater than 127 (the maximum int8 value). How can they serve as int8 calibration scales? Should it be x = abs(value) / 127, so that x is the real activation scale for int8?
Why are all these values positive, with none negative?
My input datasets train-images-idx3-ubyte and train-labels-idx1-ubyte were downloaded from the link you described, but sample_int8 parses 12800 images in total, not 40000. Is there another dataset I should download?
1. If the dynamic range is the reciprocal of the scale, then given the contents of CalibrationTableMnist:
TRT-6001-EntropyCalibration2
data: 3c008912
conv1: 3c88edfc
pool1: 3c88edfc
their dynamic ranges should be:
data: -1/3c008912 ~ 1/3c008912
conv1: -1/3c88edfc ~ 1/3c88edfc
pool1: -1/3c88edfc ~ 1/3c88edfc
Thus the ranges are far too tiny. Am I right? If not, could you give me the correct ranges?
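For illustration, the hex strings in the calibration cache are not large integers; they are the raw bit patterns of IEEE 754 single-precision floats. A minimal Python sketch of decoding them follows; the dynamic-range convention used here (range ≈ scale × 127) is an assumption to check against the TensorRT documentation, not something stated in this thread:

```python
import struct

def decode_scale(hex_str: str) -> float:
    """Interpret an 8-digit hex string from the calibration cache as the
    bit pattern of an IEEE 754 single-precision float (big-endian)."""
    return struct.unpack('>f', bytes.fromhex(hex_str))[0]

# Entries from the CalibrationTableMnist shown above
table = {
    'data':  '3c008912',
    'conv1': '3c88edfc',
    'pool1': '3c88edfc',
}

for name, hex_val in table.items():
    scale = decode_scale(hex_val)
    # Assumed TensorRT convention: the float interval mapped onto the
    # int8 range [-127, 127] is roughly ±(scale * 127).
    print(f"{name}: scale = {scale:.8f}, dynamic range ≈ ±{scale * 127:.4f}")
```

Under this reading, the scale for `data` decodes to about 0.00784518, giving a dynamic range of roughly ±1.0, which is plausible for normalized MNIST input rather than "too tiny".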
The difference in the processed-image count seems to be due to batch size.
The current code uses a batch size of 32, hence the processed-image count is 12800 = 400 × 32.
The sample output in README.md was generated with batch size = 100, which resulted in 400 × 100 = 40000 images.
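The arithmetic above can be sketched directly; the 400-batch count comes from the run log later in this thread ("400 batches of size 100"):

```python
# The sample scores a fixed number of batches; only the batch size differs.
num_batches = 400

for batch_size in (32, 100):
    images = num_batches * batch_size
    print(f"batch_size={batch_size}: {images} images processed")

# batch_size=32:  12800 images processed (current code)
# batch_size=100: 40000 images processed (README.md snapshot)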
Hi SunilJB,
Thank you very much, and Merry Christmas!!
If I change the batch size to 100, 40000 images are processed, but the precision is still Top1: 0.749275, Top5: 0.75, far less than the value in README.md. Can it be improved?
Another difference is that my run performs only 3 iterations, while README.md shows 4. Does that mean the original input dataset was larger than the current one?
[11/25/2019-06:41:10] [I] Building and running a GPU inference engine for INT8 sample
[11/25/2019-06:41:10] [I] FP32 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:13] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Processing next set of max 100 batches
[11/25/2019-06:41:15] [I] Top1: 0.74935, Top5: 0.75
[11/25/2019-06:41:15] [I] Processing 40000 images averaged 0.00191689 ms/image and 0.191689 ms/batch.
[11/25/2019-06:41:15] [I] FP16 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:17] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Processing next set of max 100 batches
[11/25/2019-06:41:19] [I] Top1: 0.74935, Top5: 0.75
[11/25/2019-06:41:19] [I] Processing 40000 images averaged 0.00109781 ms/image and 0.109781 ms/batch.
[11/25/2019-06:41:19] [I] INT8 run:400 batches of size 100 starting at 100
[11/25/2019-06:41:21] [I] [TRT] Reading Calibration Cache for calibrator: EntropyCalibration2
[11/25/2019-06:41:21] [I] [TRT] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[11/25/2019-06:41:21] [I] [TRT] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[11/25/2019-06:41:22] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Processing next set of max 100 batches
[11/25/2019-06:41:24] [I] Top1: 0.749275, Top5: 0.75
[11/25/2019-06:41:24] [I] Processing 40000 images averaged 0.000912624 ms/image and 0.0912624 ms/batch.
&&&& PASSED TensorRT.sample_int8 # sample_int8 mnist
The sample code was updated in TRT 6, but it seems the README.md output snapshot was not updated to match, which appears to be causing the confusion in your case (batch size, number of processed images).
TRT model optimization varies based on the GPU type. The idea of this sample is just to showcase that INT8 optimization has only a minor variation in accuracy compared to the FP32 model.
I understand that INT8 optimization has only a minor accuracy variation compared to the FP32 model, but it would be much better if the accuracy could approach 0.94.
Hi SunilJB,
I didn't get how the scaling factors are computed from the hex values. Can you please explain how the mapping happens in data: 3c008912 → 0.00784518 → 127.5, and how to compute the scaling factors from the calib.cache file?
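A short sketch of that mapping, assuming (as discussed above) that the hex string is the IEEE 754 bit pattern of a float32 scale: decoding 3c008912 yields 0.00784518, and its reciprocal is approximately 127.5 (i.e. the scale is roughly 1/127.5, consistent with an input range of about [0, 1] mapped onto int8):

```python
import struct

hex_val = '3c008912'  # the "data" entry from the calibration cache

# Reinterpret the hex digits as the big-endian bytes of a float32.
scale = struct.unpack('>f', bytes.fromhex(hex_val))[0]

print(scale)        # ≈ 0.00784518
print(1.0 / scale)  # ≈ 127.47, i.e. roughly 127.5
```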