Analyzing sampleInt8 accuracy

Hello All,

Hardware: Jetson AGX Xavier
Jetpack: 4.2
TensorRT: 5.0.6

Could anyone help me clear up some doubts about the sampleInt8 application found in TensorRT's samples directory? Without any modification to the code, I ran the basic example:

$./sample_int8 mnist  

FP32 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0313064 ms/image and 3.13064 ms/batch.

FP16 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0219892 ms/image and 2.19892 ms/batch.

INT8 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9909, Top5: 1
Processing 40000 images averaged 0.0155735 ms/image and 1.55735 ms/batch.

Observations:
- Performance improves as precision is reduced, as expected (see the builder sketch after this list).
- Accuracy unexpectedly improves slightly for INT8 (Top1 0.9909 vs. 0.9904 for FP32/FP16).
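
For reference, here is a minimal sketch (assuming the TensorRT 5 C++ builder API, not the sample's exact code) of how the three precision runs are typically configured; the INT8 path additionally needs a calibrator, which sampleInt8 builds from MNIST calibration batches:

#include "NvInfer.h"

// Hedged sketch: configure one builder for an FP32, FP16 or INT8 run.
// "calibrator" stands in for the sample's IInt8EntropyCalibrator (assumption).
nvinfer1::ICudaEngine* buildEngine(nvinfer1::IBuilder* builder,
                                   nvinfer1::INetworkDefinition* network,
                                   nvinfer1::IInt8Calibrator* calibrator,
                                   bool fp16, bool int8)
{
    builder->setMaxBatchSize(100);               // batch size used in the runs above
    builder->setMaxWorkspaceSize(1 << 30);       // 1 GiB workspace

    if (fp16)
        builder->setFp16Mode(true);              // allow FP16 kernels where available
    if (int8)
    {
        builder->setInt8Mode(true);              // allow INT8 kernels
        builder->setInt8Calibrator(calibrator);  // entropy calibration supplies the per-tensor scales
    }
    return builder->buildCudaEngine(*network);
}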

More surprisingly, when using DLA cores:

$./sample_int8 mnist  useDLACore=0

DLA requested. Disabling for FP32 run since its not supported.

FP32 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0314237 ms/image and 3.14237 ms/batch.

FP16 run:400 batches of size 100 starting at 100
Requested batch size 100 is greater than the max DLA batch size of 32. Reducing batch size accordingly.
WARNING: Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
........................................
Top1: 0.932578, Top5: 0.966406
Processing 12800 images averaged 0.195462 ms/image and 6.25477 ms/batch.

DLA requested. Disabling for Int8 run since its not supported.

INT8 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9908, Top5: 1
Processing 40000 images averaged 0.0174113 ms/image and 1.74113 ms/batch.

Observations:
- Accuracy and performance for FP32 and INT8, which still run on the GPU, look similar to the first experiment.
- For FP16, which runs on the DLA, accuracy is lower than in the previous experiment (Top1 0.9326 vs. 0.9904), and throughput drops as well (see the DLA sketch after this list).
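
The DLA-specific warnings above follow from how the DLA is requested in the builder. A minimal sketch (assumed TensorRT 5 API, not copied from the sample): layers the DLA cannot run, such as the final Softmax ("prob"), fall back to the GPU when fallback is allowed, and the batch size gets clamped to the DLA maximum of 32:

#include "NvInfer.h"
#include <algorithm>

// Hedged sketch: request a DLA core, allow GPU fallback, and respect the DLA batch limit.
void enableDLA(nvinfer1::IBuilder* builder, int dlaCore, int requestedBatch)
{
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    builder->setDLACore(dlaCore);          // core 0 or 1 on Xavier
    builder->allowGPUFallback(true);       // layers the DLA cannot run go to the GPU

    // The DLA supports a smaller maximum batch size than the GPU (32 here vs. the requested 100).
    int batch = std::min(requestedBatch, builder->getMaxDLABatchSize());
    builder->setMaxBatchSize(batch);
}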

So my questions are:

  1. Why does using INT8 precision improve accuracy?
  2. Why does using the DLA degrade accuracy?

Thanks,

Any updates on this?

We are very interested in the INT8 case for our publication.

The slight accuracy improvement of the INT8 model might be explained as follows: the FP32/FP16 models somewhat overfit the training data (for example, to certain noise patterns in the MNIST training images). The INT8 quantization may reduce that overfitting, so the INT8 model generalizes slightly better to unseen (test) images.
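
To make that concrete, INT8 inference replaces each value with a dequantized approximation, which effectively injects a small amount of noise. A self-contained illustration of the symmetric quantize/dequantize round trip (the dynamic range of 6.0 is an assumed calibration result, not taken from the sample):

#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    const float dynamicRange = 6.0f;            // assumed calibrated |max| for some tensor
    const float scale = dynamicRange / 127.0f;  // symmetric per-tensor scale

    const float values[] = {0.013f, -2.7f, 5.9f, 0.42f};
    for (float x : values)
    {
        int q = std::max(-127, std::min(127, (int)std::lround(x / scale)));
        float xhat = q * scale;                 // value the next layer actually sees
        std::printf("x=% .4f  q=%4d  dequant=% .4f  error=% .4f\n", x, q, xhat, xhat - x);
    }
    return 0;
}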