Hello All,
Hardware: Jetson AGX Xavier
Jetpack: 4.2
TensorRT: 5.0.6
Could anyone help me clear up some doubts regarding the sampleINT8 application found in TensorRT’s samples directory? Without any modification to the code, I ran the basic example:
$./sample_int8 mnist
FP32 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0313064 ms/image and 3.13064 ms/batch.
FP16 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0219892 ms/image and 2.19892 ms/batch.
INT8 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9909, Top5: 1
Processing 40000 images averaged 0.0155735 ms/image and 1.55735 ms/batch.
Observations:
- Performance improves as precision is reduced, as expected
- Accuracy unexpectedly improves slightly for INT8 (Top1 goes from 0.9904 to 0.9909)
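For context, my understanding is that the sample simply toggles the builder precision flags between the three runs. Here is a rough sketch of what I believe happens, written against the TensorRT 5 IBuilder API; the helper name configurePrecision and the hard-coded values are mine, not the actual sampleINT8 code:

[code]
#include "NvInfer.h"

// Rough sketch of how I understand the sample selects precision per run.
// Not the exact sampleINT8 code; helper name and values are illustrative.
void configurePrecision(nvinfer1::IBuilder* builder,
                        nvinfer1::IInt8Calibrator* calibrator,
                        bool fp16, bool int8)
{
    builder->setMaxBatchSize(100);              // batch size used in the runs above
    builder->setMaxWorkspaceSize(1 << 30);

    if (fp16 && builder->platformHasFastFp16())
        builder->setFp16Mode(true);             // allow FP16 kernels

    if (int8 && builder->platformHasFastInt8())
    {
        builder->setInt8Mode(true);             // allow INT8 kernels
        builder->setInt8Calibrator(calibrator); // calibrator built from the MNIST batch files
    }
}
[/code]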
More surprisingly, when using a DLA core:
$./sample_int8 mnist useDLACore=0
DLA requested. Disabling for FP32 run since its not supported.
FP32 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0314237 ms/image and 3.14237 ms/batch.
FP16 run:400 batches of size 100 starting at 100
Requested batch size 100 is greater than the max DLA batch size of 32. Reducing batch size accordingly.
WARNING: Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
........................................
Top1: 0.932578, Top5: 0.966406
Processing 12800 images averaged 0.195462 ms/image and 6.25477 ms/batch.
DLA requested. Disabling for Int8 run since its not supported.
INT8 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9908, Top5: 1
Processing 40000 images averaged 0.0174113 ms/image and 1.74113 ms/batch.
Observations:
- Accuracy and performance for the FP32 and INT8 runs (where DLA was disabled) look similar to the first experiment.
- For the FP16 run, which actually uses the DLA, accuracy is noticeably lower compared to the previous experiment (Top1 drops from 0.9904 to 0.9326).
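For the FP16-on-DLA run, this is roughly how I understand the DLA gets requested, again sketched against the TensorRT 5 IBuilder API; the helper name enableDlaSketch is mine and this is simplified from what I see in the sample:

[code]
#include "NvInfer.h"

// Rough sketch of how I understand the FP16 run is moved onto the DLA.
// Simplified; not the exact sampleINT8 code.
void enableDlaSketch(nvinfer1::IBuilder* builder, int dlaCore)
{
    builder->setFp16Mode(true);                                // the DLA engine is built in FP16 here
    builder->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); // put layers on the DLA by default
    builder->setDLACore(dlaCore);                              // useDLACore=0 from the command line
    builder->allowGPUFallback(true);                           // lets e.g. the "prob" layer fall back to GPU

    // This clamp would explain the "Reducing batch size" message (max DLA batch is 32 here).
    if (builder->getMaxBatchSize() > builder->getMaxDLABatchSize())
        builder->setMaxBatchSize(builder->getMaxDLABatchSize());
}
[/code]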
[b]So my questions are:
- Why does using INT8 precision improve accuracy?
- Why does using the DLA degrade accuracy?
[/b]
Thanks,