TRT Uses INT 32 VS INT 16

Description

Hi all,
I was comparing INT 32 and INT 16 inference. However, the results are quite close; INT 16 even performed slightly worse.

My TRT engine was generated by onnx2trt.

Command:

For INT 32

onnx2trt model.onnx -o model.trt -b 1

Then I ran TRT inference, and the output was:

TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

For INT 16

onnx2trt model.onnx -o model.trt -b 1 -d 16

Then I ran TRT inference, and the output was:

Fetch TensorRT engine path and datatype. Use INT 16
TensorRT inference engine settings:
  * Inference precision - HALF
  * Max batch size - 1

I timed the whole pipeline, and the results are quite close…
Would anyone like to share some experience on testing different precisions?
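For what it's worth, here is the kind of timing harness I use to isolate per-inference latency. This is a generic sketch, not TensorRT-specific: `infer` is a placeholder for whatever callable executes one inference, and the warm-up loop discards one-time costs (CUDA context creation, memory allocation) that would otherwise skew the average.

```python
import time

def benchmark(infer, warmup=10, iters=100):
    """Return the average wall-clock time per call of `infer`.

    `infer` stands in for one TensorRT inference; warm-up runs are
    executed first and excluded from the measurement.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return elapsed / iters  # average seconds per call

# Dummy workload standing in for engine execution:
avg = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"average latency: {avg * 1e3:.3f} ms")
```

Timing the whole script end to end (engine load, memory copies, etc.) can easily hide the FP32-vs-FP16 difference, so measuring only the execution call like this is usually more informative.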

Thank you so much!

Environment

TensorRT Version: 7.0.0.11
GPU Type: 1060
Nvidia Driver Version: 440
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): Python 3.6.9
PyTorch Version (if applicable): 1.4.0

I think by INT32 and INT16 you meant FP32 and FP16.

The GeForce GTX 1060 has CUDA compute capability 6.1, and GPUs with compute capability 6.1 do not have FP16 support.
Please refer to below link:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-support-matrix/index.html#hardware-precision-matrix
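A quick way to sanity-check this in code. The set below is my reading of the TensorRT 7 hardware precision matrix linked above (please verify against the doc itself): FP16 is listed for compute capabilities 5.3, 6.0, 6.2, 7.0, 7.2, and 7.5, and notably not for 6.1 (desktop Pascal cards like the GTX 1060), so the builder falls back to FP32 kernels there.

```python
# Compute capabilities with FP16 support per the TensorRT 7 hardware
# precision matrix (my transcription of the linked table; 6.1 is
# deliberately absent -- GTX 10-series desktop GPUs lack fast FP16).
FP16_CAPABLE = {(5, 3), (6, 0), (6, 2), (7, 0), (7, 2), (7, 5)}

def supports_fp16(major: int, minor: int) -> bool:
    """Return True if TensorRT 7 lists FP16 kernels for this capability."""
    return (major, minor) in FP16_CAPABLE

print(supports_fp16(6, 1))  # GTX 1060 -> False
print(supports_fp16(7, 5))  # Turing, e.g. RTX 2080 -> True
```

So even though the engine reports `Inference precision - HALF`, on a 6.1 device the layers won't actually run in fast half precision, which is why the timings look the same.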

Thanks

Hi @SunilJB,

Thanks for the information!
It was very helpful.