TRT Uses INT 32 VS INT 16

Description

Hi all,
I was comparing INT 32 and INT 16 inference. However, the results are quite close; INT 16 even performed slightly worse.

My TRT engine was generated by onnx2trt.

Command:

For INT 32

onnx2trt model.onnx -o model.trt -b 1

Then I ran TRT inference, and the output was:

TensorRT inference engine settings:
  * Inference precision - DataType.FLOAT
  * Max batch size - 1

For INT 16

onnx2trt model.onnx -o model.trt -b 1 -d 16

Then I ran TRT inference, and the output was:

Fetch TensorRT engine path and datatype. Use INT 16
TensorRT inference engine settings:
  * Inference precision - HALF
  * Max batch size - 1

I timed the whole pipeline, and the results are quite close…
Would anyone like to share some experience on testing different precisions?
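For what it's worth, here is the kind of timing harness I use to isolate per-inference latency. This is a generic sketch, not TensorRT-specific: `infer` is a placeholder for whatever callable executes one inference, and the warm-up loop discards one-time costs (CUDA context creation, memory allocation) that would otherwise skew the average.

```python
import time

def benchmark(infer, warmup=10, iters=100):
    """Return the average wall-clock time per call of `infer`.

    `infer` stands in for one TensorRT inference; warm-up runs are
    executed first and excluded from the measurement.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return elapsed / iters  # average seconds per call

# Dummy workload standing in for engine execution:
avg = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"average latency: {avg * 1e3:.3f} ms")
```

Timing the whole script end to end (engine load, memory copies, etc.) can easily hide the FP32-vs-FP16 difference, so measuring only the execution call like this is usually more informative.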

Thank you so much!

Environment

TensorRT Version: 7.0.0.11
GPU Type: 1060
Nvidia Driver Version: 440
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): Python 3.6.9
PyTorch Version (if applicable): 1.4.0

I think by INT32 and INT16 you meant FP32 and FP16.

The GeForce GTX 1060 has CUDA compute capability 6.1, and GPUs with compute capability 6.1 do not have FP16 support.
Please refer to below link:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-support-matrix/index.html#hardware-precision-matrix
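A quick way to sanity-check this in code. The set below is my reading of the TensorRT 7 hardware precision matrix linked above (please verify against the doc itself): FP16 is listed for compute capabilities 5.3, 6.0, 6.2, 7.0, 7.2, and 7.5, and notably not for 6.1 (desktop Pascal cards like the GTX 1060), so the builder falls back to FP32 kernels there.

```python
# Compute capabilities with FP16 support per the TensorRT 7 hardware
# precision matrix (my transcription of the linked table; 6.1 is
# deliberately absent -- GTX 10-series desktop GPUs lack fast FP16).
FP16_CAPABLE = {(5, 3), (6, 0), (6, 2), (7, 0), (7, 2), (7, 5)}

def supports_fp16(major: int, minor: int) -> bool:
    """Return True if TensorRT 7 lists FP16 kernels for this capability."""
    return (major, minor) in FP16_CAPABLE

print(supports_fp16(6, 1))  # GTX 1060 -> False
print(supports_fp16(7, 5))  # Turing, e.g. RTX 2080 -> True
```

So even though the engine reports `Inference precision - HALF`, on a 6.1 device the layers won't actually run in fast half precision, which is why the timings look the same.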

Thanks

Hi @SunilJB,

Thanks for the information!
It was very helpful.