TensorRT inference time much faster than cuDNN

Description

Recently I created a small network (2-3 convolution layers) with cuDNN and the same network with TensorRT, and it looks like TensorRT is 1.8-1.9x faster than cuDNN.

So I have a question: does TensorRT perform an implicit conversion of the model to FP16 if it was provided as FP32? For example, it looks as though TensorRT evaluates the model and, if the accuracy difference is not large, converts it to FP16 to improve performance…

Is that the case?

Environment

TensorRT Version: 8
GPU Type: GTX1080
Nvidia Driver Version: 465
CUDA Version: 11.2
CUDNN Version: 8.1

Yes, it converts to FP16 automatically.

@spolisetty
Does that mean that if I provide an FP32 network, TensorRT could convert the whole network to FP16?

Is it possible to avoid that behaviour? For example, to enforce FP32 precision? Is there an option for that?

@spolisetty
Do you mean that only happens if Tensor Cores are available?

The GTX 1080 does not have Tensor Core support, so how could TensorRT convert the network to FP16 if it would have to run in emulation mode and be very slow?

Hi,

We do support FP16 even without Tensor Cores, although it will of course be much faster if Tensor Cores are available.
Yes, TensorRT could convert the whole network to FP16. However, FP16 is opt-in: the default behavior is to use FP32 precision.
Finer-grained control is also possible, i.e. the user can mark specific layers to run in FP32 or FP16.
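For reference, a minimal sketch with the TensorRT Python API showing how this is typically controlled (the ONNX file name "model.onnx" and the choice of layer are placeholders; on newer releases OBEY_PRECISION_CONSTRAINTS replaces the STRICT_TYPES flag used here):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Parse a model (hypothetical file name, used only for illustration).
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()

# FP16 is opt-in: without this flag the engine is built in FP32.
config.set_flag(trt.BuilderFlag.FP16)

# Optionally pin an individual layer back to FP32.
layer = network.get_layer(0)        # e.g. the first layer
layer.precision = trt.float32
layer.set_output_type(0, trt.float32)

# Ask the builder to respect the per-layer precision constraints
# (STRICT_TYPES on TensorRT 8.0).
config.set_flag(trt.BuilderFlag.STRICT_TYPES)

engine_bytes = builder.build_serialized_network(network, config)
```

If you simply never set the FP16 flag, the entire engine stays in FP32, which answers the question about avoiding the conversion.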

Thank you.
