TensorRT inference using the TF-TRT framework: FP32 vs FP16

Hi,
I started exploring TensorRT a few months back. I optimized a custom-trained Keras classification model using the TF-TRT Python framework in both FP32 and FP16 precision modes. After conversion, the model summary shows the dtype as ‘float32’ for both. Am I missing something, or is it that not all layers can be optimized to FP16?
Please find the attachment for your reference.
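
For context, the conversion followed the standard TF-TRT flow, roughly along these lines (a rough sketch; the SavedModel paths are placeholders and the exact parameter API varies slightly across TensorFlow versions):

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Same conversion for both runs; only precision_mode changes (FP32 vs FP16).
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16)

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="keras_saved_model",   # placeholder path
        conversion_params=params)
    converter.convert()
    converter.save("trt_fp16_saved_model")           # placeholder path

The resulting SavedModel still reports float32 inputs/outputs in its summary, which is what prompted the question above.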



Also, as per this document: Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs, TensorFlow memory configurations can be used for TensorRT inference. Even after applying those configurations, there is no improvement. What could be the reason?
I also tried reducing the workspace size (max_workspace_size_bytes) during conversion, but there was no memory optimization.
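
For reference, these are the kinds of settings that were tried (a rough sketch; the memory-growth call is the standard TensorFlow 2 API and the workspace value is just an example):

    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # TensorFlow memory configuration: allocate GPU memory on demand
    # instead of reserving most of it up front.
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)

    # Smaller TensorRT workspace during conversion (example value, ~256 MB).
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=trt.TrtPrecisionMode.FP16,
        max_workspace_size_bytes=1 << 28)

Even with these settings, GPU memory usage during inference looked the same.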

Hi,

The computation runs in FP16, but the inputs and outputs stay in FP32, since that makes it easier for users to feed and read the data.
A format-conversion layer is added by default to handle the cast.
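
If it helps, you can confirm this from the converted SavedModel: the signature still reports float32, while each fused TRTEngineOp node records the precision it was built with. A rough sketch (the path is a placeholder; the precision_mode attribute name is taken from the TRTEngineOp definition):

    import tensorflow as tf

    saved_model = tf.saved_model.load("trt_fp16_saved_model")   # placeholder path
    func = saved_model.signatures["serving_default"]

    # The signature dtypes stay float32 for easy feeding/reading.
    print(func.structured_input_signature)
    print(func.structured_outputs)

    # The fused engine nodes record the precision they were built with.
    graph_def = func.graph.as_graph_def()
    nodes = list(graph_def.node)
    for f in graph_def.library.function:
        nodes.extend(f.node_def)
    for node in nodes:
        if node.op == "TRTEngineOp":
            print(node.name, node.attr["precision_mode"].s)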

Thanks.

Okay, thanks for the quick reply.
I still do not see any performance improvement. I will run a few more trials and get back to you if required.

Hi,

It looks like the acceleration you get with FP32 and FP16 is identical.

We are not sure whether there is some dependency or overhead in TF-TRT.
Are you able to try pure TensorRT (for example, trtexec) so we can check further?
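
For example, one possible path is to export the Keras model to ONNX and benchmark it with trtexec in both precisions (a rough sketch; it assumes tf2onnx is installed and the paths are placeholders):

    # Export the Keras SavedModel to ONNX
    python -m tf2onnx.convert --saved-model keras_saved_model --output model.onnx

    # Benchmark with pure TensorRT: FP32 baseline, then FP16
    trtexec --onnx=model.onnx
    trtexec --onnx=model.onnx --fp16

Comparing the throughput/latency reported by the two trtexec runs would tell us whether the FP16 speedup shows up outside TF-TRT.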

Thanks.

Hi,
I have not explored trtexec yet, as I was getting proper results and a clear improvement with the FP32-optimized models. I will give trtexec a try.

Hi,

Please give it a try and let us know the results.
Thanks.
