Hi,
I started exploring TensorRT a few months back. I optimized a custom-trained Keras classification model using the TF-TRT Python framework in both FP32 and FP16 precision modes. After conversion, the model summary shows the dtype as ‘float32’ in both cases. Am I missing something, or is it that not all layers can be optimized to FP16?
PFA for your reference.
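Roughly, the conversion code looks like the sketch below (the paths and the workspace size are placeholders, not my exact settings; parameter names can differ slightly between TF versions):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Conversion parameters: FP16 precision and a reduced workspace size
# (placeholder values for illustration).
conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 28,  # 256 MB
)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="keras_saved_model",  # exported Keras model (placeholder path)
    conversion_params=conversion_params,
)
converter.convert()
converter.save("tftrt_fp16_saved_model")  # placeholder output path
```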
Also, as per this document: Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs - TensorFlow memory configurations can be used for TensorRT inference. Even after applying those configurations, there is no improvement. What could be the reason?
I even tried reducing the workspace bytes during conversion, but there is no reduction in memory usage.
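These are the kinds of memory configurations I tried from that guide (a sketch; the 2048 MB limit is only an example value):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Option 1: let TensorFlow allocate GPU memory on demand
    # instead of grabbing all of it up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (alternative, do not combine with option 1 on the same GPU):
    # cap the amount of GPU memory TensorFlow may allocate.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)],  # MB, example value
    # )
```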
The computation is performed in FP16, but the inputs and outputs stay in FP32, since that makes it easier for users to feed and read the data.
A format-conversion layer is added by default to handle the cast between the two precisions.
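If you want to confirm that TensorRT engines were actually created even though the signature still reports float32, you can inspect the converted SavedModel along these lines (a sketch; "tftrt_fp16_saved_model" is a placeholder path):

```python
import tensorflow as tf

loaded = tf.saved_model.load("tftrt_fp16_saved_model")
infer = loaded.signatures["serving_default"]

# Collect nodes from the main graph and from nested function definitions,
# since the TRT engines may live in either place.
graph_def = infer.graph.as_graph_def()
nodes = list(graph_def.node)
for func in graph_def.library.function:
    nodes.extend(func.node_def)

trt_nodes = [n.name for n in nodes if n.op == "TRTEngineOp"]
print("TRTEngineOp nodes:", trt_nodes)

# The signature still reports float32 tensors: FP16 is used inside the
# engines, while the inserted conversion layer casts the FP32 inputs/outputs.
print(infer.structured_input_signature)
print(infer.structured_outputs)
```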