I have a TensorRT engine of a network optimized to FP16 precision. The original network from which I built the engine has its weights represented as FP32. I now want to run a feedforward pass of the engine on an input tensor. I know all the steps, from allocating buffers to memcpying the output device buffers to the host buffers (a rough sketch of my current flow is included below the questions), but I am missing some crucial pieces of information about feeding inputs to and reading outputs from this TensorRT engine:
When I input a tensor to the engine, is it implicitly converted to FP16 inside the engine before the feedforward pass, or do I need to manually convert the input tensor from FP32 to FP16?
When I extract the output from the engine, is it implicitly converted back to FP32, or do I need to convert it back to FP32 manually?
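For reference, here is roughly what my current inference flow looks like. This is only a minimal sketch using the TensorRT C++ API in the implicit-batch style of TRT 5; the binding names "input" and "output", the buffer sizes, and the batch size are placeholders for my actual network, and error checking is omitted.

```cpp
// Minimal sketch of the current flow; binding names and sizes are placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

void infer(nvinfer1::ICudaEngine* engine, const std::vector<float>& hostInput,
           std::vector<float>& hostOutput, int batchSize)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    const int inputIndex  = engine->getBindingIndex("input");   // placeholder binding name
    const int outputIndex = engine->getBindingIndex("output");  // placeholder binding name

    // Allocate device buffers for the two bindings.
    void* buffers[2];
    cudaMalloc(&buffers[inputIndex],  hostInput.size()  * sizeof(float));
    cudaMalloc(&buffers[outputIndex], hostOutput.size() * sizeof(float));

    // The host data is FP32 here -- this is exactly what I am unsure about:
    // is copying FP32 bytes enough, or must I convert to FP16 first?
    cudaMemcpy(buffers[inputIndex], hostInput.data(),
               hostInput.size() * sizeof(float), cudaMemcpyHostToDevice);

    // Run the feedforward pass.
    context->execute(batchSize, buffers);

    // Likewise for the output: copy it back and treat it as FP32?
    cudaMemcpy(hostOutput.data(), buffers[outputIndex],
               hostOutput.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    context->destroy();
}
```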
Thanks for your suggestion. I currently have an earlier version of TRT (either 4 or 5, I'm not sure) on my Jetson Nano, but I am hesitant to update it to a later version, mainly because several of my applications depend on backward compatibility. Can updating to TRT 6 break compatibility with applications that rely on previous versions of TRT?