Hello all,
here are some observations I have made while working with the TensorRT platform.
I was able to convert several TensorFlow models both to pure TensorRT and to TF-TRT (TensorFlow-TensorRT).
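For reference, the TF-TRT conversion followed roughly this pattern (a minimal sketch assuming the TF1-style trt_convert API; the frozen-graph path is a placeholder, and the output node names are the standard ones from the TF object detection API SSD models):

```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph from the model zoo download (path is a placeholder).
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace supported subgraphs with TensorRT engines; the blacklisted nodes
# are the detection outputs that must stay as TensorFlow ops.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=["detection_boxes", "detection_scores",
                     "detection_classes", "num_detections"],
    precision_mode="FP16",
)
trt_graph = converter.convert()
```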
At inference, I observed more than double the RAM usage for the TF-TRT version of the same model.
For example, ssd_inception_v2 from the model zoo needs around 500 MB to run inference in pure TensorRT form, while the TF-TRT version requires more than 1.2 GB.
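For anyone who wants to reproduce the numbers, the process's resident memory can be sampled around the load and inference steps roughly like this (a minimal psutil sketch; the model loading and inference calls are left as comments since they differ between the two setups):

```python
import psutil

proc = psutil.Process()

def rss_mb() -> float:
    # Resident set size of the current process, in megabytes.
    return proc.memory_info().rss / (1024 ** 2)

print(f"baseline:         {rss_mb():.0f} MB")
# ... load the pure TensorRT engine or the TF-TRT graph here ...
print(f"after model load: {rss_mb():.0f} MB")
# ... run a few warm-up inference calls here ...
print(f"after warm-up:    {rss_mb():.0f} MB")
```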
I understand that this might be related to TensorFlow-internal overhead, but is such a large difference expected?
When running on embedded devices, RAM usage can be a critical performance metric, so I am looking forward to any feedback.