We are having issues with high memory consumption on the Jetson Xavier NX, especially when using TensorRT through ONNX Runtime.
Our NN models are in FP32 by default, so we tried converting them to FP16, which does make the model file smaller. However, during inference the memory consumption is the same as with FP32.
I enabled FP16 inference with ORT_TENSORRT_FP16_ENABLE=1, as suggested by the ONNX Runtime documentation (TensorRT - onnxruntime), but it didn't help.
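For reference, this is roughly how we configure the session. The sketch below passes the equivalent TensorRT execution provider options programmatically instead of via environment variables (option names are from the ONNX Runtime TensorRT EP documentation; the model path and cache directory are placeholders for our setup):

```python
# TensorRT EP options: FP16 engines, capped workspace, and engine caching
# so the engine is not rebuilt (and memory re-spent) on every startup.
trt_provider_options = {
    "trt_fp16_enable": True,            # build FP16 TensorRT engines
    "trt_max_workspace_size": 1 << 30,  # cap TRT build workspace at 1 GiB
    "trt_engine_cache_enable": True,    # reuse serialized engines across runs
    "trt_engine_cache_path": "./trt_cache",  # placeholder cache directory
}

# Provider priority: TensorRT first, then CUDA, then CPU as fallback.
providers = [
    ("TensorrtExecutionProvider", trt_provider_options),
    ("CUDAExecutionProvider", {}),
    "CPUExecutionProvider",
]

# Session creation (model path is a placeholder):
# import onnxruntime as ort
# session = ort.InferenceSession("model_fp32.onnx", providers=providers)
```

Even with these options the observed memory usage does not change, which is what prompted the questions below.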
Does Jetson Xavier NX support both FP16 and FP32, especially for CUDA and TensorRT?
Is there any other way to reduce the memory consumption when using CUDA and TensorRT?
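One thing we have considered for the CUDA execution provider is limiting its arena allocation. This is only a sketch of what we understand the CUDA EP options to support (option names from the ONNX Runtime CUDA EP documentation; the 2 GiB limit is an arbitrary example value):

```python
# CUDA EP options: cap the GPU memory arena and grow it only on demand
# rather than pre-allocating aggressively.
cuda_provider_options = {
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # cap arena at 2 GiB
    "arena_extend_strategy": "kSameAsRequested",  # allocate only what is asked for
}

# Passed the same way as other provider options:
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",  # placeholder path
#     providers=[("CUDAExecutionProvider", cuda_provider_options)],
# )
```

We are not sure whether this helps on a device like the Xavier NX, where CPU and GPU share physical RAM.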
We also found that, regardless of the NN model, a process using ONNX Runtime consumes at least 1.5 GB of RAM with the CUDA execution provider and at least 2 GB with the TensorRT execution provider. Is this expected?
Is there perhaps a lightweight version of these libraries for devices with limited RAM, such as the Jetson?
Thank you in advance.