Hi,
Could you run your model with trtexec to check how much memory TensorRT needs?
/usr/src/tensorrt/bin/trtexec --onnx=[your/model] #fp32
/usr/src/tensorrt/bin/trtexec --onnx=[your/model] --fp16 #fp16
Running inference with cuDNN (used by TensorRT) requires at least ~600 MB of memory just to load the library.
If you run the model with ONNX Runtime instead, even more memory is needed to load all of the required libraries.
Is pure TensorRT API an option for you?
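If so, here is a minimal sketch of building and loading an engine directly with the TensorRT Python API (assumes TensorRT 8.x bindings are installed; `model.onnx` is a placeholder for your model path):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Parse the ONNX model into a TensorRT network definition.
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

# Build a serialized engine; enable FP16 to reduce memory, like --fp16.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
serialized_engine = builder.build_serialized_network(network, config)

# Deserialize and run inference with the runtime (no ONNX Runtime needed).
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(serialized_engine)
```

This avoids loading the ONNX Runtime libraries entirely, so the memory footprint is closer to the trtexec numbers above.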
Thanks.