This is a known issue for inference on Jetson.
The reason is that these frameworks usually require the cuDNN library, and loading that library alone takes roughly 600–800 MiB of memory.
In TensorRT 8.0, we provide an alternative: you can restrict the builder's tactic sources so the model runs with cuBLAS instead of cuDNN, avoiding that memory cost.
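As a rough illustration, here is a minimal sketch using TensorRT 8.0's Python tactic-source API to build an engine with only cuBLAS/cuBLASLt tactics enabled, so cuDNN is never loaded. The model path `model.onnx` is a placeholder for your own network:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Populate the network from an ONNX file (hypothetical path).
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
# Allow only cuBLAS and cuBLASLt tactics; cuDNN is excluded,
# so the cuDNN library does not need to be loaded at runtime.
config.set_tactic_sources(
    (1 << int(trt.TacticSource.CUBLAS))
    | (1 << int(trt.TacticSource.CUBLAS_LT)))

serialized_engine = builder.build_serialized_network(network, config)
```

If I remember the flag syntax correctly, trtexec exposes the same control, e.g. `trtexec --onnx=model.onnx --tacticSources=-CUDNN,+CUBLAS`.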
Could you wait for our next TensorRT package? TensorRT 8.0 is included in our JetPack 4.6 release: