Hi,
The memory is used for loading the cuDNN and cuBLAS libraries.
If you are using TensorRT 8.0 (JetPack 4.6), an alternative is to run inference on the model without cuDNN as a tactic source.
For example:
$ /usr/src/tensorrt/bin/trtexec --deploy=mnist.prototxt --model=mnist.caffemodel --output=prob --tacticSources=-cudnn --verbose
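If you want to script this, here is a minimal sketch of a wrapper around the same command. The `tactic_flag` helper is hypothetical (it is not part of TensorRT); it only builds the `--tacticSources` value by prefixing each source name with `-`. Passing more than one source as a comma-separated list is an assumption based on the trtexec help text, so please check `trtexec --help` on your install. The script only calls trtexec if it finds the binary at the path used above.

```shell
#!/bin/sh
# Sketch only. TRTEXEC is the path from the command above; tactic_flag is a
# hypothetical helper and not something TensorRT provides.
TRTEXEC=/usr/src/tensorrt/bin/trtexec

# Build a --tacticSources flag that disables every source given as an
# argument, e.g. "cudnn" -> "--tacticSources=-cudnn".
tactic_flag() {
    flag=""
    for src in "$@"; do
        # Join entries with commas, prefixing each name with "-" (disable).
        flag="${flag:+$flag,}-$src"
    done
    printf '%s\n' "--tacticSources=$flag"
}

FLAG=$(tactic_flag cudnn)

if [ -x "$TRTEXEC" ]; then
    # Same invocation as above, with the generated flag.
    "$TRTEXEC" --deploy=mnist.prototxt --model=mnist.caffemodel \
        --output=prob "$FLAG" --verbose
else
    echo "trtexec not found at $TRTEXEC; flag would be: $FLAG"
fi
```

With `--verbose`, the build log should let you confirm which tactic sources were actually used.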
Thanks.