cuBLAS, cuDNN, and TensorRT memory release on Jetson Nano

I’m running TensorRT inference on a Jetson Nano 2GB board. Only the device memory allocated by the TensorRT allocator is released by calling .destroy(); the memory used by cuDNN and cuBLAS is not released. I found some similar topics ("CUDA memory release", "GPU memory may leak during deserializing the engine on TensorRT 6"), but it seems that this memory cannot be released before the application terminates.

Since we need to run the application continuously and 2GB of memory is very limited, how can we release the cuDNN and cuBLAS memory without terminating the application?
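For context, my cleanup is roughly the sketch below (a minimal Python example assuming the standard TensorRT Python API; the engine path is a placeholder). Releasing the TensorRT objects frees only what the TensorRT allocator allocated; the cuDNN/cuBLAS memory stays resident in the process.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_path):
    # Deserialize a prebuilt engine from disk.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = load_engine("model.trt")            # placeholder path
context = engine.create_execution_context()

# ... run inference ...

# Deleting the TensorRT objects (or calling .destroy() on older
# TensorRT versions) releases only the device memory that the
# TensorRT allocator itself allocated; the memory loaded for the
# cuDNN/cuBLAS libraries is not returned to the system.
del context
del engine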

Hi,

The memory is used for loading the cuDNN/cuBLAS libraries.
If you are using TensorRT 8.0 (JetPack 4.6), an alternative is to run inference on the model without using cuDNN.

For example:

$ /usr/src/tensorrt/bin/trtexec --deploy=mnist.prototxt --model=mnist.caffemodel --output=prob --tacticSources=-cudnn --verbose

Thanks.

Thanks for your reply!
It works, but it doesn’t seem like an elegant way to do continuous inference.

A small question: I converted the .onnx model to a .trt model using trtexec with --tacticSources=-CUDNN,-CUBLAS. The inference time does not seem to increase and the results are correct. Are cuDNN and cuBLAS necessary for inference? Is there any difference?

Hi,

It depends on the layers your model uses.

Below is the explanation from our document:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#memory-runtime-phase

TensorRT’s dependencies (cuDNN and cuBLAS) can occupy large amounts of device memory. TensorRT allows you to control whether these libraries are used for inference via the TacticSources (C++, Python) attribute in the builder configuration. Note that some operator implementations require these libraries, so that when they’re excluded, the network may not compile.
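For example, a minimal Python sketch of excluding cuDNN and cuBLAS through the builder configuration when building from ONNX (assuming the TensorRT 8.0 Python API; model.onnx is a placeholder path), equivalent to trtexec --tacticSources=-CUDNN,-CUBLAS:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model.onnx", "rb") as f:          # placeholder path
    parser.parse(f.read())

config = builder.create_builder_config()

# Remove the cuDNN and cuBLAS tactic sources from the default set,
# the API equivalent of trtexec --tacticSources=-CUDNN,-CUBLAS.
sources = config.get_tactic_sources()
sources &= ~(1 << int(trt.TacticSource.CUDNN))
sources &= ~(1 << int(trt.TacticSource.CUBLAS))
config.set_tactic_sources(sources)

serialized_engine = builder.build_serialized_network(network, config)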

Thanks.
