The TensorRT libraries (libnvinfer) and the cuDNN libraries they link against are very large, roughly 650 MB together. That already takes up a lot of memory. I have tried linking against the static versions, but then my own program simply grows to 650 MB instead. Is there anything that can be done about this?
TensorRT Version: 6
GPU Type: nvidia jetson nano
Nvidia Driver Version: jetpack 4.3
CUDA Version: jetpack 4.3 (10.0)
CUDNN Version: jetpack 4.3
Operating System + Version: custom OS based on yocto + meta-tegra layer
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvidia jetson nano
Steps To Reproduce
Link and use a C++ program with tensorrt’s libnvinfer library.
Moving to Jetson Nano forum so that Jetson team can take a look.
Hi @deblauwetom, the latest JetPack 4.4 Developer Preview release with cuDNN 8.0 splits up the cuDNN libraries into several sub-libraries which are smaller in size:
$ ls -ll /usr/lib/aarch64-linux-gnu/libcudnn*.so.8.0.0
-rw-r--r-- 1 root root 98767336 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.0.0
-rw-r--r-- 1 root root 51032536 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.0.0
-rw-r--r-- 1 root root 177698176 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.0.0
-rw-r--r-- 1 root root 31817240 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.0.0
-rw-r--r-- 1 root root 138606184 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_etc.so.8.0.0
-rw-r--r-- 1 root root 108440120 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.0.0
-rw-r--r-- 1 root root 27284344 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.0.0
-rw-r--r-- 1 root root 149480 Apr 18 04:02 /usr/lib/aarch64-linux-gnu/libcudnn.so.8.0.0
libcudnn.so then dynamically loads only the needed cuDNN sub-libraries at runtime based on what API calls the application is performing. This should significantly reduce the memory usage from the libraries (e.g. if only inferencing is used).
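To check which sub-libraries actually end up loaded in your application, one option is to inspect the process's memory map. The following is a minimal sketch, assuming a Linux system where `/proc/self/maps` is available; the function names `mappedLibraries` and `mappedLibrariesOfThisProcess` are hypothetical helpers, not part of cuDNN or TensorRT.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: collects the distinct file paths in a
// /proc/<pid>/maps style stream whose path contains `needle`.
// Each line has the form: address perms offset dev inode [pathname]
std::set<std::string> mappedLibraries(std::istream& maps,
                                      const std::string& needle) {
    std::set<std::string> found;
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        if (!(fields >> range >> perms >> offset >> dev >> inode)) {
            continue;  // malformed line, skip it
        }
        // The rest of the line (if any) is the path; anonymous
        // mappings have none and leave `path` empty.
        std::getline(fields >> std::ws, path);
        if (!path.empty() && path.find(needle) != std::string::npos) {
            found.insert(path);
        }
    }
    return found;
}

// Convenience wrapper for the calling process (Linux only).
std::set<std::string> mappedLibrariesOfThisProcess(const std::string& needle) {
    std::ifstream maps("/proc/self/maps");
    return mappedLibraries(maps, needle);
}
```

Calling `mappedLibrariesOfThisProcess("libcudnn")` after your inference code has run would list the cuDNN files mapped at that point. Note that a mapped library is not necessarily fully resident in RAM, so this shows what was loaded rather than how much memory it costs.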
That is good news indeed. However, it seems the “infer” libraries are still the biggest in your list, so I won’t get my hopes up too much.
For example, when loading an engine with deserializeCudaEngine I get an extra 800 MB of memory usage. The saved network itself is only 6 MB. The extra memory usage appears as soon as I call this function once. Would JetPack 4.4 solve this case? Are there any memory usage benchmarks available? I would just like to know whether upgrading to JetPack 4.4 would be worth the trouble before I try it.
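For anyone who wants to compare the two JetPack versions, a figure like this can be obtained by reading the process's resident set size before and after the call. This is a rough sketch, assuming Linux and the `VmRSS` field of `/proc/self/status`; `parseVmRssKiB` and `currentRssKiB` are hypothetical helper names, and `VmRSS` may not capture memory allocated by the CUDA driver itself.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the VmRSS value (in KiB) from a
// /proc/<pid>/status style stream. Returns -1 if the field is missing
// or cannot be parsed.
long parseVmRssKiB(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream value(line.substr(6));
            long kib = -1;
            if (value >> kib) {
                return kib;
            }
            return -1;
        }
    }
    return -1;
}

// Resident set size of the calling process in KiB (Linux only).
long currentRssKiB() {
    std::ifstream status("/proc/self/status");
    return parseVmRssKiB(status);
}

// Intended use (not compiled here, since it needs TensorRT):
//   long before = currentRssKiB();
//   ... call deserializeCudaEngine on the serialized engine ...
//   long after = currentRssKiB();
//   // (after - before) approximates the extra host memory in KiB
```

Running the same measurement on JetPack 4.3 and 4.4 with the same engine file would give a like-for-like comparison.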