TensorRT uses a lot of memory when loading model files

I’m using an Orin NX 16GB device with Ubuntu 20.04 and TensorRT 8.5.2.
I noticed that the memory usage is quite high when loading 3 TRT model files.

Here, the memory includes both host memory and device memory.
I use the “top” command to watch the host memory and jtop to watch the device memory. Both show high values (host: 2.3 GB, device: 1.2 GB), and the values are even higher on some other test devices (host: 3.4 GB, device: 1.8 GB).

The TRT model files themselves are quite small, all under 5 MB. I also tried converting to FP16 and INT8 variants, but the memory usage stays at the same level.

My code is quite simple: just createInferRuntime and then deserializeCudaEngine.
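For reference, a minimal sketch of the loading path (the file name, logger, and error handling below are simplified placeholders, not my exact code):

// Minimal sketch: createInferRuntime followed by deserializeCudaEngine.
// "model.trt" and the Logger class are placeholders.
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
} gLogger;

int main()
{
    // Read the serialized engine from disk (only a few MB for these models).
    std::ifstream file("model.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // createInferRuntime + deserializeCudaEngine, as described above.
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));

    // Most of the observed memory appears here: the CUDA context,
    // library workspaces, and the engine's device allocations.
    return engine ? 0 : 1;
}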

I don’t quite understand why the memory usage is so high. Is there anything I can do to reduce the memory usage (both host and device)?

Thanks a lot.

Hi,

Usually, the memory is used for loading the required libraries, e.g., cuDNN.
You can try running inference without cuDNN to save memory.

More information can be found in the below document:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#memory-runtime-phase

Our trtexec also supports the feature:

$ /usr/src/tensorrt/bin/trtexec --tacticSources=-cudnn -...
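If you build the engine with the C++ builder API instead of trtexec, a rough sketch of the equivalent setting is below. "config" is assumed to be the IBuilderConfig you already use when building, and the cuBLAS lines are optional:

// Sketch: the builder-API equivalent of trtexec's --tacticSources=-cudnn.
// "config" is assumed to be the nvinfer1::IBuilderConfig* you already created.
#include <NvInfer.h>
#include <cstdint>

void disableLibraryTactics(nvinfer1::IBuilderConfig* config)
{
    using nvinfer1::TacticSource;

    // Start from the current tactic-source mask and clear the cuDNN bit.
    nvinfer1::TacticSources sources = config->getTacticSources();
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUDNN));

    // Optionally clear cuBLAS / cuBLASLt as well to save more memory.
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS));
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS_LT));

    config->setTacticSources(sources);
}

Note that tactic sources are a build-time setting, so the engine needs to be rebuilt for this to take effect.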

Thanks.

Hi, I tried your solution, but it doesn’t seem to work. The memory usage doesn’t change at all.

I also tested another case: I run process A, and after a while I run A again, so two instances run at the same time, and I watch both processes’ memory usage.
They use almost the same amount of memory (host and device).

It seems that cuDNN and the other libraries are not shared between processes. In my test case, the first A process should have already loaded all the necessary libraries, so the second A process should not need to load them again, but that doesn’t seem to be the case.

Is there anything I did wrong, or is there another method I can try?

Thanks a lot.

Hi,

Have you tried running the models in the same process?
Since Jetson cannot share the CUDA context between processes, each process needs to load the libraries on its own.
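For example, a rough sketch of loading all the engines from a single process (the file names and the logger are placeholders):

// Sketch of the suggestion above: deserialize all engines in one process
// so they share a single CUDA context and a single copy of the libraries.
// File names and the Logger class are placeholders.
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
} gLogger;

static std::vector<char> readBlob(const char* path)
{
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

int main()
{
    // One runtime for all models: the per-process overhead (CUDA context,
    // library workspaces) is paid once instead of once per model.
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));

    std::vector<std::unique_ptr<nvinfer1::ICudaEngine>> engines;
    for (const char* path : {"model_a.trt", "model_b.trt", "model_c.trt"})
    {
        std::vector<char> blob = readBlob(path);
        engines.emplace_back(
            runtime->deserializeCudaEngine(blob.data(), blob.size()));
    }
    return 0;
}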

Thanks.

If I run inference without cuDNN and cuBLAS, will the inference speed be slower than before?
I cannot accept the result if the inference speed is too slow.

Thanks.

Hi,

The answer depends on the layers you used in the model.
It’s recommended to evaluate it with trtexec directly.

Thanks.
