TensorRT uses a lot of memory when loading model files

I’m using an Orin NX 16GB device with Ubuntu 20.04 and TensorRT 8.5.2.
I noticed that the memory usage is quite high when loading 3 TRT model files.

Here, the memory includes both host memory and device memory.
I use the “top” command to watch the host memory and jtop to watch the device memory. Both show high values (host: 2.3 GB, device: 1.2 GB), and the values are even higher on some other test devices (host: 3.4 GB, device: 1.8 GB).

The TRT model files themselves are quite small, all under 5 MB. I also tried converting to FP16 and INT8 variants, but the memory usage stays at the same level.

My code is quite simple: just createInferRuntime and then deserializeCudaEngine.
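For reference, a minimal sketch of the loading path (the file name, logger, and error handling below are simplified placeholders, not my exact code):

// Minimal sketch: createInferRuntime followed by deserializeCudaEngine.
// "model.trt" and the Logger class are placeholders.
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
} gLogger;

int main()
{
    // Read the serialized engine from disk (only a few MB for these models).
    std::ifstream file("model.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // createInferRuntime + deserializeCudaEngine, as described above.
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));

    // Most of the observed memory appears here: the CUDA context,
    // library workspaces, and the engine's device allocations.
    return engine ? 0 : 1;
}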

I don’t quite understand why the memory usage is so high. Is there anything I can do to reduce the memory usage (both host and device)?

Thanks a lot.

Hi,

Usually, the memory is used for loading the required libraries, e.g., cuDNN.
You can try running inference without cuDNN to save memory.

More information can be found in the below document:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#memory-runtime-phase

Our trtexec also supports the feature:

$ /usr/src/tensorrt/bin/trtexec --tacticSources=-cudnn -...
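If you build the engine with the C++ builder API instead of trtexec, a rough sketch of the equivalent setting is below. "config" is assumed to be the IBuilderConfig you already use when building, and the cuBLAS lines are optional:

// Sketch: the builder-API equivalent of trtexec's --tacticSources=-cudnn.
// "config" is assumed to be the nvinfer1::IBuilderConfig* you already created.
#include <NvInfer.h>
#include <cstdint>

void disableLibraryTactics(nvinfer1::IBuilderConfig* config)
{
    using nvinfer1::TacticSource;

    // Start from the current tactic-source mask and clear the cuDNN bit.
    nvinfer1::TacticSources sources = config->getTacticSources();
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUDNN));

    // Optionally clear cuBLAS / cuBLASLt as well to save more memory.
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS));
    sources &= ~(1U << static_cast<uint32_t>(TacticSource::kCUBLAS_LT));

    config->setTacticSources(sources);
}

Note that tactic sources are a build-time setting, so the engine needs to be rebuilt for this to take effect.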

Thanks.

Hi, I tried your solution, but it doesn’t seem to work. The memory usage doesn’t change at all.

I also tested another case: I run process A, and after a while I run A again, so two instances run at the same time, and I watch both processes’ memory usage.
They use almost the same amount of memory (host and device).

It seems that cuDNN and the other libraries are not shared between processes. In my test case, the first A process should have already loaded all the necessary libraries, so the second A process should not need to load them again, but that doesn’t seem to be the case.

Is there anything I did wrong, or is there another method I can try?

Thanks a lot.

Hi,

Have you tried running the models in the same process?
Since Jetson cannot share the CUDA context between processes, each process needs to load the libraries on its own.
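For example, a rough sketch of loading all the engines from a single process (the file names and the logger are placeholders):

// Sketch of the suggestion above: deserialize all engines in one process
// so they share a single CUDA context and a single copy of the libraries.
// File names and the Logger class are placeholders.
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
} gLogger;

static std::vector<char> readBlob(const char* path)
{
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

int main()
{
    // One runtime for all models: the per-process overhead (CUDA context,
    // library workspaces) is paid once instead of once per model.
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));

    std::vector<std::unique_ptr<nvinfer1::ICudaEngine>> engines;
    for (const char* path : {"model_a.trt", "model_b.trt", "model_c.trt"})
    {
        std::vector<char> blob = readBlob(path);
        engines.emplace_back(
            runtime->deserializeCudaEngine(blob.data(), blob.size()));
    }
    return 0;
}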

Thanks.

If I run inference without cuDNN and cuBLAS, will the inference speed be slower than before?
I cannot accept the result if the inference speed is too slow.

Thanks.

Hi,

The answer depends on the layers you used in the model.
It’s recommended to evaluate it with trtexec directly.

Thanks.
