TensorRT model consuming more RAM on Jetson TX2

Hi Team,
We are using a TensorRT (.pb) model and a Keras (.hdf5) model for object detection on the NVIDIA Jetson TX2 board. Both models are the same size, 167 MB.
The issue we are facing is that with the Keras model, RAM, GPU, and CPU usage are all normal, but with the TensorRT model, RAM usage is much higher than with the Keras model, while GPU and CPU usage remain normal.

Below is the output of the tegrastats command:

1. While running the Keras model
RAM usage - 1686 / 7852 MB (used / total)
CPU usage - 21.47%
Max GPU usage - 64%

2. While running the TensorRT model
RAM usage - 5615 / 7852 MB (used / total)
CPU usage - 18.49%
Max GPU usage - 40%

Kindly help us fix/reduce the RAM usage when the TensorRT model is running.
Below are the package versions we are using on the TX2:
Jetpack - 4.4
CUDA - 10.2
Tensorflow - 2.2.0
Keras - 2.4.3
Libcudnn - 8.0
Linux ubuntu - 18.04
TensorRT - 7.1.3

I can’t help with the details of TensorRT, but the stats you are showing imply less CPU and GPU usage for TensorRT, yet more total memory usage (meaning other processes may be consuming the memory). Maybe it isn’t TensorRT itself that is consuming the extra memory. If someone wants to look at this more closely, you might want to save a copy of “/proc/meminfo” for each model.
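As a concrete way to do that, you could snapshot “/proc/meminfo” once during each run and compare the snapshots afterward. The file names below are just placeholders; in practice each cat would be run while the corresponding model is loaded:

```shell
# Snapshot system memory state while each model is loaded.
# Here both snapshots are taken back to back only to show the mechanics;
# on the TX2 you would take one during the Keras run and one during the
# TensorRT run.
cat /proc/meminfo > meminfo_keras.txt
cat /proc/meminfo > meminfo_trt.txt

# Compare the key fields; a large gap in MemAvailable or Cached between
# the two snapshots points at where the extra RAM is going.
grep -E 'MemTotal|MemAvailable|Cached' meminfo_trt.txt
```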

I’ll also recommend looking at memory usage in “htop” (if you don’t have it, then “sudo apt-get install htop”). htop can conveniently sort by different values and show the top consumers of those values. The F6 key opens a left column, and then you can use the arrow keys to go up/down the list of sortable fields (followed by picking PERCENT_MEM). Note that virtual memory is somewhat pointless for this purpose and can be ignored. Probably more important is sorting by M_RESIDENT, the “resident” memory which will be consumed at all times as a minimum so long as the process runs.
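If an interactive tool is inconvenient (e.g. when capturing output over ssh for a forum post), roughly the same resident-memory ranking can be produced non-interactively with ps; this is a generic Linux command, not Jetson-specific:

```shell
# List the top 5 processes by resident set size (RSS, in KB),
# roughly equivalent to sorting htop by M_RESIDENT.
ps -eo pid,rss,comm --sort=-rss | head -n 6
```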

Having a better idea where the memory is actually used would help whoever answers. You might also want to provide a line or two of output from tegrastats when memory usage is high. The stats you have above do not actually show TensorRT itself as using more memory.
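When you do post those tegrastats lines, the RAM figure can be pulled out of a saved log with a one-liner; the sample line below is made up to match the JetPack 4.4 output format, using the numbers from your post:

```shell
# Extract the used/total RAM figure (MB) from a saved tegrastats line.
# The sample line is hypothetical, shaped like JetPack 4.4 tegrastats output.
line='RAM 5615/7852MB (lfb 4x4MB) SWAP 0/3826MB CPU [18%@1190]'
printf '%s\n' "$line" | sed -n 's/^RAM \([0-9]*\/[0-9]*\)MB.*/\1/p'
# → 5615/7852
```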


Just want to confirm first.

How do you run the TensorRT inference?
Do you use the TensorRT API directly or launch it via TF-TRT?


Hi, we are running TRT inference using the TF API (TF version is 2.2.0).


It’s known that TF-TRT uses much more memory, since it may duplicate the pipeline (one copy for TRT and one for TF) to support fallback.
To optimize memory usage, it’s recommended to run your model with the TensorRT API directly.

You can find an example here: