TensorRT 5 High System Memory Usage


I’d like to understand system memory usage of TensorRT running on Xavier. I am using the trtexec utility to profile memory usage.

I am running the following command:

/usr/src/tensorrt/bin/trtexec --uff=./my-graph.uff --useDLACore=0 --fp16 --allowGPUFallback --avgRuns=1000 --iterations=1000

Simultaneously I am running the following to periodically monitor memory usage:

watch sudo smem -rp

I am seeing the following usage:

PID User     Command                         Swap      USS      PSS      RSS
22369 root     /usr/src/tensorrt/bin/trtex    0.00%    9.57%   10.76%   12.56%

The RSS column shows 12.56%, or just over 2 GB of physical memory. Moreover, it slowly increases the longer trtexec runs, as if memory were leaking.

Can someone explain whether this level of memory usage is necessary/expected, and whether I am in fact observing some kind of memory leak when I see the percentage slowly creeping up?
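To put a number on the "slowly creeping up" observation rather than eyeballing `watch`, one option is to sample the process's resident set size at fixed intervals and fit a line through the samples. The following is only a minimal sketch: it assumes a Linux `/proc` filesystem, reads the `VmRSS` field (which should correspond to the RSS column that smem reports), and the helper names (`parse_rss_kib`, `growth_kib_per_sec`, `sample_rss`) are made up for illustration and are not part of TensorRT or smem.

```python
import re
import sys
import time


def parse_rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS field not found in status text")
    return int(match.group(1))


def read_rss_kib(pid):
    """Read the current resident set size of a process, in KiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_rss_kib(f.read())


def growth_kib_per_sec(samples):
    """Least-squares slope of (elapsed_seconds, rss_kib) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def sample_rss(pid, interval_s=5.0, count=60):
    """Collect `count` RSS samples, one every `interval_s` seconds."""
    samples = []
    start = time.monotonic()
    for _ in range(count):
        samples.append((time.monotonic() - start, read_rss_kib(pid)))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 rss_trend.py <pid-of-trtexec>
    pid = int(sys.argv[1])
    slope = growth_kib_per_sec(sample_rss(pid))
    print(f"RSS trend: {slope:.2f} KiB/s")
```

A slope that stays clearly positive over a long run would support the leak suspicion, while a slope that flattens out after warm-up would point to one-time allocations instead.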


Can you share your uff? I’m using

/usr/src/tensorrt/bin/trtexec --uff=mnist/lenet5.uff  --output=Binary_3 --uffInput=Input_0,1,28,28 --iterations=100000000

and memory usage stays around 12% consistently.

Hi. Thank you for your response.

So there are two issues here:

  1. Why is memory usage 12%?
  2. Why is memory usage slowly creeping up?

You mentioned you don’t observe issue #2, so let’s focus on issue #1.

Is that amount of physical memory usage expected, then? 12% of 16 GB = 1.92 GB. In my application I am running two inference engines, so the total memory usage is a whopping 25% of the available physical memory. I plan on adding more. It won't scale well if each instance uses about 2 GB.

Why is that much memory needed? Thanks!
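For reference, the arithmetic behind these figures can be sketched as below. It assumes 16 GB of total physical memory, as stated in the post (the total that smem actually uses as its denominator may be slightly lower), and the function names are hypothetical helpers for illustration only.

```python
def percent_to_gb(percent, total_gb=16.0):
    """Convert an smem percentage into gigabytes, given total RAM."""
    return percent / 100.0 * total_gb


def engines_that_fit(per_engine_gb, total_gb=16.0, reserved_gb=0.0):
    """Number of whole engine instances that fit in the remaining memory."""
    return int((total_gb - reserved_gb) // per_engine_gb)


if __name__ == "__main__":
    # Percentages taken from the smem output quoted earlier in the thread.
    smem_row = {"USS": 9.57, "PSS": 10.76, "RSS": 12.56}
    for column, pct in smem_row.items():
        print(f"{column}: {pct:.2f}% of 16 GB = {percent_to_gb(pct):.2f} GB")

    per_engine = percent_to_gb(smem_row["RSS"])
    print("Engines that fit if nothing else used memory:",
          engines_that_fit(per_engine))
```

This counts RSS per instance, so it probably overstates the total when several engines share libraries; PSS is usually the fairer column for estimating how multiple instances add up.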

Hello, Can I please get an answer? Is it normal to expect 2GB memory usage with one TensorRT engine loaded? Is there any way to bring the memory usage down?


Per engineering, high memory usage, especially on TX1 and TX2, is a known issue. It’s something we’re working on. For networks that are relatively small it can sometimes take 1 or 2 GB of shared memory.

The general recommendation is to stop gdm/lightdm to free up the display memory if you need to run larger batch sizes.

Thank you for the response.

My question is about Xavier, not TX1 or TX2. Does the same explanation apply to Xavier as well?

Is there a rough timeline for fixing this issue? Is it expected to be part of the next TensorRT release?


We are facing a similar problem on TX1 and TX2. Is there any temporary workaround, or anything else we can do?

We looked into cuDNN, and there seem to be a few flags you can set to reduce memory usage. Is there something similar in TensorRT?