Is it normal to use 3.5GB of RAM after doing inference with the TX2 when using TensorRT?

Hi @AastaLLL, @dusty_nv, @JerryChang

I am measuring the RAM used during inference using a classification model (Inception_v1), TensorRT and PyCUDA.

I have used a page-locked memory to allocate the tensors.
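For scale, a minimal sketch of how small the explicit I/O buffers are compared to 3.5 GB. The shapes here are an assumption (Inception_v1's usual 1x3x224x224 input and 1001-class output; yours may differ), and the PyCUDA allocation calls are shown in comments rather than executed:

```python
import numpy as np

# Assumed shapes for Inception_v1 -- check your own model's bindings.
INPUT_SHAPE = (1, 3, 224, 224)
OUTPUT_SHAPE = (1, 1001)

def buffer_nbytes(shape, dtype=np.float32):
    """Bytes needed for one host or device buffer of the given shape."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

# Page-locked host buffers and device buffers would be allocated with
# PyCUDA roughly like this:
#   h_input  = cuda.pagelocked_empty(INPUT_SHAPE, np.float32)
#   h_output = cuda.pagelocked_empty(OUTPUT_SHAPE, np.float32)
#   d_input  = cuda.mem_alloc(h_input.nbytes)
#   d_output = cuda.mem_alloc(h_output.nbytes)
```

The point of the sketch: the I/O buffers themselves account for well under a megabyte, so they cannot explain the 3.5 GB figure on their own.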

To get the memory used, I read /proc/meminfo. Also, I am reading the memory used after the predictions have been sent back from the GPU to the CPU (as you can see in the image).

The TX2 is using around 3.5GB of RAM. I made a plot of it.

  1. Why is that amount of memory being used?
  2. Could it be less? Maybe if I had used the Unified Memory?
  3. Is it related to the amount of RAM used while optimizing the .onnx file with “trtexec”? I used this command: /usr/src/tensorrt/bin/trtexec --onnx=inception_v1_2016_08_28_frozen.onnx --saveEngine=inception_v1_2016_08_28_fp16.trt --workspace=4096 --fp16



Hi,

Please note that it takes around 600 MiB of memory to load the TensorRT library.
The input/output buffers and the inference weights also require memory.

It’s possible to limit the workspace memory used to store intermediate data during inference.
This restricts TensorRT to selecting only the inference algorithms that fit within the given memory limit.

You can do this by adjusting the workspace value directly.
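As a concrete example, the engine from the original post could be rebuilt with a smaller workspace. This reuses the command from the question with only the --workspace value changed (256 MiB here is an arbitrary illustration; pick a budget that suits your model):

```shell
# Rebuild with a 256 MiB workspace instead of 4096 MiB, so TensorRT
# only considers tactics that fit in that budget:
/usr/src/tensorrt/bin/trtexec --onnx=inception_v1_2016_08_28_frozen.onnx \
    --saveEngine=inception_v1_2016_08_28_fp16.trt --workspace=256 --fp16
```

A smaller workspace may slightly reduce inference speed if the fastest tactic no longer fits.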

Thanks.


Hi @AastaLLL

And the buffer for input/output and the inference weights also requires memory.
Can you explain this to me? If I had used Unified Memory, would there still be a memory requirement for the input/output buffers?

Hi,

Based on your sample, there is a memory copy step that copies h_input into d_input.
Are d_input and d_output the buffers prepared for TensorRT inference?

Thanks.


Yes, d_input and d_output are the GPU buffers.

In cuda.memcpy_htod_async(d_input, h_input, stream) I am copying the image. I left it outside of the loop so the copy happens only once.

In cuda.memcpy_dtoh_async(h_output, d_output, stream) I am copying the predictions from the GPU to the CPU.

Also, with /proc/meminfo I get something like this:

To get the memory usage I just do: MemTotal - MemFree.
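That MemTotal - MemFree calculation can be scripted. A small sketch that parses the contents of /proc/meminfo (the field names and kB units are standard on Linux):

```python
def used_kib(meminfo_text):
    """Return MemTotal - MemFree (in KiB) from /proc/meminfo contents."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "MemFree"):
            values[key] = int(rest.split()[0])  # /proc/meminfo reports kB
    return values["MemTotal"] - values["MemFree"]

# Usage on the device:
#   with open("/proc/meminfo") as f:
#       print(used_kib(f.read()), "KiB used")
```

Note that MemFree excludes page cache and buffers, so MemTotal - MemFree overestimates what the application itself is actually using; MemAvailable is often a better gauge of reclaimable memory.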

When using d_input and d_output, am I using the system DRAM or dedicated memory inside the GPU?

Hi,

On Jetson, both the CPU and GPU share the same SoC DRAM.
You can find information about the memory design and usage in this document:

Thanks.