The same model consumes different amounts of GPU memory on different GPUs


Same model, same TensorRT, on Windows 10:
350 MB of GPU memory is required on a GTX 1060,
700 MB of GPU memory is required on an RTX 2070,
1 GB of GPU memory is required on an RTX 3060.

My guess is that the GPU architecture has a great impact on TensorRT's GPU memory consumption. Is that right?


TensorRT Version:
GPU Type: GTX 1060, RTX 2070, RTX 3060
Nvidia Driver Version: 456.71
CUDA Version: 11.1
CUDNN Version: 8.1
Operating System + Version: win10
1060_trtexec.txt (13.1 KB)
2070_trtexec.txt (16.3 KB)

Same model, same software, run with trtexec:
414 MB of GPU memory is required on the GTX 1060,
788 MB of GPU memory is required on the RTX 2070.
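
For anyone who wants to reproduce numbers like these, here is a minimal sketch of one way to read GPU memory usage. It is an assumption that the figures above were taken in a comparable way. The sketch calls `nvidia-smi` with its `--query-gpu` option and reports whole-device usage in MiB, so the cost of a single process has to be estimated by comparing readings taken before and while trtexec runs. The function names are hypothetical.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse 'name, memory.used' CSV rows into (name, MiB) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma only, in case the GPU name contains one.
        name, used = line.rsplit(",", 1)
        rows.append((name.strip(), int(used)))
    return rows


def query_memory_used():
    """Ask nvidia-smi for the memory currently in use on every GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)
```

Calling `query_memory_used()` once while the GPU is idle and once while inference is running, then subtracting the two readings, gives a rough per-run figure.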

@NVES Hi, I have provided the info.


  • It is expected that TensorRT GPU memory utilization varies across GPU architectures.
  • CUDA compute capability differs between GPU architectures. Newer architectures also support new units such as Tensor Cores, which allows us to develop kernels that use more memory to speed up your network.

Thank you.

If I want lower memory consumption and a lower speed is OK, can I control this?


We can restrict or extend memory consumption using the trtexec --workspace flag.
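
As a rough illustration of the flag mentioned above, the sketch below assembles a trtexec command line with a workspace limit given in megabytes. The helper name, the file names, and the assumption that the model is an ONNX file are all hypothetical and not taken from this thread. Note that the workspace only caps the scratch memory TensorRT may use when selecting kernels, so weights, activations, and the CUDA context still take their own share, and a smaller limit may rule out some faster kernels.

```python
def trtexec_command(onnx_path, workspace_mb, engine_path=None):
    """Build a trtexec argument list with a workspace limit in MB."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",          # assumed input format: ONNX
        f"--workspace={workspace_mb}",  # scratch memory limit, in MB
    ]
    if engine_path is not None:
        # Optionally save the built engine for later runs.
        cmd.append(f"--saveEngine={engine_path}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(" ".join(trtexec_command("model.onnx", 256, "model_ws256.engine")))
```

The returned list can be passed to `subprocess.run` on a machine where trtexec is on the PATH. Building with a few different limits and comparing memory use and latency shows the trade-off on a given GPU.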

Thank you.

Hi @spolisetty, can we limit the workspace size during inference? On different GPU architectures, the memory used when loading an engine for the same network architecture is different.


Thank you.