New TensorRT model occupying more GPU memory compared to the older version


I am converting a TensorFlow model (.h5 → SavedModel format → TensorRT model) using TensorFlow 2.5.0 (environment details attached below). The resulting TensorRT model occupies almost 3.5GB of GPU memory.
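For context, the conversion path is roughly the following. This is a minimal sketch, not my exact script: the function names and arguments are placeholders, and the TensorFlow imports are kept inside the functions so the snippet stands on its own.

```python
def h5_to_saved_model(h5_path, saved_model_dir):
    """Load a Keras .h5 model and re-export it in SavedModel format."""
    from tensorflow import keras  # imported lazily; requires TF 2.x
    model = keras.models.load_model(h5_path)
    model.save(saved_model_dir)  # TF 2.x writes SavedModel format by default


def saved_model_to_trt(saved_model_dir, output_dir, precision="FP16"):
    """Convert a SavedModel to a TF-TRT model via TrtGraphConverterV2."""
    from tensorflow.python.compiler.tensorrt import trt_convert as trt
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params,
    )
    converter.convert()         # build the TRT-compatible graph
    converter.save(output_dir)  # write the converted SavedModel
```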

If I load the same model in the environment specified below, then the TensorRT model occupies at most ~1.1GB of GPU memory:
TensorRT Version:
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 418.87.00
CUDA Version: 10.1
CUDNN Version: 7.6.2
Operating System + Version: Ubuntu 16.04.7 LTS
Python Version (if applicable): 3.6.13
TensorFlow Version (if applicable): 1.14.1
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): NA

I also tried using nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626_1-1_amd64.deb with TF 2.5, but the code won’t run with it; the code only runs if we use CUDA 11.1.


TensorRT Version: 7.2.3-1+cuda11.1
GPU Type: NVIDIA GeForce RTX 3080
Nvidia Driver Version: 470.57.02
CUDA Version: 11.2 & 11.1 (installed along with TensorRT)
CUDNN Version:
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.9.6
TensorFlow Version (if applicable): 2.5.0
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): NA

Relevant Files (4.2 KB)
dummy.h5 (4.6 MB)
gpu_usage (272 Bytes)

Output files (in the models directory):

gpu_usage_dummy.txt (1.4 KB)
tensorrt_output.txt (16.5 KB)

Steps To Reproduce

Setting up the environment:

  • nvidia-driver installation
  • libnvinfer installation as mentioned here - libnvinfer 7.2.3-1+cuda11.1
  • cuda installation steps: here - change to
    ```
    sudo apt-get -y install cuda-11-2
    ```
  • CUDNN installation: from .deb
    ```
    sudo apt-get install -y --no-install-recommends \
        libnvinfer7=7.2.3-1+cuda11.1 \
        libnvinfer-dev=7.2.3-1+cuda11.1 \
        libnvinfer-plugin7=7.2.3-1+cuda11.1
    ```
  • tensorflow installation: pip install tensorflow==2.5.0

Adding paths in .bashrc:

```
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Running the code:

  • Put the dummy.h5 model in the models directory
  • Run the conversion script & run gpu_usage to observe the memory occupied
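The attached gpu_usage script is not reproduced here; a minimal stand-in that polls used GPU memory through nvidia-smi could look like this (the helper names are mine, not from the attachment):

```python
import subprocess


def parse_memory_output(out):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output: one integer (MiB) per line, one line per GPU."""
    return [int(line.strip()) for line in out.splitlines() if line.strip()]


def read_gpu_memory_mib():
    """Return the used memory of each visible GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_output(out)
```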

Please refer to the installation steps from the link below in case you are missing anything.
Also, we suggest you use TRT NGC containers to avoid any system-dependency-related issues.


I am not installing TensorRT using this guide; I am following the tf-docs for this.

Do I need to install TensorRT as mentioned in the Installation Guide :: NVIDIA Deep Learning TensorRT Documentation? I ask because I am able to convert the model to the TensorRT format without it.

Hey @NVES ,

I followed the installation steps mentioned in the install guide, but the memory issue is still there.


Hi @meet,

Could you please try using TF-TRT on the TensorFlow NGC container and let us know if you still face this issue?

Thank you.

Hey @spolisetty,

I tried the latest container - 21.07-tf2-py3, but the issue is still there. It also occupies far more memory than in the 2080 Ti environment mentioned above in the post.

Hey @spolisetty,

I also tried the PyTorch NGC container. When I run the same inference script on both the 3080 and the 2080 Ti (each with its own environment, as above), the process occupies 1.7GB and 1GB of GPU memory respectively, and the inference time is lower on the 2080 Ti than on the 3080.

Can this be something related to GPU or GPU-Drivers?

Hi @meet,

Yes, GPU architecture and compute capability do matter.
We usually ship high-end GPUs with more device memory, and a newer architecture supports new units such as Tensor Cores, which allows us to develop fancier kernels that use more memory to speed up your NN.
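As an aside, TensorFlow by default reserves most of the free device memory up front, so nvidia-smi readings reflect the allocator as much as the model itself. To rule that out when comparing the two GPUs, you can ask TF to allocate on demand; a sketch using the standard tf.config API (the import is inside the function so the snippet loads without TF installed):

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU allocations on demand instead of
    reserving most of the device memory up front. Call before any op
    touches the GPU."""
    import tensorflow as tf  # imported lazily; requires TF 2.x
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
```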

If you observe a larger difference in inference time, please let us know the steps to reproduce the issue.

Thank you.

Hey @spolisetty,

If this is the case, then can you tell me why going from the 21.07 TensorFlow NGC container to the 20.11 one decreased GPU-memory usage from 3.5GB to 1.3GB?

We tested both TF 1.15 and TF 2.5 in the 20.11 TensorFlow NGC container; both occupy the same GPU memory, so we can rule out a TF-version issue in this case.

On the inference side, I have the 3080 on an 8-core CPU and the 2080 Ti on a 32-core CPU.
For now I am assuming it could be a CPU bottleneck.

We will be testing the 3080 with a CPU with more cores. Will keep this thread updated with the results.