TensorRT memory management


Hi! I have been using TensorRT for a couple of months, and I wonder whether there is a way to manage the GPU memory myself. More than one TensorRT engine needs to be deployed while the program is running, and the problem is:
Every time a new engine is loaded into memory, it locks a specific region of GPU memory. However, the image-processing functions also need GPU memory, so in some cases the resulting memory fragmentation causes engine loading to fail.
Is there a function or API that lets me reserve a specific region of GPU memory before any engine is deployed? Ideally, even after an engine is unloaded, that region would stay reserved until the program exits or the process is killed.
This is similar to TensorFlow's memory pool, but I am not sure whether TensorRT has a comparable feature. Could anyone give me some help or sample code I can refer to?


TensorRT Version:
GPU Type: Tesla V100
Nvidia Driver Version: 450.51.05
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered


We can set up the allocator ourselves with setGpuAllocator.
It requires us to implement our own allocator (a subclass of IGpuAllocator).
I don’t think we have sample code that implements a memory pool for now.


This should not happen. As long as you correctly destroy all execution contexts and engines, all memory will be freed.

Please refer to TensorRT samples at Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation
TensorRT/samples at master · NVIDIA/TensorRT · GitHub

Thank you.

Please check the link below, as it might answer your concerns.


Thank you for your reply. That is what I am looking for. Really appreciate it!


Thank you for your reply. I have done some experiments with creating an IGpuAllocator to control memory use when deserializing engines. It seems to work as long as the IRuntime instance has not been destroyed. I am not sure if I am correct.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.