How to manage GPU memory and host memory, especially releasing it back to the OS?

Description

I convert a PyTorch model (EfficientNet-B2, about 30 MB) to ONNX, serialize it to an engine file, and reload it with TensorRT 7.0 in C++.
The program uses about 2 GB of host memory and 631 MB of GPU memory. However, after I finish inference and destroy the runtime, context, and engine, the program does not release these resources back to the operating system.
What is the proper way to manage these resources? Should I allocate the host memory from a memory pool?
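For context, the allocation and teardown path looks roughly like the sketch below (simplified, not my exact code; the binding sizes and names are placeholders):

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <cstddef>
#include <vector>

// Simplified sketch: deserialize the engine, run one inference, then release
// everything. In TensorRT 7.x the objects are released with destroy(), not
// delete. inputBytes/outputBytes stand in for the real binding sizes.
void runOnce(const std::vector<char>& engineData, nvinfer1::ILogger& logger,
             std::size_t inputBytes, std::size_t outputBytes)
{
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData.data(), engineData.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    void* bindings[2]{};                  // one input binding, one output binding
    cudaMalloc(&bindings[0], inputBytes);
    cudaMalloc(&bindings[1], outputBytes);

    context->executeV2(bindings);         // synchronous execution

    cudaFree(bindings[0]);                // free device buffers first
    cudaFree(bindings[1]);
    context->destroy();                   // then destroy in reverse creation order
    engine->destroy();
    runtime->destroy();
}
```

Even after these calls return, the resident memory reported for the process stays high, which is what prompted my question.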

I read the TensorRT documentation and found that IGpuAllocator may be a way to manage the GPU memory, but it is not quite clear to me.
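From my reading of the header, using it would look roughly like this (a sketch based on my understanding of the interface, so details may be off; the logging is only illustrative):

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <cstdint>
#include <cstdio>

// My understanding of IGpuAllocator: TensorRT routes its internal device
// allocations through this interface, so it could be backed by a pool, or at
// least used to log what gets allocated and freed.
class TrackingGpuAllocator : public nvinfer1::IGpuAllocator
{
public:
    void* allocate(uint64_t size, uint64_t alignment, uint32_t flags) override
    {
        void* ptr = nullptr;
        cudaMalloc(&ptr, size);          // a real pool could hand out chunks here
        std::printf("allocate %llu bytes (align %llu, flags %u) -> %p\n",
                    static_cast<unsigned long long>(size),
                    static_cast<unsigned long long>(alignment), flags, ptr);
        return ptr;
    }

    void free(void* memory) override
    {
        std::printf("free %p\n", memory);
        cudaFree(memory);
    }
};

// Attach it before deserializing the engine, e.g.:
//   TrackingGpuAllocator gpuAllocator;
//   runtime->setGpuAllocator(&gpuAllocator);
```

If this is the right approach, I could presumably back allocate()/free() with a pre-allocated pool instead of calling cudaMalloc/cudaFree every time.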

I also read buffers.h in the sample code. Can I use its BufferManager to solve the issue above?

Environment

TensorRT Version: 7.0
GPU Type: RTX 2080 Ti
Nvidia Driver Version: 443
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.14
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I think the issue is that not all of the allocated memory is getting cleared/deallocated at the end of the script.

Yes, you can refer to the BufferManager code in the samples as a reference to solve this issue.
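For illustration, typical usage looks roughly like this (based on samplesCommon::BufferManager from buffers.h in the TensorRT samples; the tensor names here are placeholders, not your model's binding names):

```cpp
#include "buffers.h"        // samplesCommon::BufferManager from the TensorRT samples
#include <NvInferRuntime.h>
#include <memory>

// Rough sketch: BufferManager allocates matching host/device buffers for every
// binding of the engine and frees them in its destructor, so the per-inference
// allocations stay scoped to this function.
void infer(std::shared_ptr<nvinfer1::ICudaEngine> engine,
           nvinfer1::IExecutionContext& context)
{
    // batchSize = 0 because ONNX-parsed engines use explicit batch dimensions.
    samplesCommon::BufferManager buffers(engine, 0, &context);

    // "input" / "output" are placeholder tensor names.
    auto* hostInput = static_cast<float*>(buffers.getHostBuffer("input"));
    // ... fill hostInput with the preprocessed image ...

    buffers.copyInputToDevice();                             // host -> device copies
    context.executeV2(buffers.getDeviceBindings().data());   // synchronous inference
    buffers.copyOutputToHost();                              // device -> host copies

    const auto* hostOutput = static_cast<const float*>(buffers.getHostBuffer("output"));
    // ... read results from hostOutput ...
}   // BufferManager's destructor releases the host and device buffers here
```

Because the BufferManager owns the buffers, they are released as soon as it goes out of scope, which keeps the allocations for each inference contained.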

Thanks