Free TensorRT GPU memory using Python API


I am using TensorRT 7.2 and I need to free the GPU memory used by a TensorRT engine in order to load another engine.
I read that the current API does not support the destroy method, so the only way to explicitly unload the engine is to call the __del__() method. I am calling this method on the IExecutionContext and ICudaEngine objects, but I am not sure this completely frees the memory: when I load and unload the models multiple times, GPU memory usage increases by a few MB each time, so maybe there is some kind of memory leak. I am measuring this with cupy: free_bytes, total_bytes = cp.cuda.Device(0).mem_info.
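
One way to tell a steady leak from one-time overhead is to repeat the load/unload cycle and record how much free memory is still missing after each cycle. The sketch below is only an illustration: measure_cycles and FakeGpu are hypothetical names, and the simulated device stands in for a real GPU so the snippet runs anywhere.

```python
def measure_cycles(load, unload, get_free_bytes, cycles=5):
    """Return, per cycle, how many bytes are still missing compared to the start."""
    baseline = get_free_bytes()
    missing = []
    for _ in range(cycles):
        handle = load()
        unload(handle)
        missing.append(baseline - get_free_bytes())
    return missing


class FakeGpu:
    """Hypothetical simulated device that loses 2 bytes on every load/unload cycle."""

    def __init__(self, total):
        self.free = total

    def load(self):
        self.free -= 100
        return "engine"

    def unload(self, handle):
        self.free += 98  # 2 bytes are never returned, mimicking a leak


gpu = FakeGpu(total=1000)
missing = measure_cycles(gpu.load, gpu.unload, lambda: gpu.free, cycles=3)
print(missing)  # [2, 4, 6]
```

On a real setup, get_free_bytes could be lambda: cp.cuda.Device(0).mem_info[0], with load and unload wrapping the engine code. A value that keeps growing from cycle to cycle points to a leak, while a value that is non-zero but stays flat may just be one-time initialization.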

Here’s how I allocate my model:

import pycuda.driver as cuda
import cupy as cp
import tensorrt as trt

        TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
        self.trt_logger = TRT_LOGGER
        self.trt_runtime = trt.Runtime(TRT_LOGGER)
        self.device = cp.cuda.Device(0)
        self.stream = cp.cuda.Stream()  # attribute name assumed; it was lost in the original line
        trt.init_libnvinfer_plugins(TRT_LOGGER, "")
        self.trt_engine = self._load_engine(engine_path=self.engine_path)
        self.context = self.trt_engine.create_execution_context()

Buffers are allocated using cupy and I verified they are not the cause of any memory leak. Is there any other variable I should free other than trt_engine and context?

Right now I do



Please check the link below, as it might answer your concerns.


Hi @mfoglio,

In the Python API, __del__() is equivalent to destroy() in the C++ API, so this should not be a problem.
If you still face an issue, please share a repro script and model.
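
For reference, CPython runs __del__() on its own as soon as the last reference to an object is dropped, so any leftover reference (for example a context still stored on another object) keeps the object alive. The sketch below illustrates this with plain Python only: FakeResource and Model are hypothetical stand-ins rather than TensorRT classes, the attribute names are taken from the snippet in the question, and the release order (context, then engine, then runtime) is an assumption based on the reverse of the creation order.

```python
import gc


class FakeResource:
    """Hypothetical stand-in for a TensorRT object; records when it is finalized."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def __del__(self):
        # A real binding would release its native resources here.
        self._log.append(self.name)


class Model:
    """Holds the same attribute names as the snippet in the question."""

    def __init__(self, log):
        self.trt_runtime = FakeResource("runtime", log)
        self.trt_engine = FakeResource("engine", log)
        self.context = FakeResource("context", log)

    def unload(self):
        # Drop the references in reverse order of creation.
        for attr in ("context", "trt_engine", "trt_runtime"):
            setattr(self, attr, None)
        gc.collect()  # also collects objects kept alive only by reference cycles


released = []
model = Model(released)
model.unload()
print(released)  # ['context', 'engine', 'runtime']
```

If the finalizers do not fire in a test like this, some other variable is still holding a reference to the object.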

Thank you.