Hello,
I am using TensorRT 7.2 and I need to free the GPU memory used by a TensorRT engine in order to load another engine.
I read that the current API does not support the destroy method, so the only way to explicitly unload the engine is to call the __del__() method. I am calling this method on the IExecutionContext and ICudaEngine objects; however, I am not sure this completely frees the memory. When I load and unload the models multiple times, the GPU memory usage increases by a few MB each time, so maybe there is some kind of memory leak. I am measuring this with cupy: free_bytes, total_bytes = cp.cuda.Device(0).mem_info.
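To make the measurement concrete, here is a minimal sketch of the load/unload loop described above. The function name and the load, unload, and free_bytes callables are hypothetical placeholders, not part of TensorRT or cupy; on a real GPU, free_bytes could be something like lambda: cp.cuda.Device(0).mem_info[0].

```python
import gc


def measure_free_memory_drift(load, unload, free_bytes, cycles=5):
    """Return how many bytes of free memory are missing after each cycle.

    load       -- hypothetical callable that creates the engine/context
    unload     -- hypothetical callable that releases what load() returned
    free_bytes -- callable returning the currently free device memory
    """
    baseline = free_bytes()
    drift = []
    for _ in range(cycles):
        handle = load()
        unload(handle)
        # Drop the last Python reference and force a collection so that
        # lingering Python objects are not mistaken for a leak.
        del handle
        gc.collect()
        drift.append(baseline - free_bytes())
    return drift
```

If the numbers in the returned list keep growing from one cycle to the next, memory is not being fully returned; a flat list would suggest a one-time allocation rather than a per-cycle leak.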
Here’s how I allocate my model:
import cupy as cp
import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# The lines below run inside the class that owns the engine.
self.trt_logger = TRT_LOGGER
self.trt_runtime = trt.Runtime(TRT_LOGGER)
self.device = cp.cuda.Device(0)
self.stream = cp.cuda.Stream()
# Register the TensorRT plugins before deserializing the engine.
trt.init_libnvinfer_plugins(TRT_LOGGER, "")
self.trt_engine = self._load_engine(engine_path=self.engine_path)
self.context = self.trt_engine.create_execution_context()
Buffers are allocated using cupy and I verified they are not the cause of any memory leak. Is there any other variable I should free other than trt_engine and context?
Right now I do:
trt_engine.__del__()
context.__del__()
Thanks