CUDA_CHECK(cudaMallocHost((void**)&argmax_buffer_cpu, BatchSize * OutputChannel * sizeof(float)));
The above code causes a memory leak to appear in our Docker container. We call it only once throughout the program, and the leak appears only when this code is enabled. The screenshot compares a run with the code enabled against a run with it removed.

NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4
If you do a cudaMallocHost without a corresponding cudaFreeHost in repetitive code (a loop, for example), you will certainly create a memory leak. For example, if you replaced a new operation (which might automatically be freed based on C++ scoping rules) with a cudaMallocHost operation as I described here (without a free operation, in a loop), you could certainly create a memory leak.
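To make that concrete, here is a minimal sketch of the two patterns (leaky versus balanced); the function names, num_batches, and batch_bytes are placeholders, not code from the thread:

#include <cuda_runtime.h>

// Leaking pattern: a pinned allocation inside a loop with no matching free.
void leaky(int num_batches, size_t batch_bytes) {
    for (int i = 0; i < num_batches; ++i) {
        float* buf = nullptr;
        cudaMallocHost((void**)&buf, batch_bytes); // pins a fresh host buffer
        // ... fill buf / launch work ...
        // missing cudaFreeHost(buf): each iteration's pinned memory is lost
    }
}

// Balanced pattern: every cudaMallocHost is paired with a cudaFreeHost.
void balanced(int num_batches, size_t batch_bytes) {
    for (int i = 0; i < num_batches; ++i) {
        float* buf = nullptr;
        cudaMallocHost((void**)&buf, batch_bytes);
        // ... fill buf / launch work ...
        cudaFreeHost(buf); // releases the pinned allocation
    }
}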
Hi @Robert_Crovella, thanks for replying.
Yes, we are doing it only once in the module run.
Let me walk you through our process.
- We allocated an area in memory to hold the model outputs from enqueue. This pointer is static:
static float* argmax_buffer_cpu = nullptr;
- Then we ran the batched inference using enqueue.
- We keep overwriting that area with new images and their model outputs (a simplified sketch of this pattern follows this list).
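Roughly, the init-once / overwrite-per-batch pattern looks like the sketch below. RunBatch and the reduced parameter lists are illustrative names only; the real InitializeGPUMemory takes the full argument list shown further below.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Error-check macro, as in the snippet at the top of this post.
#define CUDA_CHECK(call)                                         \
    do {                                                         \
        cudaError_t err__ = (call);                              \
        if (err__ != cudaSuccess) {                              \
            fprintf(stderr, "CUDA error: %s\n",                  \
                    cudaGetErrorString(err__));                  \
            abort();                                             \
        }                                                        \
    } while (0)

// Allocated once, then reused for every batch.
static float* argmax_buffer_cpu = nullptr;

// Called a single time at startup (simplified; the real function
// takes many more parameters).
void InitializeGPUMemory(int batch_size, int output_channel) {
    CUDA_CHECK(cudaMallocHost((void**)&argmax_buffer_cpu,
                              batch_size * output_channel * sizeof(float)));
}

// Called per batch: the same pinned buffer is overwritten every time.
void RunBatch(const void* output_gpu, size_t out_bytes, cudaStream_t stream) {
    // ... context->enqueue(...) runs inference into output_gpu ...
    CUDA_CHECK(cudaMemcpyAsync(argmax_buffer_cpu, output_gpu, out_bytes,
                               cudaMemcpyDeviceToHost, stream));
    CUDA_CHECK(cudaStreamSynchronize(stream));
}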
The whole process lives in a C++ codebase, which is built into a shared object file. Image intake happens in the Python codebase, where we decode the message and pass the bytes to the C++ functions using ctypes, as shown below:
import ctypes
from ctypes import cdll
lib = cdll.LoadLibrary("/app/cpp_trt_processing.so")
# init func
lib.InitializeGPUMemory.argtypes = [
ctypes.c_int, # batch size
ctypes.c_int, # InputW
ctypes.c_int, # InputH
ctypes.c_int, # InputChannel
ctypes.c_int, # OutputChannel
ctypes.c_int, # Number of color models
ctypes.POINTER(ctypes.c_int), # Feature sizes list for models
ctypes.POINTER(ctypes.c_float), # Mean values list for kernels
ctypes.POINTER(ctypes.c_float), # Scale values list for kernels
ctypes.c_int, # DebugLevel
ctypes.c_bool # save_flag
]
lib.InitializeGPUMemory.restype = None
The above InitializeGPUMemory runs only once in the program logic.
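For reference, the extern "C" signature on the C++ side that those argtypes bind to would look roughly like this; a sketch, with parameter names inferred from the comments above:

// Sketch of the C ABI that the ctypes declaration binds to. extern "C"
// suppresses C++ name mangling so ctypes can locate the symbol in the .so.
extern "C" void InitializeGPUMemory(
    int batch_size,
    int input_w,
    int input_h,
    int input_channel,
    int output_channel,
    int num_color_models,
    int* feature_sizes,   // feature sizes list for models
    float* mean_values,   // mean values list for kernels
    float* scale_values,  // scale values list for kernels
    int debug_level,
    bool save_flag);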
Now, a few tests which we did:
- Running an empty function in the C++ codebase: no leak seen.
- Running an empty function that takes arguments in C++: no leak seen.
- Just running cudaMallocHost on our static float*: the leak is observed in the graph.
EDIT:
- We also tried cudaMallocHost paired with cudaFreeHost, although that would defeat our logic of processing images. Unfortunately, the leak is still seen.
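Note that a matching release does not have to happen per image: a single cudaFreeHost at shutdown balances the single allocation. A sketch, where ReleaseGPUMemory is a hypothetical teardown hook, not a function from our codebase:

// Hypothetical one-time teardown that balances the one-time allocation,
// without freeing and reallocating per image.
extern "C" void ReleaseGPUMemory() {
    if (argmax_buffer_cpu != nullptr) {
        cudaFreeHost(argmax_buffer_cpu); // unpins and frees the host buffer
        argmax_buffer_cpu = nullptr;
    }
}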
My guess would be that your module is being called once per inference, or once per image, and so you are getting repeated calls to cudaMallocHost. Why not put a printf statement right after the call to cudaMallocHost to see if it is being called more than once? If it is, then that is your coding defect. If it isn't, I'm at a loss to explain how the simple presence of a single cudaMallocHost call could lead to an ongoing memory leak. In that case it would probably be best to create the shortest possible example that shows the leak. Once you have done that, advance to the latest CUDA version to see if it still exists. If it still exists, post your example here or file a bug.
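A shortest-possible repro along those lines might look like this sketch: one logged cudaMallocHost, then an idle loop so the container's memory graph can be watched.

#include <cuda_runtime.h>
#include <cstdio>
#include <unistd.h>

// Minimal repro sketch: a single pinned allocation, logged once, then idle.
// If memory keeps growing while this sleeps, the growth cannot be coming
// from repeated cudaMallocHost calls.
int main() {
    float* buf = nullptr;
    cudaError_t err = cudaMallocHost((void**)&buf, 1 << 20); // 1 MiB
    printf("cudaMallocHost called once: %s\n", cudaGetErrorString(err));
    fflush(stdout);
    for (;;) {
        sleep(10); // watch the container's memory usage during this idle loop
    }
}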