If you do a cudaMallocHost without a corresponding cudaFreeHost in repetitive code (a loop, for example) you will certainly create a memory leak. For example, if you replaced a new operation (which might automatically be freed based on C++ scoping rules) with a cudaMallocHost operation as I described here (without a free operation, in a loop), you could certainly create a memory leak.
We allocated an area in memory to hold the model outputs from the enqueue. The buffer is declared as static int** argmax_buffer_cpu = nullptr;
Then we ran the batched inference using enqueue.
We keep on rewriting the area with new images and their model outputs.
The whole pipeline lives in a C++ codebase, which is then built as a shared object file.
The intake of images happens in the Python codebase, where we decode the message and pass those bytes to the C++ functions using ctypes, as shown below:
import ctypes
from ctypes import cdll

lib = cdll.LoadLibrary("/app/cpp_trt_processing.so")

# init func
lib.InitializeGPUMemory.argtypes = [
    ctypes.c_int,                    # batch size
    ctypes.c_int,                    # InputW
    ctypes.c_int,                    # InputH
    ctypes.c_int,                    # InputChannel
    ctypes.c_int,                    # OutputChannel
    ctypes.c_int,                    # number of color models
    ctypes.POINTER(ctypes.c_int),    # feature sizes list for models
    ctypes.POINTER(ctypes.c_float),  # mean values list for kernels
    ctypes.POINTER(ctypes.c_float),  # scale values list for kernels
    ctypes.c_int,                    # DebugLevel
    ctypes.c_bool,                   # save_flag
]
lib.InitializeGPUMemory.restype = None
The above InitializeGPUMemory runs only once in the program logic.
A few of the tests we ran:
Running an empty function in the C++ codebase: no leak seen.
Running an empty-argument function in C++: no leak seen.
Just running cudaMallocHost on our static float*: the leak is observed in the graph.
EDIT:
We also tried cudaMallocHost paired with cudaFreeHost, although that would defeat our logic of processing images. Unfortunately, the leak is still seen.
My guess would be that your module is being called once per inference, or once per image, and so you are getting repeated calls to cudaMallocHost. Why not put a printf statement right after the call to cudaMallocHost to see if it is being called more than once? If it is, then that is your coding defect. If it isn’t, I’m at a loss to explain how the simple presence of a single cudaMallocHost call could lead to an ongoing memory leak. In that case it would probably be best to create the shortest possible example that shows the leak. Once you have done that, move to the latest CUDA version to see if the leak still exists. If it does, post your example here or file a bug.