Hello,
I am trying to copy data from an OpenCV Mat into an NvBufSurface on the Jetson using the CUDA-EGL interop API. I'm running into a weird issue: although inference works and the results are correct, a memory leak occurs, but only when there are detections (I'm doing face detection with CenterFace). When there are no detections, memory usage stops increasing. Moreover, when I exit the application, I get the following message:
nvbuf_utils: dmabuf_fd 1490 mapped entry NOT found
Here’s a snippet of the code I’m using:
ASSERT(NvBufSurfaceCreate(&surface, batch_size, params) == 0, "Failed to create surface");

void* data_ptr = NULL;
CUgraphicsResource cuda_resource;
CUeglFrame egl_frame;

if (surface->memType == NVBUF_MEM_SURFACE_ARRAY) {
    // Jetson: surface-array memory is not directly addressable by CUDA, so
    // map the buffer to an EGLImage, register that with CUDA, and take the
    // plane-0 pointer from the mapped EGL frame.
    ASSERT(NvBufSurfaceMapEglImage(surface, idx) == 0,
           "Could not map EglImage from NvBufSurface");
    ASSERT(cuGraphicsEGLRegisterImage(&cuda_resource,
                                      surface->surfaceList[idx].mappedAddr.eglImage,
                                      CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) == CUDA_SUCCESS,
           "Failed to register EGLImage in cuda");
    ASSERT(cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0) == CUDA_SUCCESS,
           "Failed to get mapped EGL frame");
    data_ptr = (char*)egl_frame.frame.pPitch[0];
} else {
    // dGPU: the surface memory can be used as a plain device pointer.
    data_ptr = surface->surfaceList[idx].dataPtr;
}

// Pitched host-to-device copy; the copy width is the Mat's row stride in bytes.
CHECK_CUDA_STATUS(cudaMemcpy2D(data_ptr, surface->surfaceList[idx].pitch, mat->ptr(), mat->step,
                               mat->step, mat->rows, cudaMemcpyHostToDevice),
                  "Could not copy mat to surface");

if (surface->memType == NVBUF_MEM_SURFACE_ARRAY) {
    // Tear the EGL mapping back down after the copy.
    cuGraphicsUnregisterResource(cuda_resource);
    NvBufSurfaceUnMapEglImage(surface, idx);
}
surface->numFilled++;
I also tried the following, which goes through NvBufSurfaceMap instead of the CUDA-EGL interop:

ASSERT(NvBufSurfaceMemSet(surface, idx, 0, 0) == 0, "Failed to memset NvBufSurface");

// For mem types NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE
ASSERT(NvBufSurfaceMap(surface, idx, 0, NVBUF_MAP_WRITE) == 0,
       "Failed to map NvBufSurface for writing");
ASSERT(NvBufSurfaceSyncForCpu(surface, -1, 0) == 0,
       "Could not sync NvBufSurface for CPU");

// Plain CPU memcpy into plane 0 of the mapped surface.
memcpy(surface->surfaceList[idx].mappedAddr.addr[0], mat->ptr(), mat->step * mat->rows);

ASSERT(NvBufSurfaceSyncForDevice(surface, -1, 0) == 0,
       "Could not sync NvBufSurface for device");
ASSERT(NvBufSurfaceUnMap(surface, idx, 0) == 0, "Failed to unmap NvBufSurface");
surface->numFilled++;
This gave the same problem: things worked, but there's still a memory leak, and on exit I still get nvbuf_utils: dmabuf_fd 1471 mapped entry NOT found.
One thing to note: I had to pass -1 for the index when calling NvBufSurfaceSyncForCpu and NvBufSurfaceSyncForDevice. Passing idx (which is 0) gave a nvbufsurface: Wrong buffer index (0) error.
Sorry, we are still trying to reproduce this internally.
Since Bazel is not natively available on the Jetson, would it be possible to update the source to use other build tools (e.g., make or CMake) instead?
Hey AastaLLL,
Sure, here you go.
I've noticed the issue comes up after a few seconds, and happens when the detected face is obstructed often. Memory usage remains stable, then suddenly starts to rise.
Sorry, that was an internal comment. Please ignore that reply.
We have checked your source and found an implementation detail that may cause the leak.
First, the leak comes from the mattosurf component rather than DeepStream.
It seems that you are calling NvBufSurfaceCreate() in gst_mattosurf_prepare_output_buffer every frame, but the buffer is never destroyed, which might be the cause of the leak.
Would you mind saving the NvBufSurface variable from the previous frame and calling NvBufSurfaceDestroy() on it before creating a new one?
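For example, something like this rough sketch (prev_surface here is a hypothetical pointer you would keep across calls, e.g. as a member of the element):

// Sketch: destroy the previous frame's surface before allocating a new one.
// `prev_surface` is a hypothetical member/static that persists across frames.
if (prev_surface != NULL) {
    NvBufSurfaceDestroy(prev_surface);
    prev_surface = NULL;
}
ASSERT(NvBufSurfaceCreate(&surface, batch_size, params) == 0, "Failed to create surface");
prev_surface = surface;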
Hey AastaLLL,
Thanks for the reply. NvBufSurfaceDestroy gets called in the GDestroyNotify parameter of gst_buffer_new_wrapped_full (the line is here). I can confirm it's being called: I put a print statement before the destroy call.
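For reference, the wiring looks roughly like this (simplified from the actual code; free_nvsurface is my destroy callback, and the flags/sizes shown are illustrative):

// Destroy callback handed to GStreamer; it runs when the buffer is freed.
static void free_nvsurface(gpointer data) {
    NvBufSurface* surface = (NvBufSurface*)data;
    g_print("destroying surface\n");  // this does print for every buffer
    NvBufSurfaceDestroy(surface);
}

// Wrap the surface so the output buffer owns it and frees it via the callback.
GstBuffer* out_buf = gst_buffer_new_wrapped_full(
    (GstMemoryFlags)0,
    surface, sizeof(NvBufSurface),  // data and maxsize
    0, sizeof(NvBufSurface),        // offset and size
    surface, free_nvsurface);       // user_data and GDestroyNotify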
Is it bad to call NvBufSurfaceCreate every frame? Shouldn't a new surface be created for every frame?
Hey AastaLLL,
I re-read your comment and realized I misunderstood a couple of things. I was letting GStreamer handle the destruction of the surface, but I'll try doing what you said.
But why does this only occur on Jetson? Everything runs fine on dGPU (RTX 2060, driver version 510.54, DeepStream 6.1); there's no leak. I feel like there's something wrong with the CUDA-EGL part of the mattosurf element. I don't know much about the CUDA EGL APIs; what I know comes from a couple of comments in gstdsexample_optimized.cpp and from looking up the documentation of the functions mentioned in those comments.
Is it right to keep EGL-mapping, registering, using, unregistering, and unmapping the surface every frame?
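In other words, would something like this be the intended usage instead? Just a sketch of the alternative I mean, assuming the surface and the registered CUDA resource stay alive for the element's lifetime:

// Setup, once per surface:
ASSERT(NvBufSurfaceMapEglImage(surface, idx) == 0, "Could not map EglImage");
ASSERT(cuGraphicsEGLRegisterImage(&cuda_resource,
                                  surface->surfaceList[idx].mappedAddr.eglImage,
                                  CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) == CUDA_SUCCESS,
       "Failed to register EGLImage in cuda");
ASSERT(cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0) == CUDA_SUCCESS,
       "Failed to get mapped EGL frame");

// Per frame: only the copy.
CHECK_CUDA_STATUS(cudaMemcpy2D(egl_frame.frame.pPitch[0], surface->surfaceList[idx].pitch,
                               mat->ptr(), mat->step, mat->step, mat->rows,
                               cudaMemcpyHostToDevice),
                  "Could not copy mat to surface");

// Teardown, once, right before the surface is destroyed:
cuGraphicsUnregisterResource(cuda_resource);
NvBufSurfaceUnMapEglImage(surface, idx);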
Hey AastaLLL,
Yes, the same source was used on dGPU; the only difference is that no CUDA EGL APIs were used there, just regular CUDA.
As you can see in the video I sent, I was using the MaxN power mode when I found the issue.