Memory leak when copying OpenCV Mat to NvBufSurface on Jetson

Hello,
I am trying to copy data from an OpenCV Mat into an NvBufSurface on Jetson using the CUDA-EGL interop API. I'm running into a strange issue: although inference works and the results are correct, memory leaks only while there are detections (I'm performing face detection with CenterFace). When there are no detections, memory usage stops growing. Moreover, when I exit the application, I get the following message:

nvbuf_utils: dmabuf_fd 1490 mapped entry NOT found

Here’s a snippet of the code I’m using:

// Allocate the destination surface
ASSERT(NvBufSurfaceCreate(&surface, batch_size, params) == 0, "Failed to create surface");

void* data_ptr = NULL;
CUgraphicsResource cuda_resource;
CUeglFrame egl_frame;

if (surface->memType == NVBUF_MEM_SURFACE_ARRAY) {
    // Surface-array memory: map it as an EGLImage and register it with CUDA
    // to obtain a device pointer we can copy into
    ASSERT(NvBufSurfaceMapEglImage(surface, idx) == 0,
            "Could not map EglImage from NvBufSurface");
    ASSERT(cuGraphicsEGLRegisterImage(&cuda_resource,
                                        surface->surfaceList[idx].mappedAddr.eglImage,
                                        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) == CUDA_SUCCESS,
            "Failed to register EGLImage in cuda");
    ASSERT(cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0) == CUDA_SUCCESS,
            "Failed to get mapped EGL frame");
    data_ptr = (char*)egl_frame.frame.pPitch[0];
} else {
    // Other memory types expose a plain device pointer
    data_ptr = surface->surfaceList[idx].dataPtr;
}

// Pitched host-to-device copy of the Mat into the surface
CHECK_CUDA_STATUS(cudaMemcpy2D(data_ptr, surface->surfaceList[idx].pitch, mat->ptr(), mat->step,
                                mat->step, mat->rows, cudaMemcpyHostToDevice),
                    "Could not copy mat to surface");

// Undo the per-frame CUDA registration and EGL mapping
if (surface->memType == NVBUF_MEM_SURFACE_ARRAY) {
    cuGraphicsUnregisterResource(cuda_resource);
    NvBufSurfaceUnMapEglImage(surface, idx);
}

surface->numFilled++;

Setup info:

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.0.1
• JetPack Version (valid for Jetson only): Jetpack 4.6
• Docker image: nvcr.io/nvidia/deepstream-l4t:6.0.1-base (sha256:c9ca4d742c0d142db6db4e84ed0192f94cf4a883e3e4f75c4da40c5704b56a5a)

Hi,

Which Jetson device do you use? Is it Xavier?
Also, are you using JetPack 4.6.1, is that correct?

Thanks.

Hey AastaLLL,
Yes, it's the Jetson AGX Xavier.
I'm using JetPack 4.6. Do I need to check it on JetPack 4.6.1?

Hey AastaLLL,
The same issue occurs on JetPack 4.6.1.

I also tried the following:

// Zero out plane 0 of the surface
ASSERT(NvBufSurfaceMemSet(surface, idx, 0, 0) == 0, "Failed to memset NvBufSurface");

// For mem types NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE:
// map the surface for CPU write access and sync it for the CPU
ASSERT(NvBufSurfaceMap(surface, idx, 0, NVBUF_MAP_WRITE) == 0,
        "Failed to map NvBufSurface for writing");
ASSERT(NvBufSurfaceSyncForCpu(surface, -1, 0) == 0,
        "Could not sync NvBufSurface for CPU");

// Plain CPU copy of the Mat into the mapped plane
memcpy(surface->surfaceList[idx].mappedAddr.addr[0], mat->ptr(), mat->step * mat->rows);

// Flush the CPU writes back to the device and unmap
ASSERT(NvBufSurfaceSyncForDevice(surface, -1, 0) == 0,
        "Could not sync NvBufSurface for device");
ASSERT(NvBufSurfaceUnMap(surface, idx, 0) == 0, "Failed to unmap NvBufSurface");

surface->numFilled++;

This gave the same result: things work, but there is still a memory leak, and I still get the nvbuf_utils: dmabuf 1471 mapped entry NOT found message on exit.

However, I had to pass -1 as the index when calling NvBufSurfaceSyncForCpu and NvBufSurfaceSyncForDevice. Passing idx (which is 0) gave an nvbufsurface: Wrong buffer index (0) error.

Hi,

Do you mind sharing the complete source so that we can reproduce it?
Thanks.

Hey AastaLLL,
I set up a small reproducible project here. You can have a look.

@AastaLLL any updates? Were you able to reproduce the issue on your end?

Hi,

Sorry that we are still trying to reproduce this internally.

Since Bazel is not natively installed on Jetson, do you think it is possible to update the source to use another build tool (e.g. make or CMake) instead?

Thanks.

Hey AastaLLL,
sure, here you go.
I've noticed the issue shows up after a few seconds, and happens when you frequently obstruct the detected face. Memory usage remains stable for a while, then suddenly starts to rise.

@AastaLLL Here’s a video showing the issue: link

Thanks.

We are trying to reproduce this issue internally.
Will share more information with you later.

@AastaLLL Is the link you sent correct? It points to some other website.

Hi,

Sorry, that was an internal comment. Please ignore that reply.

We have checked your source and found an implementation detail that may cause the leakage.
First, the leakage comes from the mattosurf component rather than from DeepStream.

It seems that you call NvBufSurfaceCreate() in gst_mattosurf_prepare_output_buffer every frame,
but the buffer is not destroyed, which might be the cause of the leakage.

Would you mind saving the NvBufSurface variable from the previous frame,
and calling NvBufSurfaceDestroy() on it before creating a new one?
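
For example, something like the rough sketch below (prev_surface is just a placeholder name, not taken from your code):

#include "nvbufsurface.h"

// Rough idea only: keep the surface allocated for the previous frame and
// destroy it before allocating a new one for the current frame.
static NvBufSurface *prev_surface = NULL;

static NvBufSurface *create_surface_for_frame(uint32_t batch_size,
                                               NvBufSurfaceCreateParams *params)
{
    if (prev_surface != NULL) {
        NvBufSurfaceDestroy(prev_surface);   // free the previous frame's buffer
        prev_surface = NULL;
    }

    NvBufSurface *surface = NULL;
    if (NvBufSurfaceCreate(&surface, batch_size, params) != 0)
        return NULL;

    prev_surface = surface;
    return surface;
}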

Thanks.

Hey AastaLLL,
Thanks for the reply.
NvBufSurfaceDestroy gets called in the GDestroyNotify parameter of gst_buffer_new_wrapped_full (the line is here). I can confirm it's being called: when I put a print statement before the destroy call, it prints on every frame.
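
Roughly, it is wired up like this (simplified; the names are approximate, not the exact source from the repo):

// Destroy callback handed to GStreamer as the GDestroyNotify
static void free_surface(gpointer data)
{
    NvBufSurface *surface = (NvBufSurface *)data;
    g_print("destroying surface\n");   // this print does show up for every buffer
    NvBufSurfaceDestroy(surface);
}

// ...inside gst_mattosurf_prepare_output_buffer:
GstBuffer *out_buf = gst_buffer_new_wrapped_full(
    (GstMemoryFlags)0,
    surface,                // wrapped data: the NvBufSurface*
    sizeof(NvBufSurface),   // maxsize
    0,                      // offset
    sizeof(NvBufSurface),   // size
    surface,                // user_data handed to the destroy notify
    free_surface);          // GDestroyNotify -> NvBufSurfaceDestroy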

Is it bad to call NvBufSurfaceCreate every frame? Shouldn't a new surface be created for every frame?

Hey AastaLLL,
I re-read your comment and realized I misunderstood a couple of things. I was letting GStreamer handle the destruction of the surface, but I'll try what you suggested.
But why does this only happen on Jetson? Everything runs fine on dGPU (RTX 2060, driver version 510.54, DeepStream 6.1); there's no leak there. I suspect something is wrong with the CUDA-EGL part of the mattosurf element. I don't know much about the CUDA-EGL APIs; what I know comes from a couple of comments in gstdsexample_optimized.cpp and from reading the documentation of the functions mentioned in those comments.
Is it right to keep mapping, registering, using, unregistering, and unmapping the EGL image of the surface every frame?
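
To make the question concrete, this is the kind of pattern I'm wondering about instead (just a sketch, I don't know if it's valid usage): map and register once when the surface is created, reuse the mapped frame every frame, and only unregister/unmap at teardown.

// Sketch only (not sure this is valid usage): do the EGL map/register once
// after the surface is created, keep egl_frame around, and clean up at teardown.
CUgraphicsResource cuda_resource;
CUeglFrame egl_frame;

// once, at setup:
NvBufSurfaceMapEglImage(surface, idx);
cuGraphicsEGLRegisterImage(&cuda_resource,
                           surface->surfaceList[idx].mappedAddr.eglImage,
                           CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0);

// every frame: reuse egl_frame.frame.pPitch[0] as the cudaMemcpy2D destination

// once, at teardown:
cuGraphicsUnregisterResource(cuda_resource);
NvBufSurfaceUnMapEglImage(surface, idx);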

Hi,

We found that this might not be an issue.
To confirm: you use the same source on dGPU and everything works fine there, is that correct?

If yes, may I know which power mode you use?
If you are not using MaxN, would you mind giving it a try?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

We found the leakage issue improves when the device is boosted.
Please let us know the behavior on your side as well.

Thanks.

Hey AastaLLL,
Yes, the same source was used on dGPU; the only difference is that no CUDA-EGL APIs were used there, just regular CUDA.
As you can see in the video I sent, I was already using MaxN when I found the issue.

Hi,

Thanks for the update.

Do you also apply sudo jetson_clocks in your environment?
If not, please give it a try.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.