GPU memory leaks using shareable handles

Orin AGX 64 GB Developer Kit. Jetpack 6.0 w/CUDA 12.2.

I have two processes.

Process A creates blocks of GPU memory (cuMemCreate, cuMemExportToShareableHandle, etc.) using the pattern shown in memMapIPCDrv in cuda-samples. It eventually calls cuMemRelease.
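For reference, the writer side follows roughly this pattern (a minimal sketch based on the memMapIPCDrv sample; error checking and the socket transfer of the fd are omitted, and `allocExportable` is a hypothetical helper name):

```cpp
#include <cuda.h>

// Allocate a physical block and export it as a POSIX file descriptor,
// which is the shareable handle type on Linux/Jetson.
// `size` must be a multiple of the allocation granularity
// (see cuMemGetAllocationGranularity).
CUmemGenericAllocationHandle allocExportable(int dev, size_t size, int *fdOut) {
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, size, &prop, 0);

    // The fd is then sent to Process B, e.g. over a Unix domain socket.
    cuMemExportToShareableHandle(fdOut, handle,
                                 CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);
    return handle;
}
```

Once the importer holds its own reference, the writer calls cuMemRelease on the returned handle.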

Process B imports the shareable handle (cuMemImportFromShareableHandle, cuMemMap, cuMemRelease, cuMemSetAccess, etc.), runs kernels, and eventually closes the handle and calls cuMemUnmap and cuMemAddressFree.
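The reader side looks roughly like this (again a sketch modeled on memMapIPCDrv, error checking omitted; `importUseAndTeardown` is a hypothetical name):

```cpp
#include <cuda.h>
#include <cstdint>
#include <unistd.h>

// Import the fd received from the writer, map it, use it, then tear down.
void importUseAndTeardown(int fd, size_t size, int dev) {
    CUmemGenericAllocationHandle handle;
    cuMemImportFromShareableHandle(&handle, (void *)(uintptr_t)fd,
                                   CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR);
    close(fd);  // the fd is no longer needed once imported

    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, size, 0, 0, 0);
    cuMemMap(ptr, size, 0, handle, 0);

    // The mapping holds its own reference, so the handle can be released now.
    cuMemRelease(handle);

    CUmemAccessDesc access = {};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = dev;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, size, &access, 1);

    // ... launch kernels against ptr ...

    // Teardown: on x86 this returns the memory to the system;
    // on Jetson the memory stays attributed to the reader (the leak
    // described in this thread).
    cuMemUnmap(ptr, size);
    cuMemAddressFree(ptr, size);
}
```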

Both processes are persistent. On Jetson, process B grows until we run out of memory.

I’ve demonstrated this behavior by breaking the memMapIPCDrv sample into writer and reader processes and observing memory utilization with jtop. The writer process is run repeatedly but the reader is persistent. Each time the writer runs, it creates a block of memory that the reader imports and runs a simple kernel against. When the writer exits, all the memory reserved (4MB in the example) is assigned to the reader (as seen in jtop) and not released after the reader calls cuMemUnmap and cuMemAddressFree.

Running the same code on x86, GPU memory utilization drops to zero as soon as the reader releases it.

I reported this as a bug and was directed to post here. It was closed as Not a Bug. I'd be grateful for any tips.


Hi,

As we have a newer software release, could you test if the same issue occurs on the latest JetPack 6.2?

If so, could you share a reproducible sample with us?
(The sample split into separate writer and reader processes should be enough?)

We will need to test this internally before sharing more info with you.
Thanks.

I’ll put together a package with my rework of the memMapIpc sample code. I’ll also try 6.2. Thank you.

I’ve attached the code I extracted from the memMapIPCDrv sample. It includes a README_DEMO. These are the results I’m seeing:
The reader is started first.


Then the writer is started, passing in the process id of the reader (for the local socket). The next picture shows jtop when both the writer and reader are running. The reader has opened the shared memory, performed a trivial operation, and called cuMemUnmap() and cuMemAddressFree().

The final picture shows jtop after the writer has exited and the reader is waiting for a new handle. Each run of the writer grows the reader by 4MB, the size of the block created by the writer with cuMemCreate().

memMapIPCDrvLeak.tar.gz (29.9 KB)

FYI: the code needs to be built in a subdirectory under cuda-samples/Samples// to pick up the Common headers.

EDIT: Attached zip should build w/o cuda-samples installed.

memMapIPCDrvLeak_v2.tar.gz (78.2 KB)

Hi,

Thanks a lot for sharing the sample.
We will test this internally and share more info with you.

Thanks.

Hi,

Thanks for your patience.

We also observed the same behavior in our environment and are now checking with our internal team for more input.
We will let you know once we have more information to share.

Thanks.


Just returned from vacation. That is excellent news (for me, at least). Thank you for the update.

Hello,

Are there any updates you can share? I reached out to a few contacts we have as well.

Thank you

Hi,

Thanks for your patience; our internal team is still working on this issue.

We have verified that the CUDA API releases the memory correctly.
So we are now checking with the resource manager team to gather more info.

Thanks.


Any updates on this?


Hi,

Thanks for your patience.

Our internal team needs more time for this issue.
We will keep you updated on the latest status.

Thanks.

FWIW, I’ve been told that the internal engineering team is actively investigating this issue.

Hi,

Yes, our internal team is actively working on this issue.
Will give you an update once we make further progress.

Thanks.

Hi,

An update for you.

Thanks a lot for reporting this issue.
We have found the root cause and fixed it internally.
So the upcoming release will include the fix.

In the meantime, we are working on a pre-release library that fixes this issue on top of the r36.4.4 (JetPack 6.2.1) branch.
We will share further info with you once the library is available.

Thanks.


Hello, I am using the same software described by mfennell albeit on an older Jetpack version that I am unable to upgrade at this time.

I’m wondering if it would be possible to backport this fix for Jetpack 5.1.4 [ L4T 35.6.0 ]?

Thank you,

Jeremy

Hi,

The fix is currently only compatible with r36.
But we will evaluate whether it can be backported to r35.

Thanks.


Hi,

Please find below the fix info.

JetPack 6/r36.4.4

Please find the new driver in the link below:

JetPack 5/r35.6.2

Please find below the attachment for the fix.
We have also updated the info on this fix.
cuda_driver_35.6.2.tbz2 (32.1 MB)

Thanks.


Thank you for providing the patch for 35.6.2. Will the official release be updated? 35.6.2 was released in May and does not include the libcuda.so attached here. I'm not sure whether I should manage the library myself when building new systems.

Hi,

The patch is built for r35.6.2. You can apply it to r35.6.2 directly.
Thanks.