GPU memory leaks using shareable handles

Orin AGX 64 GB Developer Kit. Jetpack 6.0 w/CUDA 12.2.

I have two processes.

Process A creates blocks of GPU memory (cuMemCreate, cuMemExportToShareableHandle, etc) using the pattern shown in memMapIPCDrv in cuda-samples. It eventually calls cuMemRelease.
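For reference, a minimal sketch of the exporter-side pattern I'm describing (modeled on memMapIPCDrv; error handling omitted, device ordinal 0 assumed):

```cuda
#include <cuda.h>

// Allocate a physical block and export it as a POSIX fd that can be
// passed to another process over a Unix-domain socket (SCM_RIGHTS).
CUmemGenericAllocationHandle allocExportable(size_t size, int *fdOut) {
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = 0;  // device ordinal (assumption: single GPU)
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    // Round the request up to the allocation granularity.
    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size = ((size + gran - 1) / gran) * gran;

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, size, &prop, 0);
    cuMemExportToShareableHandle((void *)fdOut, handle,
                                 CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);
    return handle;
}

// The writer later calls cuMemRelease(handle); the allocation stays
// alive while the exported fd or any importer's mapping references it.
```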

Process B imports the shareable handle (cuMemImportFromShareableHandle, cuMemMap, cuMemRelease, cuMemSetAccess, etc.), runs kernels, and eventually closes the handle and calls cuMemUnmap and cuMemAddressFree.
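And the corresponding importer-side sketch (again following memMapIPCDrv; error handling omitted, the fd is assumed to have arrived over a Unix-domain socket):

```cuda
#include <cuda.h>
#include <cstdint>

// Import the shared allocation, map it into a reserved VA range,
// and enable read/write access on device 0 (assumption).
CUdeviceptr importAndMap(int fd, size_t size) {
    CUmemGenericAllocationHandle handle;
    cuMemImportFromShareableHandle(&handle, (void *)(uintptr_t)fd,
                                   CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR);

    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, size, 0, 0, 0);
    cuMemMap(ptr, size, 0, handle, 0);

    // The mapping holds its own reference, so the imported handle
    // can be released immediately after mapping.
    cuMemRelease(handle);

    CUmemAccessDesc access = {};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = 0;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, size, &access, 1);
    return ptr;
}

// Teardown after kernels finish. On x86 this returns the physical
// pages; on Jetson (this report) the memory stays attributed to
// the importer in jtop.
void teardown(CUdeviceptr ptr, size_t size) {
    cuMemUnmap(ptr, size);
    cuMemAddressFree(ptr, size);
}
```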

Both processes are persistent. On Jetson, process B's memory footprint grows until we run out of memory.

I’ve demonstrated this behavior by breaking the memMapIPCDrv sample into writer and reader processes and observing memory utilization with jtop. The writer process is run repeatedly while the reader stays persistent. Each time the writer runs, it creates a block of memory that the reader imports and runs a simple kernel against. When the writer exits, all of the memory it reserved (4 MB in the example) shows up in jtop as assigned to the reader, and it is not released after the reader calls cuMemUnmap and cuMemAddressFree.

Running the same code on x86, GPU memory utilization drops to zero as soon as the reader releases the memory.

I reported this as a bug and was directed to post here. It was closed as Not a Bug. Grateful for any tips.


Hi,

As we have a newer software release, could you test if the same issue occurs on the latest JetPack 6.2?

If so, could you share a reproducible sample with us?
(Process B, which has been separated into writer and reader processes, should be enough?)

We will need to test this internally before sharing more info with you.
Thanks.

I’ll put together a package with my rework of the memMapIPCDrv sample code. I’ll also try 6.2. Thank you.

I’ve attached the code I extracted from the memMapIPCDrv sample. It includes a README_DEMO. These are the results I’m seeing:
The reader is started first.


Then the writer is started, passing in the process ID of the reader (for the local socket). The next picture shows jtop while both the writer and reader are running. The reader has opened the shared memory, performed a trivial operation, and called cuMemUnmap() and cuMemAddressFree().

The final picture shows jtop after the writer has exited and the reader is waiting for a new handle. Each run of the writer grows the reader by 4 MB, the size of the block created by the writer with cuMemCreate().

memMapIPCDrvLeak.tar.gz (29.9 KB)

FYI: the code needs to be built in a subdirectory under cuda-samples/Samples// to pick up the Common headers.

EDIT: Attached zip should build w/o cuda-samples installed.

memMapIPCDrvLeak_v2.tar.gz (78.2 KB)

Hi,

Thanks a lot for sharing the sample.
We will test this internally and share more info with you.

Thanks.

Hi,

Thanks for your patience.

We also observed the same behavior in our environment and are now checking with our internal team for more input.
We will let you know once we have more information to share.

Thanks.