CUDA memory leak since Video Codec SDK 9.1 Windows drivers

After installing the 436.15 / 436.48 Windows drivers (both tested) on Maxwell workstations (GTX 970, GTX 950), “cuvidDestroyDecoder” started leaking. With previous driver releases, no such leak occurs. The same code does not leak on either Pascal- or Turing-based workstations.

The leak is approximately the size of one surface and comes out of the shared memory pool. It appears to be deterministic, occurring 100% of the time. The leaked memory is released on application termination.

If triggered repeatedly, bluescreens occur with a reproducible TDR watchdog violation during memory allocation. (Even though that appears to be perfectly normal behavior for this driver when exhausting the shared memory pool…)

NVDEC is instantiated with the following parameters:

CUVIDDECODECREATEINFO = {
	ulWidth = 3840
	ulHeight = 2160
	ulNumDecodeSurfaces = 8
	CodecType = cudaVideoCodec_H264 (4)
	ChromaFormat = cudaVideoChromaFormat_420 (1)
	ulCreationFlags = 4
	bitDepthMinus8 = 0
	ulIntraDecodeOnly = 0
	ulMaxWidth = 0
	ulMaxHeight = 0
	Reserved1 = 0
	display_area = {left=0 top=0 right=0 bottom=0}
	OutputFormat = cudaVideoSurfaceFormat_NV12 (0)
	DeinterlaceMode = cudaVideoDeinterlaceMode_Weave (0)
	ulTargetWidth = 3840
	ulTargetHeight = 2160
	ulNumOutputSurfaces = 1
	vidLock = 0x0000019663072ba0
	target_rect = {left=0 top=0 right=0 bottom=0}
	Reserved2 = {0, 0, 0, 0, 0}
}
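
For reference, the decoder is created and torn down with a plain cuvidCreateDecoder / cuvidDestroyDecoder pair. A minimal sketch of that lifecycle with the parameters above (not our actual code; it assumes an existing CUDA context and a CUvideoctxlock “lock” created via cuvidCtxLockCreate, error handling omitted):

	CUVIDDECODECREATEINFO info = {};
	info.ulWidth = 3840;
	info.ulHeight = 2160;
	info.ulNumDecodeSurfaces = 8;
	info.CodecType = cudaVideoCodec_H264;
	info.ChromaFormat = cudaVideoChromaFormat_420;
	info.ulCreationFlags = cudaVideoCreate_PreferCUVID; // == 4, as dumped above
	info.OutputFormat = cudaVideoSurfaceFormat_NV12;
	info.DeinterlaceMode = cudaVideoDeinterlaceMode_Weave;
	info.ulTargetWidth = 3840;
	info.ulTargetHeight = 2160;
	info.ulNumOutputSurfaces = 1;
	info.vidLock = lock; // CUvideoctxlock from cuvidCtxLockCreate (assumed)

	CUvideodecoder decoder = nullptr;
	cuvidCreateDecoder(&decoder, &info);
	// ... decode a few frames ...
	cuvidDestroyDecoder(decoder); // the leak is observed after this call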

Hi.
Can you provide exact details on how to reproduce this issue? Were you able to reproduce it using the sample application from the SDK?

Thanks.

Unclear what exactly is going on. We are using NVDEC with the old multithreaded pattern, where frames are throttled on parser input rather than by backpressure in pfnDisplayPicture, and the transfer to host is deferred outside of pfnDisplayPicture. Add to that heavy multithreading, multiple GPUs, and multiple concurrent 4K streams per GPU.
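
Roughly, the display callback looks like the sketch below (not our actual code; “FrameQueue” is a hypothetical thread-safe queue): it only queues the picture index, and cuvidMapVideoFrame plus the device-to-host copy happen later on a worker thread.

	// Display callback registered as pfnDisplayPicture in CUVIDPARSERPARAMS (nvcuvid.h types).
	static int CUDAAPI HandlePictureDisplay(void* userData, CUVIDPARSERDISPINFO* dispInfo)
	{
		auto* queue = static_cast<FrameQueue*>(userData); // hypothetical thread-safe queue
		queue->push(*dispInfo);  // no cuvidMapVideoFrame / copy here; a worker thread does that
		return 1;                // never blocks; throttling happens on the parser input side
	}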

Anyway, it doesn’t look as if the memory is leaking in any part belonging to NVDEC itself. Rather, allocations made with “cuMemHostAlloc(xxx, xxx, CU_MEMHOSTALLOC_DEVICEMAP)” started leaking despite being freed with “cuMemFreeHost(xxx)”, with proper synchronization of all in-flight streams.
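
The allocation pattern itself is unspectacular; roughly the following, with the size being one NV12 surface (sketch only, error handling omitted):

	void* host = nullptr;
	// pinned, device-mapped staging buffer for the device-to-host copy of one surface
	cuMemHostAlloc(&host, 3840 * 2160 * 3 / 2, CU_MEMHOSTALLOC_DEVICEMAP);
	// ... cuMemcpyDtoHAsync into "host", then synchronize the stream ...
	cuMemFreeHost(host); // freed here, yet the shared usage keeps growing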

I’m at a loss as to why this started happening now; it only started with that specific driver release.
I have already double-checked for possible leaks on our end, but found none. I have yet to completely verify that all of the post-processing CUDA kernels are properly safeguarded against overflows for unexpected “CUVIDPROCPARAMS” values.

A potential suspect: “cuMemHostAlloc(xxx, xxx, CU_MEMHOSTALLOC_DEVICEMAP)” / “cuMemFreeHost(xxx)” are being called concurrently from multiple threads on the same CUDA context, and also for multiple device contexts on different GPUs in parallel, while decoding and device-to-host copies are running at the same time. We did encounter at least one bluescreen due to a TDR watchdog violation in a low-level GDI allocation, so it’s quite possible that there is a regression in memory management.

PS: The issue doesn’t appear to be limited to Maxwell; it also occurs on Pascal and Turing GPUs, including Quadro cards. It’s not reliably reproducible on those systems though.

So, cuda-memcheck claims there is no leak, and no buffer overflow either.
Yet there is an obvious leak, and the shared memory is actually released when releasing the primary device context.

Switching “cuMemHostAlloc” / “cuMemFreeHost” for “malloc + cuMemHostRegister” / “cuMemHostUnregister + free” did not make any difference.
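
That is, roughly the following substitution at the same call sites (the register flag chosen to mirror the device-mapped behavior); the leak behaves exactly the same either way:

	const size_t size = 3840 * 2160 * 3 / 2; // same surface size as before
	void* host = malloc(size);
	cuMemHostRegister(host, size, CU_MEMHOSTREGISTER_DEVICEMAP);
	// ... same usage as before ...
	cuMemHostUnregister(host);
	free(host);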

“cuMemHostRegister + cuMemHostUnregister” in isolation, on an idle GPU, works reliably. It appears as if “cuMemHostUnregister” only starts failing (silently) when the GPUs are not idling.

How did you understand that there is a memory leak if cuda-memcheck did not show it?
Please give us a minimal working and leaking copy of your code… Really!

Minimal example: https://gist.github.com/Ext3h/b037506884826f5a50e96e6f82647576

It doesn’t even involve nvcodec. Just calling cuMemHostRegister / cuMemHostUnregister concurrently is enough to trigger the leak with that driver.
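
The shape of the repro, for anyone who doesn’t want to open the gist (this is a sketch along the same lines, not the gist itself; thread count and buffer size are arbitrary):

	#include <cuda.h>
	#include <cstdlib>
	#include <thread>
	#include <vector>

	int main()
	{
		cuInit(0);
		CUdevice dev = 0;
		cuDeviceGet(&dev, 0);
		CUcontext ctx = nullptr;
		cuDevicePrimaryCtxRetain(&ctx, dev);

		std::vector<std::thread> threads;
		for (int t = 0; t < 8; ++t)
		{
			threads.emplace_back([ctx] {
				cuCtxSetCurrent(ctx);
				const size_t size = 8u << 20; // 8 MiB per registration, arbitrary
				void* buf = std::malloc(size);
				for (int i = 0; i < 1000; ++i)
				{
					cuMemHostRegister(buf, size, CU_MEMHOSTREGISTER_DEVICEMAP);
					cuMemHostUnregister(buf); // "Shared Usage" keeps climbing regardless
				}
				std::free(buf);
			});
		}
		for (auto& th : threads)
			th.join();

		cuDevicePrimaryCtxRelease(dev); // only here is the shared memory given back
		return 0;
	}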

The leak can be tracked via the “\GPU Adapter Memory(hwid)\Shared Usage” winperf performance counter.
Likewise, “\GPU Process Memory(pid_hwid)\Shared Usage” correctly attributes the leaked shared memory to the test runner.
At 8 GB of combined usage over all NVIDIA GPUs in the system, the driver API starts bailing out with CUDA_ERROR_OUT_OF_MEMORY. Or it bluescreens right away, when unlucky.
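
For what it’s worth, the same counter can also be polled programmatically. A rough sketch via the PDH API (this assumes the counter reports bytes and simply sums all adapter instances, so filter by instance name if non-NVIDIA adapters are present; link against pdh.lib):

	#include <windows.h>
	#include <pdh.h>
	#include <cstdio>
	#include <vector>
	#pragma comment(lib, "pdh.lib")

	int main()
	{
		PDH_HQUERY query = nullptr;
		PDH_HCOUNTER counter = nullptr;
		PdhOpenQueryW(nullptr, 0, &query);
		// Wildcard over all adapter instances; instance names contain the adapter LUID.
		PdhAddEnglishCounterW(query, L"\\GPU Adapter Memory(*)\\Shared Usage", 0, &counter);

		for (;;)
		{
			PdhCollectQueryData(query);
			DWORD bytes = 0, count = 0;
			PdhGetFormattedCounterArrayW(counter, PDH_FMT_LARGE, &bytes, &count, nullptr);
			std::vector<unsigned char> buffer(bytes);
			auto* items = reinterpret_cast<PDH_FMT_COUNTERVALUE_ITEM_W*>(buffer.data());
			if (PdhGetFormattedCounterArrayW(counter, PDH_FMT_LARGE, &bytes, &count, items) == ERROR_SUCCESS)
			{
				long long total = 0;
				for (DWORD i = 0; i < count; ++i)
					total += items[i].FmtValue.largeValue;
				printf("Shared Usage over all adapters: %lld MiB\n", total >> 20);
			}
			Sleep(1000);
		}
	}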

On a multi-GPU system with Maxwell GPUs and these driver versions, the 8 GB limit is reached within a second.
A Pascal GPU, for reference, stays within the expected 512 MB (plus overhead) peak shared memory usage. (Even though there is a rare heap corruption happening somewhere.)

Is that still leaking? If so, we need to escalate this.

Yes, it’s still leaking as far as I know, at least unless I missed some hotfix last week.

The issue is “in progress” as ticket #2762823, and it turned out to be deterministic. A simple case of “shared memory is registered with all present GPUs, regardless of flags, but only unregistered for exactly one GPU, also regardless of flags”. Guaranteed to be missed by QA when they only test single-GPU systems.