Resource Leak on cuD3D9CtxCreate() and cuCtxDestroy()

I have a problem with a resource leak related to cuD3D9CtxCreate() and cuCtxDestroy().
It seems to me that cuD3D9CtxCreate() increments the reference counts of both IDirect3DDevice9 and IDirect3D9, but cuCtxDestroy() decrements only the count of IDirect3DDevice9. Please let me know whether I need to decrement the IDirect3D9 reference count manually, or whether this is a bug in the API.
I tried the following sample code and confirmed that pD3D9->Release() always returns "1".
Environment: Windows Vista / Core 2 Quad Q6600 / 4 GB RAM / GeForce 8600 GT / CUDA Toolkit and SDK version 2.1 / driver version 181.20

– Sample Code –

// Create IDirect3D9 and IDirect3DDevice9
pD3D9 = Direct3DCreate9(D3D_SDK_VERSION);
hr = pD3D9->CreateDevice(/* adapter, device type, window, flags, present parameters, */ &pD3DD9);

// Create the CUDA context, then destroy it again
result = cuD3D9CtxCreate(&cuContext, &cuDevice, 0, pD3DD9);
if (result == CUDA_SUCCESS){
    result = cuCtxPopCurrent(NULL);
    if (result == CUDA_SUCCESS){
        if (cuContext){
            result = cuCtxPushCurrent(cuContext);
        }
        if (cuContext){
            result = cuCtxDestroy(cuContext);  // should decrement the reference count of pD3D9 as well as pD3DD9
        }
    }
}

// Check reference counts
if (pD3DD9){
    ULONG refCount = pD3DD9->Release();
    if (refCount){
        TCHAR out[256];
        _stprintf_s(out, 256, _T("refCount=%lu, refCount should be 0\n"), refCount);
    }
}
if (pD3D9){
    ULONG refCount = pD3D9->Release(); // this always returns "1"
    if (refCount){
        TCHAR out[256];
        _stprintf_s(out, 256, _T("refCount=%lu, refCount should be 0\n"), refCount);
    }
}
return S_OK;
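
A reference count can also be inspected without giving up your own reference: call AddRef() and then Release(), and use the value Release() returns. The sketch below illustrates this with a stand-in class so that it builds without the DirectX SDK. FakeUnknown and PeekRefCount are hypothetical names, and FakeUnknown models only the AddRef()/Release() contract of IUnknown. Keep in mind that COM documents these return values as intended for testing only.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object. It mimics only the
// AddRef()/Release() contract of IUnknown so the sketch builds anywhere.
class FakeUnknown {
public:
    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // last reference gone, object destroys itself
        }
        return remaining;
    }

private:
    ~FakeUnknown() = default;  // destroyed only through Release()
    unsigned long refs_ = 1;   // the creator holds the first reference
};

// Returns the current reference count and leaves it unchanged.
// Works for any type exposing AddRef()/Release() with COM semantics.
template <typename T>
unsigned long PeekRefCount(T* obj) {
    assert(obj != nullptr);
    obj->AddRef();          // count + 1
    return obj->Release();  // back to the original count, which is returned
}
```

Under the same assumption, calling PeekRefCount(pD3D9) before cuD3D9CtxCreate() and again after cuCtxDestroy() would show whether a reference is left behind, without releasing the object just to read the count.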

Yes, it leaks. Thanks for the NVIDIA support in such critical application use cases :)

After a day's digging I found that the leak was in fact caused by multiple calls to
Direct3DCreate9Ex(D3D_SDK_VERSION, &pD3D9Ex) within the same process.

0x00007ffe3e4fd694: ntdll!NtCreateMutant+0x0000000000000014
0x00007ffe3b86b343: KERNELBASE!CreateMutexExW+0x0000000000000073
0x00007ffe3b86b277: KERNELBASE!CreateMutexExA+0x0000000000000037
0x00007ffe30961011: <Unloaded_igdgmm64.dll>+0x0000000000001011
0x00007ffe30a86beb: <Unloaded_igdgmm64.dll>+0x0000000000126beb
0x00007ffe30a82e5a: <Unloaded_igdgmm64.dll>+0x0000000000122e5a
0x00007ffe30a82fd0: <Unloaded_igdgmm64.dll>+0x0000000000122fd0
0x00007ffe3e4850a1: ntdll!RtlActivateActivationContextUnsafeFast+0x0000000000000121
0x00007ffe3e4c9405: ntdll!LdrGetProcedureAddressEx+0x00000000000002b5
0x00007ffe3e4c91f8: ntdll!LdrGetProcedureAddressEx+0x00000000000000a8
0x00007ffe3e48aa97: ntdll!RtlIsCriticalSectionLockedByThread+0x0000000000000547
0x00007ffe3e482591: ntdll!RtlMultiByteToUnicodeSize+0x0000000000000461
0x00007ffe3e4822a8: ntdll!RtlMultiByteToUnicodeSize+0x0000000000000178
0x00007ffe3e481764: ntdll!LdrLoadDll+0x00000000000000e4

As the stack trace shows, this is in fact an Intel-related leak (the mutex is created from igdgmm64.dll), even though an NVIDIA card is being used in the call (an Intel iGPU is also available in the system).
On an AMD processor, however, multiple calls result in a crash, but that's another story.

Conclusion: the documentation for Direct3DCreate9Ex suggests that your program should call it only once at the start and release the object only once at the end. Any other Direct3DCreate9Ex calls and releases made in between seem to be optimized and do not leak the mutex handle in the Intel driver.
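
One way to follow that advice is to keep a single holder object alive for the whole process, so the creation call runs once and the matching Release() runs once at shutdown. The following is only a minimal sketch of that pattern, not a tested fix for the driver leak. CreateOnce and FakeD3D9Ex are hypothetical names, and FakeD3D9Ex is a stand-in that models nothing but Release(), so the example builds without the DirectX SDK.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for IDirect3D9Ex; only Release() is modelled.
struct FakeD3D9Ex {
    static int live;  // number of objects currently alive
    FakeD3D9Ex() { ++live; }
    unsigned long Release() {
        --live;
        delete this;  // this fake always has exactly one reference
        return 0;
    }
};
int FakeD3D9Ex::live = 0;

// Runs the factory at most once and releases the result exactly once.
template <typename T>
class CreateOnce {
public:
    explicit CreateOnce(std::function<T*()> factory)
        : factory_(std::move(factory)) {}

    ~CreateOnce() {
        if (obj_) {
            obj_->Release();  // the single release at the end
        }
    }

    CreateOnce(const CreateOnce&) = delete;
    CreateOnce& operator=(const CreateOnce&) = delete;

    // Returns the cached object, creating it on first use.
    // If the factory returns nullptr, the next call tries again.
    T* get() {
        if (!obj_) {
            obj_ = factory_();
        }
        return obj_;
    }

private:
    std::function<T*()> factory_;
    T* obj_ = nullptr;
};
```

With the real API, the factory would presumably wrap the Direct3DCreate9Ex(D3D_SDK_VERSION, &pD3D9Ex) call and return the resulting pointer, or nullptr on failure, and the holder would live for as long as the application does.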

See the following to reproduce the leak:

while (true)
{
    // each create/release pair leaks one mutex handle in the Intel driver
    hRes = Direct3DCreate9Ex(D3D_SDK_VERSION, &pD3D9Ex);
    int ret = pD3D9Ex->Release();
}

This information was obtained using the March 2020 drivers from all graphics card vendors.