EDIT: I have a simpler example now. Please see the first comment for the updated version.
I am working on a hardware-accelerated video decoder to use as part of a Media Foundation pipeline. The decoder works, but we have a few issues that cause the pipeline to fail. We have tracked the first of these down to a memory leak when using the cuGraphicsD3D9RegisterResource function.
We set up the IDirect3DDevice9 and CUDA context, then create a surface with IDirect3DDevice9::CreateOffscreenPlainSurface using the D3DFMT_X8R8G8B8 format, the default pool, and no sharing. We then register this surface with CUDA. Finally we unregister the resource and release the surface. If we do this repeatedly we see a leak. The leak isn’t large, but over multiple sessions it adds up and causes us to run out of memory (we need to run for multiple days without restarting and may be switching video at rates of more than 4 streams every 15 seconds).
The two functions are given below
HRESULT AllocateHardwareSample(HardwareSample& hardwareSample, IDirect3DDevice9* pDevice, CUcontext context)
{
    const UINT width = 1024;
    const UINT height = 1024;
    hardwareSample.surface = nullptr;
    hardwareSample.resource = nullptr;
    hardwareSample.context = context;
    // Create the D3D9 surface in the default pool, without a shared handle.
    const auto hr = pDevice->CreateOffscreenPlainSurface(width, height, D3DFMT_X8R8G8B8, D3DPOOL_DEFAULT, &hardwareSample.surface, nullptr);
    if (FAILED(hr)) {
        ReleaseHardwareSample(hardwareSample);
        return hr;
    }
    if (!ck(cuCtxPushCurrent(context))) {
        ReleaseHardwareSample(hardwareSample);
        return E_FAIL;
    }
    // Register the surface with CUDA.
    const bool registered = ck(cuGraphicsD3D9RegisterResource(&hardwareSample.resource, hardwareSample.surface, CU_GRAPHICS_REGISTER_FLAGS_NONE));
    // Pop the context on every path so the context stack stays balanced.
    ck(cuCtxPopCurrent(nullptr));
    if (!registered) {
        ReleaseHardwareSample(hardwareSample);
        return E_FAIL;
    }
    return S_OK;
}
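and
HRESULT ReleaseHardwareSample(HardwareSample& hardwareSample)
{
    // Unregister the CUDA resource before releasing the surface it refers to.
    if (hardwareSample.resource != nullptr) {
        ck(cuCtxPushCurrent(hardwareSample.context));
        ck(cuGraphicsUnregisterResource(hardwareSample.resource));
        hardwareSample.resource = nullptr;
        ck(cuCtxPopCurrent(nullptr));
    }
    if (hardwareSample.surface != nullptr) {
        // Release() returns the remaining reference count; 0 means the surface is destroyed.
        const auto count = hardwareSample.surface->Release();
        out << "Surface " << count << std::endl;
        hardwareSample.surface = nullptr;
    }
    return S_OK;
}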
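Both functions pair every cuCtxPushCurrent with a cuCtxPopCurrent by hand. A scope guard is one way to keep that pairing balanced on every return path, including early error returns. The following is only a minimal sketch of the pattern, not our actual code: fakePushCurrent, fakePopCurrent, ScopedContext and allocateWithGuard are hypothetical names, and the context stack is simulated with a vector so the example compiles without cuda.h.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the CUDA context stack (assumption for illustration only).
// In real code the two helpers would wrap cuCtxPushCurrent / cuCtxPopCurrent.
static std::vector<int> g_contextStack;

bool fakePushCurrent(int ctx) {
    g_contextStack.push_back(ctx);
    return true;
}

bool fakePopCurrent() {
    if (g_contextStack.empty()) return false;
    g_contextStack.pop_back();
    return true;
}

// Pushes a context on construction and pops it on destruction,
// so every return path leaves the stack balanced.
class ScopedContext {
public:
    explicit ScopedContext(int ctx) : pushed_(fakePushCurrent(ctx)) {}
    ~ScopedContext() { if (pushed_) fakePopCurrent(); }
    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;
    bool ok() const { return pushed_; }
private:
    bool pushed_;
};

// Hypothetical allocation routine; registerSucceeds simulates the result
// of the resource registration call.
bool allocateWithGuard(int ctx, bool registerSucceeds) {
    ScopedContext scope(ctx);
    if (!scope.ok()) return false;
    assert(!g_contextStack.empty());      // the push above succeeded
    if (!registerSucceeds) return false;  // early return: the guard still pops
    return true;
}
```

After either allocateWithGuard(1, false) or allocateWithGuard(1, true) returns, the simulated stack is empty again, which is the property the hand-written push/pop pairs are meant to guarantee.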
If the line that registers the resource (the cuGraphicsD3D9RegisterResource call) is commented out, the code doesn’t leak. We have also checked that the surface has a reference count of 0 when we release it, so it will be destroyed and we aren’t leaking the entire surface. I have also tried deleting and recreating the context and the IDirect3DDevice9 every 100 surfaces; the issue still shows, but with dips in the memory usage each time we delete and recreate.
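To put a number on the growth rather than eyeballing it, something like the following could be wrapped around the allocate/release cycle. This is a rough sketch under assumptions: probeLeak and LeakProbeResult are hypothetical names, and both the cycle and the memory sampler are passed in as callables so the sketch stays independent of CUDA and Direct3D. On Windows the sampler could, for example, read the process's private bytes via GetProcessMemoryInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Memory readings taken after a warm-up phase and after the measured phase.
struct LeakProbeResult {
    std::size_t baselineBytes;   // sample taken after the warm-up cycles
    std::size_t finalBytes;      // sample taken after the measured cycles
    double bytesPerIteration;    // average growth per measured cycle
};

// Runs 'cycle' warmup times (to let caches and pools settle), samples memory,
// runs it 'iterations' more times, samples again, and reports the growth.
LeakProbeResult probeLeak(std::size_t warmup, std::size_t iterations,
                          const std::function<void()>& cycle,
                          const std::function<std::size_t()>& sampleBytes) {
    assert(cycle && sampleBytes);  // both callables must be set

    for (std::size_t i = 0; i < warmup; ++i) cycle();
    const std::size_t baseline = sampleBytes();

    for (std::size_t i = 0; i < iterations; ++i) cycle();
    const std::size_t finalBytes = sampleBytes();

    // Signed difference, so a shrinking footprint shows up as negative growth.
    const double delta = static_cast<double>(finalBytes) - static_cast<double>(baseline);
    const double perIteration = iterations > 0 ? delta / static_cast<double>(iterations) : 0.0;
    return LeakProbeResult{baseline, finalBytes, perIteration};
}
```

Here cycle would call AllocateHardwareSample followed by ReleaseHardwareSample. A bytesPerIteration value that stays clearly above zero as iterations grows points to a per-cycle leak rather than one-off warm-up allocations.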
The full source code is available on Dropbox (main.cpp).
I am compiling with Visual Studio 2019 and CUDA toolkit version 11.1.5, in both 32-bit and 64-bit builds. We have run it on Windows 10 desktop PCs using an NVIDIA GeForce GTX 1050 Ti with driver version 497.09 and a Quadro P2200 with driver version 496.49. The issue seems to be present in all of these configurations.
Any ideas on how to fix this issue, or is this a bug that I should report to the bug tracker?