Memory leak in nvcuda.dll

In our CUDA application I am seeing a memory leak in nvcuda.dll, which I found with the tool ‘Memory Validator’ from Software Verify Ltd. This is the stack trace Memory Validator reports:

id:1,595,284 <<33,286 objects>> void * : 8,521,216 bytes, largest allocation 256 bytes at 0x000000002e298cd0 : [NoFileName Line 0]
Allocation location 1 of 33,286 allocations, Largest: 256 bytes, Total: 8,521,216 bytes
Heap ID: 0x0000000000980000
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]
nvcuda.dll Ordinal39 : [NoFileName Line 0]

Unfortunately I am not able to reproduce the problem in a simple program. We have a complex application that, under certain circumstances, enters a ‘leak mode’ in which each call to cudaDeviceSynchronize() leaks a 256-byte memory block.
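
The figures in the Memory Validator report are consistent with a fixed 256 bytes per leaked block: 8,521,216 bytes over 33,286 allocations averages exactly 256 bytes, and since the largest allocation is also 256 bytes, every block must be that size. A small sketch of the arithmetic follows. The function names are made up for illustration, and the rate of 1,000 leaking calls per second is a hypothetical figure, not a measurement.

```cpp
#include <cstdint>

// Size of each block if the total divides evenly among the allocations,
// or 0 if it does not (or if count is 0).
constexpr std::uint64_t uniform_block_size(std::uint64_t total_bytes,
                                           std::uint64_t count) {
    return (count == 0 || total_bytes % count != 0) ? 0 : total_bytes / count;
}

// Host memory lost after `calls` leaking calls at `bytes_per_call` each.
constexpr std::uint64_t projected_leak_bytes(std::uint64_t calls,
                                             std::uint64_t bytes_per_call) {
    return calls * bytes_per_call;
}

// Figures taken from the report: 8,521,216 bytes in 33,286 allocations.
static_assert(uniform_block_size(8521216ULL, 33286ULL) == 256ULL,
              "the report averages exactly 256 bytes per allocation");

// Hypothetical rate: 1,000 leaking calls per second for one hour.
static_assert(projected_leak_bytes(1000ULL * 3600ULL, 256ULL) == 921600000ULL,
              "about 921.6 MB per hour at the assumed rate");
```

At that assumed rate the process would lose roughly 0.9 GB per hour, which fits the observation that memory consumption keeps growing once the leak has started.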

Once this ‘leak mode’ has been entered, I can deactivate most parts of the running application until only a loop with two CUDA calls is left:

while (true)
{
    cudaMemset(…);
    cudaDeviceSynchronize();
}

The ‘leak mode’ persists in this case.

The circumstances under which this ‘leak mode’ is entered are unclear: it is not always reproducible and appears to be random. But once it is entered, it persists, and memory consumption grows indefinitely.

Windows 7, 64bit
NVIDIA driver version 376.33
CUDA runtime version 6.5
GeForce GTX 960
Visual Studio 2010

You might want to try newer versions of CUDA (e.g. 8.0, instead of 6.5). Bugs get fixed all the time.

I just upgraded to CUDA 8.0 but it did not fix the problem.

I wouldn’t be optimistic about getting help unless you can provide a code that reproduces the issue along with the steps needed to reproduce.

I switched back to an older NVIDIA driver, version 353.06.
With this driver the problem does not occur.
But this is only a temporary solution, because graphics cards supported by this driver will eventually no longer be available.

I will try to create a small program to reproduce the problem.

I was not able to create a small program that reproduces the problem. All I can say is:

The leak occurs in driver versions >= 369.04
and it does not occur in driver versions <= 368.81.
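
The two boundaries above can be turned into a simple check, for example to log a warning at startup. This is a minimal sketch: the type and function names are hypothetical, and the thresholds are only what was observed on this setup (GTX 960, Windows 7), not something NVIDIA has confirmed. Versions between 368.81 and 369.04 were not tested, so they are reported as such.

```cpp
// Verdict for a driver version, based only on the versions reported in this thread.
enum class LeakStatus { NotObserved, Untested, Observed };

// A driver version such as 376.33, split at the dot.
struct DriverVersion {
    int major_part;  // 376 in "376.33"
    int minor_part;  // 33 in "376.33" (06 in "353.06" becomes 6)
};

// True if v >= ref, comparing the major part first, then the minor part.
constexpr bool at_least(DriverVersion v, DriverVersion ref) {
    return v.major_part != ref.major_part ? v.major_part > ref.major_part
                                          : v.minor_part >= ref.minor_part;
}

// True if v <= ref, comparing the major part first, then the minor part.
constexpr bool at_most(DriverVersion v, DriverVersion ref) {
    return v.major_part != ref.major_part ? v.major_part < ref.major_part
                                          : v.minor_part <= ref.minor_part;
}

// Leak seen in >= 369.04, not seen in <= 368.81, anything in between is untested.
constexpr LeakStatus classify(DriverVersion v) {
    return at_least(v, DriverVersion{369, 4})  ? LeakStatus::Observed
         : at_most(v, DriverVersion{368, 81})  ? LeakStatus::NotObserved
                                               : LeakStatus::Untested;
}

// The two drivers mentioned earlier in the thread.
static_assert(classify(DriverVersion{376, 33}) == LeakStatus::Observed, "376.33 leaks");
static_assert(classify(DriverVersion{353, 6}) == LeakStatus::NotObserved, "353.06 does not");
```

The sketch does not obtain the version itself; it would have to come from elsewhere, for example from the output of nvidia-smi.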

To trigger the problem you have to access a CUDA device from different threads at the same time.
When we remove calls to cudaDeviceSynchronize() from all threads but one, it occurs less frequently, but does not disappear completely.
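
Since the trigger seems to be concurrent access, one starting point for a reproduction attempt is a harness that runs the same step from several threads at once. The sketch below is only that: `run_concurrently` is a hypothetical helper, it assumes a C++11 compiler (newer than Visual Studio 2010), and the step is passed in as a callable so the harness itself has no CUDA dependency. In an actual attempt the step would wrap the cudaMemset/cudaDeviceSynchronize pair from the loop above; whether that is enough to enter ‘leak mode’ is not known.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs `step` `iterations` times on each of `thread_count` threads.
// The workers wait on a shared flag so their loops start at about the same time.
void run_concurrently(unsigned thread_count, unsigned iterations,
                      const std::function<void()>& step) {
    assert(step && "step must be callable");

    std::atomic<bool> go(false);
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&go, &step, iterations] {
            while (!go.load()) {
                std::this_thread::yield();  // wait for the start signal
            }
            for (unsigned i = 0; i < iterations; ++i) {
                step();  // stand-in for the CUDA calls under test
            }
        });
    }

    go.store(true);  // release all workers together
    for (auto& worker : workers) {
        worker.join();
    }
}
```

A call such as run_concurrently(4, 100000, step), with the process watched in a tool like Memory Validator, would show whether the allocation count in nvcuda.dll grows with the number of iterations.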