Memory leak in cuDNN?

I’m trying to use cuDNN and have found what may be a memory leak when creating and destroying a cudnnHandle_t. I wrote the following code to test this:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cudnn.h"
#include <iostream>

using namespace std;

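int main()
{
    for (int i = 0; i < 100000; i++)
    {
        cout << i << endl;
        cudnnHandle_t handle = nullptr;
        cudnnCreate(&handle);    // create the handle under test
        cudnnDestroy(handle);    // destroy it again; its memory should be released here
        cudaError_t err = cudaGetLastError();
        cout << cudaGetErrorString(err) << endl;
    }

    cout << "Press any key to continue..." << endl;

    return 0;
}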
I’ve tried this code on Windows 7 with Visual Studio 2015, CUDA 8.0 and cuDNN 6.0. Memory usage increases steadily on both a GTX 1080 and a Quadro K620. How can I solve this problem if I need to call cuDNN many times?

I tried your example on Linux with cuDNN 7.2, CUDA 8.0.61-1 and NVIDIA driver 375.51 on a Tesla K80.
I did not observe a memory leak. I found stable memory usage on the GPU (104 MiB) and on the host (less than 1% of RAM).

According to the cross-posting:

OP may be confused about what constitutes a memory leak.

There is certainly some overhead (e.g. 104 MiB) to initialize/use cuDNN. This overhead should be “released” when cudnnDestroy is called.

OP seems to be objecting to the overhead, not an actual “memory leak” (based on discussion on cross-posting).
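One way to make this distinction concrete is to record memory usage once per create/destroy iteration and look at the trend. Fixed overhead should give a flat series, while a leak should give a positive slope. The following is only a minimal sketch in plain C++, assuming the samples were collected separately (for example with cudaMemGetInfo, which is not called here). The names growthPerIteration and looksLikeLeak and the default tolerance are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares slope of memory usage (in bytes) against iteration index.
// Each sample is assumed to be taken after an iteration has finished, so the
// one-time initialization overhead is already included in the first sample.
double growthPerIteration(const std::vector<double>& usedBytes)
{
    const std::size_t n = usedBytes.size();
    assert(n >= 2 && "need at least two samples to estimate a trend");

    const double meanX = (n - 1) / 2.0;   // mean of the indices 0..n-1
    double meanY = 0.0;
    for (double y : usedBytes) meanY += y;
    meanY /= static_cast<double>(n);

    double sxy = 0.0;
    double sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = static_cast<double>(i) - meanX;
        sxy += dx * (usedBytes[i] - meanY);
        sxx += dx * dx;
    }
    return sxy / sxx;                     // bytes gained per iteration
}

// Hypothetical decision rule: treat any average growth above the tolerance
// as a leak. The tolerance is an assumption and would need tuning.
bool looksLikeLeak(const std::vector<double>& usedBytes,
                   double toleranceBytes = 1.0)
{
    return growthPerIteration(usedBytes) > toleranceBytes;
}
```

With this rule, a series that sits at a constant 104 MiB is classified as overhead only, while a series that gains a few kilobytes on every iteration is flagged as a leak.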

Thanks to gaul and txbob.
I’ve reported this issue as a bug in cuDNN 6. I’ve also tried the latest cuDNN v7.0.4: memory consumption is fixed (about 100 MB) rather than growing linearly as it did with cuDNN v6.0. There is still a fixed overhead after execution, but that’s OK. Thanks again to both of you.

It seems that there is a small memory leak in the cudnnConvolutionBiasActivationForward function.
Use the attached example to reproduce the leak. The provided code is based on the previous author’s code.

The approach for testing is the following:

  1. Repeat many times:
    a) Create and set all required descriptors
    b) Allocate all required memory
    c) Execute cudnnConvolutionBiasActivationForward
    d) Free all allocated memory
    e) Destroy all created descriptors

The allocated memory grows linearly with the number of iterations.
Memory dumps show that each iteration increases the allocated memory size by 3.52 KB.

Without calling cudnnConvolutionBiasActivationForward (step c), the same process doesn’t increase the allocated memory size.
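The with/without comparison can be sketched as a small harness that runs one iteration body repeatedly and reports the average growth per iteration. This is not the attached example: it is a plain C++ sketch in which the cuDNN work and the memory measurement are passed in as callables. The name averageGrowth and both parameters are hypothetical. In a real test, body would perform steps a) to e) and usedBytes would query the process or GPU memory in use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Runs `body` repeatedly and returns the average change in memory usage
// (bytes per iteration) as reported by `usedBytes`.
// One warm-up call is made first so that one-time initialization overhead
// is not counted as growth.
double averageGrowth(std::size_t iterations,
                     const std::function<void()>& body,
                     const std::function<std::size_t()>& usedBytes)
{
    assert(iterations > 0);

    body();                                   // warm-up, not measured
    const std::size_t before = usedBytes();

    for (std::size_t i = 0; i < iterations; ++i) {
        body();                               // steps a) to e) in a real test
    }
    const std::size_t after = usedBytes();

    // Compute in double so that a decrease in usage gives a negative result
    // instead of wrapping around.
    return (static_cast<double>(after) - static_cast<double>(before))
           / static_cast<double>(iterations);
}
```

Running the harness twice, once with a body that includes step c) and once with a body that skips it, gives two per-iteration figures that can be compared directly.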

Test environment:
Windows 10, 64 bit
Visual Studio 2019
CUDA 11.0
CUDNN 8.0.1 for CUDA 11.0
Driver version: 451.48
Video card: GeForce GTX 1060 6 GB

The following testing scheme can also be used to detect the memory leak.
However, the amount of allocated memory fluctuates, and about 100 iterations are needed each time before a steady growth in allocated memory becomes visible.

  1. Create and set all required descriptors
  2. Allocate all required memory
  3. Repeat many times:
    a) Execute cudnnConvolutionBiasActivationForward