CUDA + MEX = Memory Leak

Dear Community,

I ran into a memory management issue when executing C++ CUDA code through the MEX API in MATLAB. I found that others in the community have had a similar problem, but their suggestions were not enough to solve mine. My MEX file creates an object whose methods call CUDA code, such as kernel launches and device memory allocations. The problem is a device memory leak: MATLAB does not release the memory even though I explicitly call cudaFree.

  1. There are no memory leaks when I run my C++ CUDA code as a standalone binary.
  2. The cuda-memcheck tool reports no memory leak errors.

Here is a template of my code:

#include <cuda_runtime.h>
#include <string>
#include "mex.h"

class MyClass {
    public:
	    MyClass() {};
	    ~MyClass() {};
	    void Method(const double* const data) {
		for(int i = 0; i < 2; i++) {
			double* deviceData; cudaMalloc(&deviceData, sizeof(double)*100000);
			cudaMemcpy(deviceData, data, sizeof(double)*100000, cudaMemcpyHostToDevice);
			cudaFree(deviceData);
		}
	    }
};

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, mxArray const *prhs[])
{     
    double* data = mxGetPr(prhs[0]);

    MyClass* object = new MyClass();
    object->Method(data);
    delete object;
}

The issue is troublesome because I allocate device resources in a loop inside the object->Method(data) call.
Can anybody help me solve this issue? Thanks in advance! I work with CUDA 7.5, MATLAB 2016a, and a GTX 1060 graphics card.
I call cudaDeviceReset(); at the end of the code, but it does not help with the memory allocated in the object’s loop.

You mention a memory leak, but I don’t see the memory grow unbounded on the device side with your current source code.
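
One way to check whether device memory is really lost across a call is to compare the free device memory before and after it. Here is a minimal sketch of that bookkeeping in plain C++. The names bytesLostAcross and queryFree are hypothetical, not part of CUDA or MEX. In a MEX file I would assume the hook wraps cudaMemGetInfo, as shown in the trailing comment.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: reports how many bytes of free memory disappeared
// while `work` ran. A positive result means memory was not given back.
// `queryFree` is an assumed hook that returns the bytes currently free.
inline long long bytesLostAcross(const std::function<std::size_t()>& queryFree,
                                 const std::function<void()>& work)
{
    const std::size_t before = queryFree();  // baseline before the call
    work();                                  // code under suspicion
    const std::size_t after = queryFree();   // free memory afterwards
    return static_cast<long long>(before) - static_cast<long long>(after);
}

// Assumed usage inside mexFunction (needs cuda_runtime.h and mex.h,
// so it is left as a comment here):
//
//   auto queryFree = [] {
//       size_t freeBytes = 0, totalBytes = 0;
//       cudaMemGetInfo(&freeBytes, &totalBytes);
//       return freeBytes;
//   };
//   long long lost = bytesLostAcross(queryFree, [&] { object->Method(data); });
//   mexPrintf("Device bytes lost across Method: %lld\n", lost);
```

Take the baseline only after the CUDA context exists (for example after an earlier CUDA call), since creating the context itself consumes device memory and would otherwise look like a leak. Other processes using the same GPU can also shift the numbers, so treat a single reading as a hint and repeat the call a few times.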

That being said, I’ve used mexAtExit before when I had issues doing a clear mexfile or clear all after running MEX code that did not terminate with a cudaThreadExit() call.

Syntax here: https://www.mathworks.com/help/matlab/apiref/mexatexit.html

The simplest edit to your code that doesn’t cause issues for me when I attempt clear all afterwards is this:

#include <cuda_runtime.h>
#include <string>
#include "mex.h"

// function prototype -- cleanup needed so clear mex/all does not segfault or use excessive memory under Linux/Windows
void cleanup();

class MyClass {
public:
	MyClass() {};
	~MyClass() {};
	void Method(const double* const data) {
		for (int i = 0; i < 2; i++) {
			double* deviceData; cudaMalloc(&deviceData, sizeof(double) * 100000);
			cudaMemcpy(deviceData, data, sizeof(double) * 100000, cudaMemcpyHostToDevice);
			cudaFree(deviceData);
		}
	}
};

void mexFunction(int nlhs, mxArray *plhs[],
	int nrhs, mxArray const *prhs[])
{
	double *data = mxGetPr(prhs[0]);

	MyClass *object = new MyClass();
	object->Method(data);
	delete object;
	mexAtExit(cleanup);
}

void cleanup()
{
	// Note: cudaThreadExit() is deprecated; cudaDeviceReset() is its current replacement.
	printf("Memory cleanup completed, return: %d\n", cudaThreadExit());
	return;
}

See if that clears it up for you. I tested this under MATLAB 2017a, VS 2015, and CUDA 8.0.