How to clean up properly after a MATLAB MEX file that uses CUDA?

Hello

Does anyone know how to clean up properly after using CUDA in a MEX file in MATLAB?

Example code follows:

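#include <cuda_runtime.h>  /* runtime API: cudaMemGetInfo, cudaThreadExit */
#include "mex.h"

static int initialized = 0;

/*------------------------------------------------------------------------*/
/* Runs when the MEX file is unloaded (clear mex) or when MATLAB exits. */
void cleanup(void)
{
  /* cudaThreadExit() is deprecated; cudaDeviceReset() is its replacement. */
  mexPrintf("cleanup completed: %d\n", (int)cudaThreadExit());
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  size_t free_bytes, total_bytes;

  /* The first CUDA runtime call also initializes CUDA. */
  cudaMemGetInfo(&free_bytes, &total_bytes);
  mexPrintf("Free cuda memory: %llu bytes\n", (unsigned long long)free_bytes);

  /* Register the exit handler only once per load of the MEX file. */
  if (initialized == 0) {
    initialized = 1;
    mexAtExit(cleanup);
  }
}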

Running the code gives the following output:

>> cudafreememory
Free cuda memory: 1170182144 bytes
>> cudafreememory
Free cuda memory: 1170182144 bytes
>> cudafreememory
Free cuda memory: 1170182144 bytes
>> clear mex
cleanup completed: 0
>> cudafreememory
Free cuda memory: 1169788928 bytes
>> cudafreememory
Free cuda memory: 1169788928 bytes
>> clear mex
cleanup completed: 0
>> cudafreememory
Free cuda memory: 1169149952 bytes

As can be seen, every time the MEX file is cleared from memory, the free memory on the device decreases, while repeated calls to the function without clear mex leave it unchanged. This happens even though cudaThreadExit() (same as cudaDeviceReset()) is called during cleanup, which according to the specification should clean up all CUDA-related resources.

After enough calls to clear mex, this code eventually causes an out-of-memory error during CUDA initialization, and MATLAB has to be restarted to free the allocated resources. Although I do not need to call clear mex that often, memory leaks in the code are still undesirable. Does anyone have a solution to this?

Regards

Yes, it is quite tricky to use MEX and get a robust system. MATLAB doesn't necessarily clean up after itself nicely after each MEX call. You can have perfectly valid CUDA code and still get errors once you mix in MEX, especially once you start to push your code hard.

With Jacket we end up running a battery of 40,000+ unit tests with every source code commit to ferret out these things.

There is another post where I said the same thing and other people also contributed some thoughts. Check it out too.

Yes, I have seen that post too. My example code is basically doing the same thing as that code regarding the cleanup.

As a test, it took 1010 calls to the function (with clear mex in between) before I had to restart MATLAB. That perhaps makes this a minor issue, but it is still annoying to have to tell my users to avoid all unnecessary calls to clear mex.

I investigated further by calling cudaMemGetInfo from another process while running this code:

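#include <stdio.h>
#include <cuda_runtime.h>  /* runtime API: cudaMemGetInfo, cudaThreadExit */

int main(void)
{
  size_t free_bytes, total_bytes;

  getc(stdin);                                /* pause 1: before CUDA initialization */
  cudaMemGetInfo(&free_bytes, &total_bytes);  /* first runtime call initializes CUDA */
  getc(stdin);                                /* pause 2: after initialization */
  cudaThreadExit();
  getc(stdin);                                /* pause 3: after cudaThreadExit() */
  cudaMemGetInfo(&free_bytes, &total_bytes);  /* second initialization */
  getc(stdin);                                /* pause 4: after second initialization */
  cudaThreadExit();
  getc(stdin);                                /* pause 5: after second cudaThreadExit() */
  return 0;
}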

The other process reported the following free memory at the five getc(stdin) pauses:

Free cuda memory: 1426194432 bytes //before cuda initialization
Free cuda memory: 1247268864 bytes //after initialization
Free cuda memory: 1425678336 bytes //after cudaThreadExit(), note: not same as before initialization
Free cuda memory: 1247268864 bytes //after second initialization
Free cuda memory: 1425678336 bytes //after second cudaThreadExit()

Apparently, after cudaThreadExit() is called, the available memory does not return to exactly the value it had before the first cudaMemGetInfo call (which is there only to initialize CUDA). Reinitializing CUDA and calling cudaThreadExit() multiple times does not, however, cause any further decrease in available memory. After the program has terminated, the value is back to its original level. For MEX files, the memory apparently is not returned until MATLAB itself terminates, so resources are lost every time clear mex is called.

Two questions remain: why does cudaThreadExit() not restore the free memory to the value it had before the first CUDA call, and is there a way to make it do so?