Passing GPU variables between MEX functions

What is the most efficient way to pass pointers to GPU memory between several MEX functions?

Here is some pseudocode to illustrate the use case:

mex_func_create_vector_on_gpu() 

mex_func_perform_calc1_on_gpu()

... 

mex_func_perform_calcN_on_gpu()

mex_func_copy_gpu_vector_to_host()

mex_func_destroy_vector_on_gpu()

Each of these functions is easy to implement; the tricky part is creating an efficient interface between them.

Of course, we could copy the GPU variable back to the host and pass it through Matlab’s workspace after each function call, but this involves a lot of unnecessary memory operations after every MEX function and comes with a big performance hit.

It seems that this thread is very close to the solution of my problem:

http://forums.nvidia.com/index.php?showtop…rt=#entry398348

but I’m not sure how to take advantage of it in my scenario.

Try the Matlab interface Jacket by Accelereyes…

Thank you for your response. I’m familiar with Jacket (and I highly recommend it to anyone) but I really need to build something custom this time - so I guess my question stays the same.

What about making one function:

mex_func('create_vector_on_gpu', ) 

mex_func('perform_calc1_on_gpu', )

... 

mex_func('perform_calcN_on_gpu', )

mex_func('copy_gpu_vector_to_host', )

mex_func('destroy_vector_on_gpu', )

Then you can just keep your GPU variables in persistent (static) pointers inside that one MEX file.
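The single-entry-point idea above can be sketched roughly as follows. This is only an illustration in plain C so that it runs without MATLAB or a GPU: gpu_dispatch, g_vec and g_len are hypothetical names, host malloc/memcpy stand in for cudaMalloc/cudaMemcpy, and the "calc1" step (doubling every element) is a placeholder for a real kernel launch. In an actual MEX file the command string would come from prhs[0] (for example via mxArrayToString), and it is probably worth registering a cleanup routine with mexAtExit so the device memory is released when the MEX file is cleared.

```c
#include <stdlib.h>
#include <string.h>

/* File-scope state survives between calls as long as the module stays loaded. */
static double *g_vec = NULL;   /* would be a device pointer in real code */
static size_t  g_len = 0;

/* Hypothetical dispatcher: returns 0 on success, -1 on error. */
int gpu_dispatch(const char *cmd, const double *in, double *out, size_t n)
{
    if (strcmp(cmd, "create_vector_on_gpu") == 0) {
        if (in == NULL) return -1;
        free(g_vec);                              /* cudaFree in real code */
        g_len = 0;
        g_vec = malloc(n * sizeof *g_vec);        /* cudaMalloc in real code */
        if (g_vec == NULL) return -1;
        memcpy(g_vec, in, n * sizeof *g_vec);     /* cudaMemcpy, host to device */
        g_len = n;
        return 0;
    }

    if (g_vec == NULL) return -1;   /* every other command needs a live vector */

    if (strcmp(cmd, "perform_calc1_on_gpu") == 0) {
        for (size_t i = 0; i < g_len; ++i)        /* placeholder for a kernel */
            g_vec[i] *= 2.0;
        return 0;
    }
    if (strcmp(cmd, "copy_gpu_vector_to_host") == 0) {
        if (out == NULL || n < g_len) return -1;
        memcpy(out, g_vec, g_len * sizeof *g_vec); /* cudaMemcpy, device to host */
        return 0;
    }
    if (strcmp(cmd, "destroy_vector_on_gpu") == 0) {
        free(g_vec);                              /* cudaFree in real code */
        g_vec = NULL;
        g_len = 0;
        return 0;
    }
    return -1;                                    /* unknown command */
}
```

The trade-off is that all operations live in one compiled file, but nothing has to cross the MATLAB workspace between calls.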

I think I managed to pass GPU pointers between different MEX functions about a year ago. As far as I remember, the trick was to not destroy the context when the MEX function finishes.

That is exactly what I’m trying to do. It is just a matter of HOW to pass this pointer: it has to be wrapped somehow in an mxArray, like everything that goes in and out of a MEX function.

Here is what I have so far:

MEX FUNCTION 1

double* h_in;

double* h_out;

double* g_var;

void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray*prhs[] )

{

	if (nrhs !=1) mexErrMsgTxt("Must have one input argument");

	// create a pointer to the real data in the input matrix

	h_in = mxGetPr(prhs[0]);

	// calculate mem_size

	int m = mxGetM(prhs[0]);

	int n = mxGetN(prhs[0]);

	const unsigned int mem_size = sizeof(double) * m * n;

	// allocate device mem

	cutilSafeCall( cudaMalloc( (void**) &g_var, mem_size));

	// copy input data to device

	cutilSafeCall( cudaMemcpy( g_var, h_in, mem_size, cudaMemcpyHostToDevice) ); 

	// Create an mxArray for the output data 

	plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);

	// Create a pointer to the output data 

	h_out = mxGetPr(plhs[0]);

	h_out = g_var;

}

and the second mex-function on the receiver side:

MEX FUNCTION 2

void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray*prhs[] )

{

	// check: only one input and one output argument

	if (nrhs !=1) mexErrMsgTxt("Must have one input argument");

	g_var = (double *)mxGetPr(prhs[0]);

	int m = mxGetM(prhs[0]);

	int n = mxGetN(prhs[0]);

	// calculate mem_size

	const unsigned int mem_size = sizeof(double) * m * n;

	//allocate host mem

	h_out = (double*)malloc(mem_size);

	if (h_out == 0) mexPrintf("host: unable to allocate memory");

	// copy data from device to host

	cutilSafeCall( cudaMemcpy( h_out, g_var, mem_size, cudaMemcpyDeviceToHost) ); 

}

I’m definitely missing something here. The last call to cudaMemcpy in the second mex function crashes matlab.

Yes, I was thinking about that, but it is very convenient to have these little functions as separate building blocks, so I can reuse them in my future projects.

I guess this is going to be my ‘PLAN B’ if I exhaust my other possibilities.

You can’t pass the pointer as a double; you have to convert it to an int…
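One way to read that advice, sketched in plain C so it runs without MATLAB or CUDA: treat the device pointer as an opaque 64-bit integer handle and copy its value into the array's data buffer, rather than reassigning the local pointer as h_out = g_var does. The helper names below (pointer_to_handle, handle_to_pointer, store_handle, load_handle) are hypothetical. In a MEX file, the buffer passed to store_handle would presumably be mxGetData() of a 1x1 array created with mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL), and the receiving function would call load_handle on mxGetData(prhs[0]).

```c
#include <stdint.h>
#include <string.h>

/* Convert a pointer to a 64-bit integer handle and back. */
static uint64_t pointer_to_handle(const void *device_ptr)
{
    return (uint64_t)(uintptr_t)device_ptr;
}

static void *handle_to_pointer(uint64_t handle)
{
    return (void *)(uintptr_t)handle;
}

/* Write the handle into an 8-byte buffer, e.g. the data of a 1x1 uint64 array.
   This stores the pointer VALUE in the buffer; it does not repoint anything. */
static void store_handle(void *array_data, const void *device_ptr)
{
    uint64_t handle = pointer_to_handle(device_ptr);
    memcpy(array_data, &handle, sizeof handle);
}

/* Read the handle back out of the buffer and recover the original pointer. */
static void *load_handle(const void *array_data)
{
    uint64_t handle;
    memcpy(&handle, array_data, sizeof handle);
    return handle_to_pointer(handle);
}
```

Note that the handle array is 1x1, so calling mxGetM/mxGetN on it in the second function gives 1 and 1, not the size of the original vector. The dimensions would have to be passed separately (or stored next to the handle), and the handle is only meaningful while the CUDA context that allocated the memory is still alive.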