Matlab mex files and cudaMallocHost

In one of my Matlab mex functions, I am allocating an array in the usual way like this:

dims0[0]=N; dims0[1]=M;
Ap[0] = mxCreateNumericArray(2,dims0,mxSINGLE_CLASS,mxREAL);
A = (float*) mxGetData(Ap[0]);

This has been working fine. This array is filled in the mex routine and then shipped out to the GPU device; it's a rather large array:

cudaMemcpy (Arg, A, N*M * sizeof(A[0]), cudaMemcpyHostToDevice);
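For what it's worth, the allocate-and-ship pattern above, collected into one piece with CUDA error checking added (something my snippets omit; `ship_to_device` is just a hypothetical name for illustration, and `Arg` is assumed to be a device pointer of the right size):

```cuda
#include "mex.h"
#include <cuda_runtime.h>

/* Allocate a Matlab single-precision N-by-M array, fill it, and copy it
   to the device buffer Arg. Sketch only; requires mex + CUDA toolchain. */
static void ship_to_device(mxArray **Ap, float *Arg, mwSize N, mwSize M)
{
    mwSize dims0[2] = { N, M };
    Ap[0] = mxCreateNumericArray(2, dims0, mxSINGLE_CLASS, mxREAL);
    float *A = (float *) mxGetData(Ap[0]);

    /* ... fill A[0 .. N*M-1] here ... */

    cudaError_t err = cudaMemcpy(Arg, A, N * M * sizeof(A[0]),
                                 cudaMemcpyHostToDevice);
    if (err != cudaSuccess)
        mexErrMsgTxt(cudaGetErrorString(err));
}
```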

It's all been working great.

Reading the CUDA reference documentation, I ran across the suggestion that, where possible, cudaMallocHost() should be used, because the page-locked memory allows for faster host-to-device data transfer. Ah ha! says I, I'll do:

cudaMallocHost((void**)&A, N*M*sizeof(A[0]));

instead of the Matlab array allocation, and so speed up the transfer of A to the device. This works fine, right up until a mexCallMATLAB(1,&lhs[0],2,rhs,"mrdivide"); is called, even though those variables have nothing to do with A.

When the mexCallMATLAB is reached, the program crashes with the usual Matlab barf. I've concluded that cudaMallocHost() does not play well with Matlab, and that memory corruption results.

Is this true? Is cudaMallocHost() to be avoided in mex files?

Perhaps the answer is here:…w_thread/162021 (This looks like a well known issue, but I’ll make the post anyways…I should likely use mxCalloc instead)

On reflection this afternoon, it occurs to me that one could actually sometimes get away with using cudaMallocHost() in a mex file - it is a matter of avoiding any calls to Matlab. So one could cudaMallocHost(), compute away, and then clear out the CUDA variables before Matlab becomes aware of what is going on and complains (it's the "don't ask, don't tell" memory policy). Unfortunately, I need the Matlab division, so I am stuck…
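One possible middle ground (my own untested sketch, not something from the docs): leave the mxArray allocation alone for anything Matlab touches, and use cudaMallocHost() only for a separate pinned staging buffer that Matlab never sees, freeing it before any mexCallMATLAB runs. Whether the extra host-side memcpy into the pinned buffer still nets a win over a plain pageable transfer depends on the array size, so this would need timing:

```cuda
/* A is Matlab-owned (from mxGetData), so mexCallMATLAB stays safe.
   "pinned" is a CUDA-owned staging buffer Matlab never manages. */
float *A = (float *) mxGetData(Ap[0]);
float *pinned = NULL;
cudaMallocHost((void **) &pinned, N * M * sizeof(float));

memcpy(pinned, A, N * M * sizeof(float));   /* host-side copy into pinned memory */
cudaMemcpy(Arg, pinned, N * M * sizeof(float),
           cudaMemcpyHostToDevice);         /* page-locked, faster DMA transfer */

cudaFreeHost(pinned);   /* release the CUDA allocation before Matlab runs again */
/* ... now mexCallMATLAB(1, &lhs[0], 2, rhs, "mrdivide"); can proceed ... */
```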

(I’ve said it once, I’ll say it again…we need a CUDA matrix division routine… C=A/B)