Returning value of cublasSdot to Matlab

How can I get the result (retVal) of the CUBLAS function cublasSdot back to Matlab (into mxGetPr(plhs[0]))?

    float *devPtrx, *devPtry, retVal;

    /* allocate device memory for both vectors */
    cublasAlloc (*n * abs(*incx), sizeof(x[0]), (void**)&devPtrx);
    cublasAlloc (*n * abs(*incy), sizeof(y[0]), (void**)&devPtry);

    /* copy the host vectors to the device */
    cublasSetVector (*n, sizeof(x[0]), x, abs(*incx), devPtrx, abs(*incx));
    cublasSetVector (*n, sizeof(y[0]), y, abs(*incy), devPtry, abs(*incy));

    /* compute the dot product, then release the device memory */
    retVal = cublasSdot (*n, devPtrx, *incx, devPtry, *incy);
    cublasFree (devPtrx);
    cublasFree (devPtry);
    return retVal;

Alternatively, how do I do this the normal MEX way (plhs, prhs, etc.)?

The computed result of cublasSdot is saved in GPU memory, right? So first I have to transfer it back to the host and then send it to plhs[0]?

I tried something like this:

    plhs[0] = mxCreateNumericArray(2,dimenzija,mxDOUBLE_CLASS,mxREAL);
    C = mxGetPr(plhs[0]);
    c = (float*) mxMalloc(sizeof(float)*vrsticaA*stolpecD);
    cublasAlloc (stolpecD*vrsticaA, sizeof(float), (void**)&gc);
    cublasSetMatrix (vrsticaA, stolpecD, sizeof(float),c, vrsticaA, (void*)gc, vrsticaA);
    dp = cublasSdot (stolpecA, kAgpu, spacing, kF0gpu, spacing);
    cudaMemcpy((void **)&dp, c, sizeof(float)*vrsticaA*vrsticaF0, cudaMemcpyDeviceToHost);

But it kills Matlab when I run it. Any suggestions?

First of all, the mxArray you are creating is an array of doubles. It should be mxSINGLE_CLASS.

Second, C is a double*; it needs to be a float*. To get that, use (float*)mxGetData rather than mxGetPr.

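Third, in this code there is not even an attempt to copy the GPU data into your (erroneous) mxArray pointer. The cudaMemcpy you call writes to &dp, the address of a single host float, and reads from c, which is a host buffer, so it never touches device memory. For data that lives on the GPU (such as gc) you need a cudaMemcpy into the C I described above. Note that cublasSdot itself already returns its scalar result by value on the host, as in your first snippet (retVal = cublasSdot(...)), so dp can simply be assigned to C[0].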
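The type mismatch behind the first two points can be seen in a minimal plain-C sketch, with no MEX or CUDA calls involved. Both function names are made up for illustration. The first mimics what effectively happens when float bytes land in storage that Matlab later reads as double; the second shows the value conversion you would need if you kept the output as mxDOUBLE_CLASS instead of switching to mxSINGLE_CLASS.

```c
#include <string.h>

/* Hypothetical helper: copies the 4 raw bytes of a float into 8-byte
   double storage, which is roughly what happens when float data is
   written into a buffer that is later read as double.
   The numeric value is not preserved. */
double raw_copy_into_double(float value)
{
    double storage = 0.0;
    memcpy(&storage, &value, sizeof value);
    return storage;
}

/* Hypothetical helper: converts the float's value to double,
   which keeps the number intact. */
double convert_into_double(float value)
{
    return (double)value;
}
```

With a float such as 3.5f (standing in for the value returned by cublasSdot), convert_into_double gives back 3.5, while raw_copy_into_double gives a number that has nothing to do with 3.5.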

Also, always be aware of the single vs. double issue. Matlab numbers are double by default, so you can't just type some numbers into Matlab and send them to CUDA (unless you've written the appropriate conversion code, which I doubt you have). You have to convert the numbers to single in Matlab first.
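If you would rather do the conversion on the C side than call single() in Matlab, the conversion code mentioned above could look roughly like this. It is only a sketch: doubles_to_floats is a hypothetical helper name, and the caller is assumed to provide a destination buffer with room for n floats.

```c
#include <stddef.h>

/* Hypothetical helper: narrows n doubles (Matlab's default numeric type)
   into a caller-provided float buffer, ready to be handed to a
   single-precision routine. Precision beyond what a float can hold
   is lost in the conversion. */
void doubles_to_floats(const double *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (float)src[i];
    }
}
```

In a MEX file, src would typically come from mxGetPr(prhs[0]) and dst from mxMalloc(n * sizeof(float)); the float buffer is then what you pass to cublasSetVector.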