Function with cublasDnrm2 compiles but crashes when running it. What am I doing wrong?

Hi,

I’m trying to create a Matlab MEX function that does a series of calculations on the GPU using cuBLAS. So far cublasDgemm has been working flawlessly, but when I try to use cublasDnrm2, Matlab crashes. The code does, however, compile without errors. As far as I can see I’m not doing anything wrong, but I’m relatively new to CUDA and C programming in general, so maybe there is something obvious that one of you experts can see that I don’t…

Basically I have a double-precision vector, ti, on the device with dimensions [m x p], where p = 1, so it is [m x 1]. I want to calculate the norm of the vector using cublasDnrm2, and I have tried writing it the following way:

double *ti;
cudaMalloc(&ti, sizeof(double) * m * p);
…
// A bunch of other code here that works…
…
/* Calculate ti using cublasDgemm  */
cublasDgemm(handle,CUBLAS_OP_N,CUBLAS_OP_N,m,p,n,&alpha,deviceX,m,ri,n,&beta,ti,m);

/* Create a [1 x 1] variable called normti and allocate memory for it on the GPU */
double *normti;
cudaMalloc(&normti, sizeof(double) * p * p);

/* Create a cuBLAS handle and attempt to calculate the Euclidean norm using cublasDnrm2 */
cublasCreate(&handle);
cublasDnrm2(handle, m, ti, 1, normti);

If I comment out the last line the program works but with the last step Matlab crashes… Any ideas what I’m doing wrong?

Thanks.

Study how you are allocating storage for the normti argument. Then read the cuBLAS documentation for the nrm2 function, paying attention to what it says about the last argument, and read what the documentation says about the cuBLAS pointer mode, which governs how scalars returned by cuBLAS functions are handled.

Unless I’m reading the wrong documentation (http://docs.nvidia.com/cuda/cublas/#axzz4WFQrdlXG), I have been reading it many times and it offers very little insight to me.

This is what it says about the last variable:

Parameter: result
Memory: host or device.
In/out: output.
Meaning: the resulting norm.

I’m guessing that the function tries to return the result to the host by default, but I have allocated the variable on the device, and that is the problem?
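For reference, here is a minimal sketch of the host-side variant, on the assumption that the handle is left in its default pointer mode (CUBLAS_POINTER_MODE_HOST). The function name norm_host_mode, the variable d_x and the test vector are made up for illustration and do not come from the code above. Return codes are ignored to keep it short.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Hypothetical helper: Euclidean norm of a host vector x of length m,
   computed on the GPU, with the result written to a host variable. */
double norm_host_mode(const double *x, int m)
{
    cublasHandle_t handle;
    double *d_x = NULL;
    double result = 0.0;   /* ordinary host variable, no cudaMalloc */

    /* A freshly created handle uses CUBLAS_POINTER_MODE_HOST. */
    cublasCreate(&handle);

    cudaMalloc((void **)&d_x, sizeof(double) * m);
    cudaMemcpy(d_x, x, sizeof(double) * m, cudaMemcpyHostToDevice);

    /* The vector is a device pointer; the result is a host address. */
    cublasDnrm2(handle, m, d_x, 1, &result);

    cudaFree(d_x);
    cublasDestroy(handle);
    return result;
}

int main(void)
{
    const double x[2] = {3.0, 4.0};   /* illustrative data, norm is 5 */
    printf("norm = %f\n", norm_host_mode(x, 2));
    return 0;
}
```

This should build with something like nvcc example.cu -lcublas. In host pointer mode the call returns only once the result is available, so it can be read straight away.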
I found the function cublasSetPointerMode which if I understand correctly can tell cuBLAS to return results to the device instead. I tried adding that to the code:

/* Create a [1 x 1] variable called normti and allocate memory for it on the GPU */
double *normti;
cudaMalloc(&normti, sizeof(double) * p * p);
/* Create a cuBLAS handle and change the pointer mode so that cublasDnrm2 returns results to device*/
cublasCreate(&handle);
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
/* Call cublasDnrm2 */ 
cublasDnrm2(handle, m, ti, 1, normti);

But it still crashes when I run it. If you know what the problem is and it’s trivial to fix it, could you please be more specific and tell me what the problem is? Or guide me in the direction of an example that illustrates your point? I have been googling cublasDnrm2 for several hours now without luck.

Thanks.

The missing SetPointerMode call was what I had in mind. It is mandatory if you want to return scalar results to the device.
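A minimal standalone sketch of the device-pointer variant, independent of MATLAB, could look like the following. The names norm_device_mode, d_x and d_result are hypothetical, the test data is invented, and return codes are ignored for brevity. Treat it as an illustration under those assumptions, not as a drop-in fix.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Hypothetical helper: computes the norm with the scalar result stored
   in device memory, then copies it back only so that it can be printed. */
double norm_device_mode(const double *x, int m)
{
    cublasHandle_t handle;
    double *d_x = NULL;
    double *d_result = NULL;   /* scalar result lives on the GPU */
    double result = 0.0;

    /* Create the handle once, before any cuBLAS call that uses it. */
    cublasCreate(&handle);
    /* Tell cuBLAS that scalar arguments are device pointers. */
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);

    cudaMalloc((void **)&d_x, sizeof(double) * m);
    cudaMalloc((void **)&d_result, sizeof(double));
    cudaMemcpy(d_x, x, sizeof(double) * m, cudaMemcpyHostToDevice);

    /* Both the vector and the result are device pointers here. */
    cublasDnrm2(handle, m, d_x, 1, d_result);

    /* Blocking copy on the default stream; waits for nrm2 to finish. */
    cudaMemcpy(&result, d_result, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_result);
    cudaFree(d_x);
    cublasDestroy(handle);
    return result;
}

int main(void)
{
    const double x[2] = {3.0, 4.0};   /* illustrative data, norm is 5 */
    printf("norm = %f\n", norm_device_mode(x, 2));
    return 0;
}
```

Two things to keep in mind. The pointer mode applies to every scalar argument passed through that handle, so a later cublasDgemm call on the same handle would need alpha and beta in device memory as well, or the mode switched back to CUBLAS_POINTER_MODE_HOST first. Also, the sketch creates the handle exactly once; in the posted snippet cublasCreate appears after cublasDgemm has already used handle, which may be worth double-checking.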

Beyond that, I would need a complete example to study that does not depend on MATLAB.
You might also want to check the return codes of all the API calls you are making, to see whether any of them report an error.
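One common way to do that is to wrap each call in a small checking macro. The sketch below is only an illustration: CUDA_CHECK, CUBLAS_CHECK and checked_norm are names made up here and are not part of CUDA or cuBLAS, and the test data is invented.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Hypothetical macro: abort with a message if a CUDA runtime call fails. */
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d CUDA error: %s\n",                 \
                    __FILE__, __LINE__, cudaGetErrorString(e_));      \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

/* Hypothetical macro: abort with the numeric status if a cuBLAS call fails. */
#define CUBLAS_CHECK(call)                                            \
    do {                                                              \
        cublasStatus_t s_ = (call);                                   \
        if (s_ != CUBLAS_STATUS_SUCCESS) {                            \
            fprintf(stderr, "%s:%d cuBLAS error, status %d\n",        \
                    __FILE__, __LINE__, (int)s_);                     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

/* Hypothetical helper: norm of a host vector with every call checked. */
double checked_norm(const double *x, int m)
{
    cublasHandle_t handle;
    double *d_x = NULL;
    double result = 0.0;   /* host result, default host pointer mode */

    CUBLAS_CHECK(cublasCreate(&handle));
    CUDA_CHECK(cudaMalloc((void **)&d_x, sizeof(double) * m));
    CUDA_CHECK(cudaMemcpy(d_x, x, sizeof(double) * m,
                          cudaMemcpyHostToDevice));
    CUBLAS_CHECK(cublasDnrm2(handle, m, d_x, 1, &result));
    CUDA_CHECK(cudaFree(d_x));
    CUBLAS_CHECK(cublasDestroy(handle));
    return result;
}

int main(void)
{
    const double x[2] = {3.0, 4.0};   /* illustrative data, norm is 5 */
    printf("norm = %f\n", checked_norm(x, 2));
    return 0;
}
```

Inside a MEX file, calling exit would take MATLAB down with it, so there you would probably report the failure through mexErrMsgIdAndTxt instead of exiting.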

I woke up today, rebooted my computer, and tried the exact same code again, and now all of a sudden it works! What an incredible miracle!
Thank you for the hint on cublasSetPointerMode.