Function with cublasDnrm2 compiles but crashes when running it. What am I doing wrong?

pettes · January 20, 2017, 12:11pm

Hi,

I’m trying to create a Matlab MEX function that does a series of calculations on the GPU using cuBLAS. So far cublasDgemm has been working flawlessly, but when I try to use cublasDnrm2, Matlab crashes. It does, however, compile without errors. As far as I can see I’m not doing anything wrong, but I’m relatively new to CUDA and C programming in general, so maybe there is something obvious that one of you experts can see that I don’t…

Basically I have a double-precision vector, ti, on the device with dimensions [m x p], where p = 1, so [m x 1]. I want to calculate the norm of the vector using cublasDnrm2, and I have tried writing it the following way:

double *ti;
cudaMalloc(&ti, sizeof(double) * m * p);
…
// A bunch of other code here that works…
…
/* Calculate ti using cublasDgemm  */
cublasDgemm(handle,CUBLAS_OP_N,CUBLAS_OP_N,m,p,n,&alpha,deviceX,m,ri,n,&beta,ti,m);

/* Create a [1 x 1] variable called normti and allocate memory for it on the GPU */
double *normti;
cudaMalloc(&normti, sizeof(double) * p * p);

/* Create a cuBLAS handle and attempt to calculate the Euclidean norm using cublasDnrm2 */
cublasCreate(&handle);
cublasDnrm2(handle, m, ti, 1, normti);

If I comment out the last line the program works but with the last step Matlab crashes… Any ideas what I’m doing wrong?

Thanks.

Robert_Crovella · January 20, 2017, 1:58pm

Study how you are allocating storage for the normti argument. Then read the cuBLAS documentation for the nrm2 function and study what it says about the last argument. Also read what the cuBLAS documentation says about the pointer mode, which governs how scalars returned by cuBLAS functions are handled.

pettes · January 20, 2017, 2:51pm

Unless I’m reading the wrong documentation (http://docs.nvidia.com/cuda/cublas/#axzz4WFQrdlXG), I have been reading it many times and it offers very little insight to me.

This is what it says about the last variable:

Parameter: result
Memory: host or device.
In/out: output.
Meaning: the resulting norm.

I’m guessing that the function tries to return the result to the host by default, but I have allocated the variable on the device, and that is the problem?
I found the function cublasSetPointerMode which, if I understand correctly, can tell cuBLAS to return results to the device instead. I tried adding that to the code:

/* Create a [1 x 1] variable called normti and allocate memory for it on the GPU */
double *normti;
cudaMalloc(&normti, sizeof(double) * p * p);
/* Create a cuBLAS handle and change the pointer mode so that cublasDnrm2 returns results to device*/
cublasCreate(&handle);
cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
/* Call cublasDnrm2 */ 
cublasDnrm2(handle, m, ti, 1, normti);

But it still crashes when I run it. If you know what the problem is and it’s trivial to fix it, could you please be more specific and tell me what the problem is? Or guide me in the direction of an example that illustrates your point? I have been googling cublasDnrm2 for several hours now without luck.

Thanks.

Robert_Crovella · January 20, 2017, 2:56pm

The missing SetPointerMode call was what I had in mind. It is mandatory if you want to return scalar results to the device.

Beyond that, I would need a complete example to study, that does not depend on MATLAB.
You might want to check the return codes of all API calls that you are doing, to see if any errors are returned.

pettes · January 21, 2017, 11:29am

I woke up today and tried the exact same code again after rebooting my computer, and now all of a sudden it works! What an incredible miracle!
Thank you for the hint on cublasSetPointerMode.
