Doubts about CUBLAS

Hi all,

I’ve been trying to use the CUBLAS library functions and I have several doubts at a very basic level. In all likelihood these will be very simple, but bear with me.

i) I read that CUBLAS is independent of CUDA and provides its own functions for allocating memory, transferring data to the GPU, etc. But CUBLAS will still work with vectors or matrices that were allocated and transferred using the basic CUDA functions (cudaMalloc, cudaMemcpy), right?

ii) Can the CUBLAS functions return a value to a variable on the host, or should I allocate and declare a variable on the device to hold the return value of the function?

iii) What does storage spacing between the elements of an array actually mean? Aren’t the elements of an array stored contiguously in memory? In that case, would the storage spacing be 0?


#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas.h>

#define IDX2C(i,j,ld) (((j)*(ld)) + (i))

int main()
{
    int i, array_size = 10;
    float *arr1;    /* host array */
    float *d_arr1;  /* device array */

    arr1 = (float *)malloc(array_size * sizeof(float));
    cudaMalloc((void **)&d_arr1, array_size * sizeof(float));

    for (i = 0; i < array_size; i++)
    {
        arr1[IDX2C(0, i, 1)] = i;
    }

    for (i = 0; i < array_size; i++)
    {
        printf("%f ", arr1[i]);
    }

    cudaMemcpy(d_arr1, arr1, array_size * sizeof(float), cudaMemcpyHostToDevice);
    cublasSscal(array_size, 2.0, d_arr1, sizeof(float));
    cudaMemcpy(arr1, d_arr1, array_size * sizeof(float), cudaMemcpyDeviceToHost);

    for (i = 0; i < array_size; i++)
    {
        printf("%f ", arr1[i]);
    }

    return 0;
}

This is the program that I tried to get running. It just declares an array and doubles it using cublasSscal. Could anyone spot what my error is?

Apologies once again if my doubts are too basic…



There is a document in the CUDA/doc folder called CUBLAS_Library_2.1.pdf which contains some examples of how to use CUBLAS, as well as the rest of the documentation on the functions. Wherever you read that CUBLAS is independent of CUDA, it meant that you do not have to use CUDA functions or even #include “cuda.h”, as everything can be managed with CUBLAS function calls.

This includes the memory allocation you are trying to do in your program: you should be using cublasAlloc() and cublasFree(), as per the example. That should help solve the problems you listed.

However, it may still be possible to use the standard CUDA memory functions to set up arrays; I haven’t read the whole CUBLAS document.

Hi computerulz,

Thanks a lot for your reply. It turns out my issues were caused by compiling the code in device emulation mode. When I tried compiling it without device emulation, it worked perfectly. I don’t know why this is, though; I just stumbled on the fix by trying to change everything. I’ve posted a query about this in a separate thread in the forum.



Were you linking the correct library?

From the CUBLAS doco:

"Applications using CUBLAS need to link against the DSO (Linux), the DLL cublas.dll (Windows), or the dynamic library cublas.dylib (Mac OS X) when building for the device, and against the DSO (Linux), the DLL cublasemu.dll (Windows), or the dynamic library cublasemu.dylib (Mac OS X) when building for device emulation."

That’s the only thing I can think of that would cause that sort of behaviour. Then again, according to the same doco, the error you said you get from the init function isn’t one of its possible return values, so there could be something else I’m missing.