Cublas function calls inside kernel code

Can I call cublas functions inside kernel code?

I’m trying to call cublasSgemm inside kernel code, but in emulation mode (I currently don’t have a CUDA-enabled graphics card) the kernel function hangs at that point.

__global__ void kernel_function(float* mA, float* mB, float* mC, float* fA, float* fB) {
    printf("calling cublas function\n");
    cublasSgemm('n', 'n', 3, 1, 3, 1.0f, fA, 3, mA, 3, 0.0f, mC, 3);
    cublasSgemm('n', 'n', 3, 1, 3, 1.0f, fB, 3, mB, 3, 1.0f, mC, 3);
    printf("leaving kernel code\n");
}

I know the code isn’t perfect, but it serves as an example. When the first call to cublasSgemm happens, execution hangs and doesn’t return.

What can I do?

CUBLAS functions are host-side functions that launch their own kernels, so they have to be called from host code. Just do this in int main(), compiled with g++ (for example).

int main() {
    // initialize CUBLAS before any other CUBLAS call
    cublasInit();

    // status returned by each CUBLAS call
    cublasStatus stat;

    // allocate host memory, e.g.
    float* host_B = (float*) malloc(mem_size_B);

    // allocate device memory, e.g.
    float* device_B;
    stat = cublasAlloc(number_of_elements, sizeof(float), (void**)&device_B);
    if (stat != CUBLAS_STATUS_SUCCESS) printf("memory allocation failed\n");

    // copy data from host to device
    stat = cublasSetMatrix(num_rows, num_cols, sizeof(float), host_B, num_rows, device_B, num_rows);

    // do the CUBLAS matrix-matrix operation (called from the host)

    // return result
    stat = cublasGetMatrix(…);

    // free memory
    cublasFree(device_B);
    free(host_B);
    cublasShutdown();

    return 0;
}

Basically, calling a cublas function from a kernel makes no sense.
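
Once the calls are moved to the host, it can help to have something to compare the result against. Below is a minimal CPU reference sketch of what the two cublasSgemm calls in the question compute, namely mC = fA*mA + fB*mB, assuming column-major storage as CUBLAS uses. The names sgemm_ref and two_step_product are made up for illustration and are not part of CUBLAS.

```c
/* Reference for C = alpha * A * B + beta * C with no transposes and
   column-major storage. The argument order mirrors the legacy call
   cublasSgemm('n', 'n', m, n, k, alpha, A, lda, B, ldb, beta, C, ldc).
   sgemm_ref is a hypothetical helper, not a CUBLAS function. */
void sgemm_ref(int m, int n, int k, float alpha,
               const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            /* With beta == 0 the old contents of C are not read. */
            float prev = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + prev;
        }
    }
}

/* The two calls from the question, with fA and fB as 3x3 matrices and
   mA, mB, mC as 3x1 vectors: the first call overwrites mC (beta = 0),
   the second accumulates into it (beta = 1). */
void two_step_product(const float *fA, const float *mA,
                      const float *fB, const float *mB, float *mC)
{
    sgemm_ref(3, 1, 3, 1.0f, fA, 3, mA, 3, 0.0f, mC, 3);
    sgemm_ref(3, 1, 3, 1.0f, fB, 3, mB, 3, 1.0f, mC, 3);
}
```

Calling two_step_product on the host arrays and comparing against what cublasGetMatrix copies back is a quick sanity check that the leading dimensions and the beta values are set the way you intended.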

Thanks Chirality, that’s what I argued with my friends here at work, although I wasn’t sure about it. It’s like one kernel calling another.