Help required with CUBLAS

I am calling cublasSdot inside a kernel, but the code just hangs. Everything else (cublasInit, allocating the vectors on the device, etc.) happens outside the kernel and works correctly; I can print the correct values too. I also tried storing the return value in a __device__ variable, but with no success.

Can someone please help?



My understanding is that all CUBLAS functions need to be called from the host. Internally they use the GPU, but I don’t think they can be called from within a kernel.

Oh thanks!! Do you know if CUBLAS functions are blocking or non-blocking?