I am calling cublasSdot inside a kernel, but the code just hangs. Everything else, such as cublasInit and creating the vectors on the device, takes place outside the kernel and works correctly; I can print the correct values too. I also tried storing the return value in a __device__ variable, but with no success.
Can someone please help?
Thanks,
Aditi