Executing CUBLAS / CULA methods from within a kernel

Hi all,

I was wondering whether there is any way to invoke CUBLAS / CULA methods from within a kernel.
I tried to do so and got a compilation error saying that I was calling a host function from device code.
However, an application I want to write needs to execute the same CULA method N times in parallel.
Must I loop and call that CULA method N times sequentially?

Shay M.

CUBLAS and CULA are host-side APIs. They cannot be called from kernel code.

CUBLAS exposes a streams interface, so if you issue each call on its own stream and run on a Fermi card, the kernels may run concurrently. I don't remember CULA supporting streams, although it has been some time since I last tried it.
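A minimal sketch of the streams approach for CUBLAS. It assumes the v2 API (`cublas_v2.h`, CUDA 4.0 and later) and uses a single-precision matrix multiply as a stand-in for the per-item work. The helper name `sgemm_on_streams` is hypothetical, and overlap is not guaranteed: it depends on the GPU supporting concurrent kernels and on each kernel leaving room on the device for the others.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not part of CUBLAS): computes C[i] = A[i] * B[i] for
// `count` independent n x n column-major matrices already on the device.
// Each product is issued on its own stream so that, on GPUs that support
// concurrent kernels (Fermi and later), the launches *may* overlap.
// Returns true if every CUDA/CUBLAS call reported success.
bool sgemm_on_streams(cublasHandle_t handle, int n, int count,
                      float* const* dA, float* const* dB, float* const* dC) {
    const float alpha = 1.0f, beta = 0.0f;
    std::vector<cudaStream_t> streams;
    bool ok = true;

    // One stream per independent problem.
    for (int i = 0; i < count && ok; ++i) {
        cudaStream_t s;
        ok = (cudaStreamCreate(&s) == cudaSuccess);
        if (ok) streams.push_back(s);
    }

    // The host loop is still sequential, but each cublasSgemm call only
    // enqueues work and returns, so the GPU is free to run them side by side.
    for (int i = 0; i < count && ok; ++i) {
        ok = cublasSetStream(handle, streams[i]) == CUBLAS_STATUS_SUCCESS &&
             cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha,
                         dA[i], n, dB[i], n, &beta, dC[i], n) ==
                 CUBLAS_STATUS_SUCCESS;
    }

    // Wait for all streams, point the handle back at the default stream,
    // then release the streams created above.
    if (cudaDeviceSynchronize() != cudaSuccess) ok = false;
    cublasSetStream(handle, 0);
    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    return ok;
}
```

Overlap is most likely to pay off when each problem is small, so a single kernel occupies only part of the GPU; large problems already fill the device, and running them one after another on one stream probably loses little.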