CUBLAS library and kernel

Is it possible to call the CUBLAS library to perform matrix operations from within a running kernel?

I want to parallelise some work by launching a kernel, but each thread would also need to perform some matrix operations. I can't work out how to do this properly in CUDA.

No, the functions in CUBLAS are C wrappers called from the host; you cannot call them inside a kernel.

If you are asking whether you can call CUBLAS from inside a kernel, then the answer is no.

It might sound restrictive, but it doesn’t have to be. My approach has been to use the host to coordinate the main stages of a given algorithm, with each stage being a separate kernel or CUBLAS call, while keeping as much data on the GPU as possible to minimize PCI-e bus overheads. If your algorithm needs intermediate data from the device inside the host loop, you can queue kernels, overlap copying with kernel execution, use zero-copy memory access (if your GPU supports it), and perform host-side operations in parallel with the GPU to help hide PCI-e bus latency and bandwidth limitations.
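As a rough illustration of that host-coordinated pattern, here is a minimal sketch in which one stage is a custom kernel and the next is a CUBLAS call, with the data staying on the device in between. It assumes the newer handle-based API from cublas_v2.h rather than whatever CUBLAS version you have installed, and the names scale and scaled_matvec are made up for the example.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Stage 1 (custom kernel): scale every element of x in place.
__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Hypothetical helper: computes y = A * (s * x) on the device.
// A is n x n, stored column-major as CUBLAS expects.
// Returns 0 on success, non-zero on any CUDA or CUBLAS failure.
int scaled_matvec(const std::vector<float> &A, const std::vector<float> &x,
                  float s, std::vector<float> &y) {
    const int n = static_cast<int>(x.size());
    if (n == 0 || A.size() != static_cast<size_t>(n) * n) return 1;

    float *dA = nullptr, *dx = nullptr, *dy = nullptr;
    cublasHandle_t handle = nullptr;
    int status = 1;

    if (cudaMalloc((void **)&dA, sizeof(float) * n * n) == cudaSuccess &&
        cudaMalloc((void **)&dx, sizeof(float) * n) == cudaSuccess &&
        cudaMalloc((void **)&dy, sizeof(float) * n) == cudaSuccess &&
        cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS) {

        // Upload the inputs once; everything after this stays on the GPU.
        cudaMemcpy(dA, A.data(), sizeof(float) * n * n, cudaMemcpyHostToDevice);
        cudaMemcpy(dx, x.data(), sizeof(float) * n, cudaMemcpyHostToDevice);

        // Stage 1: custom kernel, launched from the host.
        const int threads = 256;
        scale<<<(n + threads - 1) / threads, threads>>>(dx, s, n);

        // Stage 2: CUBLAS call, also issued from the host. It reads dx
        // directly from device memory, so no PCI-e transfer is needed
        // between the two stages. Both run on the default stream, in order.
        const float alpha = 1.0f, beta = 0.0f;
        cublasStatus_t st = cublasSgemv(handle, CUBLAS_OP_N, n, n, &alpha,
                                        dA, n, dx, 1, &beta, dy, 1);

        // Download only the final result.
        y.resize(n);
        cudaError_t err = cudaMemcpy(y.data(), dy, sizeof(float) * n,
                                     cudaMemcpyDeviceToHost);
        if (st == CUBLAS_STATUS_SUCCESS && err == cudaSuccess) status = 0;
    }

    if (handle) cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dx);
    cudaFree(dy);
    return status;
}
```

Calling scaled_matvec with a 2 x 2 identity matrix, x = {1, 2} and s = 3 should give y = {3, 6}. The point is only the structure: the host sequences the stages, and neither stage calls the other from device code.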

There are a couple of good papers (one on LAPACK-style dense matrix factorization by V. Volkov from UC Berkeley and one by M. Fatica from NVIDIA on LINPACK benchmark acceleration) which show how effective this process can be.

I am trying to compute many least squares estimates of regression parameters in parallel. If I use CUBLAS for the estimation, I can only do the estimates one at a time. Any thoughts?

It isn’t going to be easy to do that sort of task parallelism with CUDA; it really is a data-parallel computing model. What are the problem sizes?

As an aside, I would imagine you could formulate and solve many least squares problems as a single large system of linear equations, but it would probably be much better done using sparse matrices with an iterative solver of some sort, rather than an explicit factorization with dense matrices. Which means you probably don’t want to use CUBLAS in the first place…
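One possible reading of that suggestion, written out as a sketch: the symbols K, X_k, y_k and beta_k are assumptions introduced here, and going through the normal equations is just one way to get a square system, not necessarily what was meant. With K independent problems, each with its own design matrix X_k and observations y_k, stacking the per-problem normal equations gives a single block-diagonal system:

```latex
% K independent least squares problems (notation assumed for illustration)
\min_{\beta_k} \lVert X_k \beta_k - y_k \rVert_2^2 , \qquad k = 1, \dots, K

% Normal equations for problem k
X_k^{\top} X_k \, \beta_k = X_k^{\top} y_k

% All K problems stacked into one block-diagonal system
\begin{pmatrix}
X_1^{\top} X_1 &        &                \\
               & \ddots &                \\
               &        & X_K^{\top} X_K
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}
=
\begin{pmatrix} X_1^{\top} y_1 \\ \vdots \\ X_K^{\top} y_K \end{pmatrix}
```

Every entry outside the diagonal blocks is zero, so the stacked matrix gets sparser as K grows, which is why a sparse format with an iterative solver looks more natural here than a dense factorization.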