Using cuBLAS in different CUDA streams

Hi!

I want to call a function from several CUDA streams to increase GPU occupancy.
The function calls cublasDtrsv.
I'm confused: should I create a separate cuBLAS handle for each stream, or can one handle be shared?

Thanks in advance.

It is required that different handles be used for different devices:

http://docs.nvidia.com/cuda/cublas/index.html#cublas-context

It is recommended (but not required, if care is taken) that different handles be used for different host threads:

http://docs.nvidia.com/cuda/cublas/index.html#thread-safety2

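It is neither required nor recommended that different handles be used for different streams on the same device when the calls come from the same host thread. A single handle can be pointed at a different stream with cublasSetStream() before each call.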

Thank you for help.

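The kernels generally have to be small to get good concurrency: if a single kernel already fills the GPU, kernels launched in other streams will mostly run after it rather than alongside it.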

Here is some of my code from the “Glass Brain” project, which has examples of using streams with cuBLAS Sgemv_v2() (lines #166-170) and cuSPARSE cusparseScsrmv_v2() (lines #274-280):

https://github.com/OlegKonings/BCI_EEG_blk_diag_admm_multi_lambda/blob/master/GroupMextest/GroupMextest/GLmex.cpp

I only needed two handles in that case, one for cuBLAS and one for cuSPARSE.