cuBLAS and dynamic parallelism

eyalhir74 · August 28, 2013, 1:10pm

Hi,
Does anyone know if the cuBLAS functions that are called inside kernels running on the device (as in Dynamic Parallelism) cause some sort of device synchronization?

What if I want to run a couple of kernels concurrently on the same device using multiple streams? Will the cuBLAS functions inside the kernels cause a device sync, so that no concurrency is achieved?

Is there a way around this?

thanks

JFSebastian · August 29, 2013, 8:20am

With the exception of a few routines (those returning a scalar value or involving CPU<->GPU transfers), most cuBLAS routines are asynchronous when called from the host. However, I have never called them from within a kernel, so I do not know whether they keep their asynchronous behavior there. If nobody gives a definitive answer, it may be worth profiling your code with the Visual Profiler to see whether you observe concurrency…

eyalhir74 · August 29, 2013, 10:17am

Thanks. Yes, I’ve run it through the profiler. From the profiler output, it seems that the device is synchronized. I wanted to be sure, to see whether there is a workaround, and to understand why this happens.

I just took the cdpLUDecomposition sample and ran it on two streams concurrently; the profiler shows that this causes a sync.

eyalhir74 · August 30, 2013, 3:47am

The LU sample code contains a call to cublasIzamax. This function caused the synchronization, as it returns a value. Replacing it with a custom kernel to find the max solved the problem, and the LU can now be run on multiple streams concurrently.
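
For anyone who finds this later, here is a rough sketch of what such a replacement kernel might look like. It is not the code I actually used in the LU sample. The names `izamax_kernel`, `izamax_on_stream` and `kThreads` are made up for illustration. It assumes a single thread block is enough for the vector, and it follows the cublasIzamax convention of returning the 1-based index of the first element with the largest |Re| + |Im|. The result is written to device memory rather than returned, so the launch itself does not have to wait for a value.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <math.h>

constexpr int kThreads = 256;  // block size; must be a power of two

// Hypothetical stand-in for cublasIzamax. One block scans x[0..n) and writes
// the 1-based index of the first entry with the largest |Re| + |Im| to
// *result (device memory). Writes 0 when n <= 0.
__global__ void izamax_kernel(const cuDoubleComplex* x, int n, int incx, int* result)
{
    __shared__ double s_val[kThreads];
    __shared__ int    s_idx[kThreads];

    // Each thread keeps the best candidate among the elements it visits.
    double best = -1.0;
    int best_idx = -1;
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        cuDoubleComplex z = x[i * incx];
        double v = fabs(cuCreal(z)) + fabs(cuCimag(z));
        if (v > best) { best = v; best_idx = i; }
    }
    s_val[threadIdx.x] = best;
    s_idx[threadIdx.x] = best_idx;
    __syncthreads();

    // Tree reduction in shared memory; ties go to the smaller index.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            double ov = s_val[threadIdx.x + stride];
            int    oi = s_idx[threadIdx.x + stride];
            if (ov > s_val[threadIdx.x] ||
                (ov == s_val[threadIdx.x] && oi < s_idx[threadIdx.x])) {
                s_val[threadIdx.x] = ov;
                s_idx[threadIdx.x] = oi;
            }
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) *result = s_idx[0] + 1;
}

// Host-side helper, for illustration only: runs the kernel on `stream` and
// returns the index. The cudaMalloc/cudaFree calls are here just to keep the
// sketch self-contained; real code would allocate once, outside the hot path.
int izamax_on_stream(const cuDoubleComplex* h_x, int n, cudaStream_t stream)
{
    cuDoubleComplex* d_x = nullptr;
    int* d_result = nullptr;
    int h_result = 0;
    cudaMalloc(&d_x, n * sizeof(cuDoubleComplex));
    cudaMalloc(&d_result, sizeof(int));
    cudaMemcpyAsync(d_x, h_x, n * sizeof(cuDoubleComplex),
                    cudaMemcpyHostToDevice, stream);
    izamax_kernel<<<1, kThreads, 0, stream>>>(d_x, n, 1, d_result);
    cudaMemcpyAsync(&h_result, d_result, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // waits for this stream only
    cudaFree(d_x);
    cudaFree(d_result);
    return h_result;
}
```

In the dynamic parallelism case the same kernel would be launched from the parent kernel instead of from the host, with the pivot index read from `result` by a later kernel in the same stream. Whether that gives you concurrency in your setup is something to confirm in the profiler.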
