cuSolver stream parallelism

Burajimiru · March 7, 2018, 6:55am

Hi,

I am using cuSolver in a project to compute an LU decomposition and reuse it many times.
I have several decompositions and would like to process them in parallel. For that, I am using a different stream for each system.

For this simple test, I created four 4x4 matrices, and each solve is applied to 3 right-hand-side vectors (x, y, z).

The bulk of the code is omitted, but the main loop looks like this:

cusolverDnCreate(&hdl);
//....

for (auto j = 0; j < 100; j++) {
    // Fill the right-hand sides, one kernel per stream.
    for (auto i = 0; i < 4; i++)
        cuInitializeRHS<<<nBlocks, nThreads, 0, stream[i]>>>();

    // Solve each system on its own stream, reusing the stored LU factors.
    for (auto i = 0; i < 4; i++) {
        cusolverDnSetStream(hdl, stream[i]);
        cusolverDnSgetrs(hdl, CUBLAS_OP_N, m, 3, A[i], m, NULL, b[i], m, NULL);
    }

    // Collect the results, again one kernel per stream.
    for (auto i = 0; i < 4; i++)
        getResult<<<nBlocks, nThreads, 0, stream[i]>>>();
}

//....
cusolverDnDestroy(hdl);
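
For context, the omitted setup could look roughly like the sketch below. It is not the actual project code: the matrix values, the `CHECK` macro, the `solveAllSystems` helper, and the use of pivoting in `cusolverDnSgetrf` are assumptions made only so that the example is self-contained (the loop above passes NULL for the pivot array). Each system is factored once on its own stream, and the factors are then reused by `cusolverDnSgetrs`.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Sizes taken from the test described above.
constexpr int kSystems = 4;  // independent systems, one stream each
constexpr int kM = 4;        // each matrix is kM x kM
constexpr int kNrhs = 3;     // right-hand sides per solve (x, y, z)

// Hypothetical helper: abort if a CUDA or cuSolver call returns nonzero.
#define CHECK(call)                                                    \
    do {                                                               \
        if ((call) != 0) {                                             \
            std::fprintf(stderr, "failed: %s (line %d)\n", #call,      \
                         __LINE__);                                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Factors every system once, solves it, and returns the largest absolute
// deviation from the known solution.
float solveAllSystems() {
    // Assumed test data (column-major): 5 on the diagonal, 1 elsewhere,
    // so A * ones = 8 * ones. Right-hand side k is 8 * (k + 1), which
    // makes the exact solution of column k equal to (k + 1) everywhere.
    std::vector<float> hA(kM * kM), hB(kM * kNrhs);
    for (int c = 0; c < kM; ++c)
        for (int r = 0; r < kM; ++r)
            hA[c * kM + r] = (r == c) ? 5.0f : 1.0f;
    for (int k = 0; k < kNrhs; ++k)
        for (int r = 0; r < kM; ++r)
            hB[k * kM + r] = 8.0f * (k + 1);

    cusolverDnHandle_t hdl;
    CHECK(cusolverDnCreate(&hdl));

    cudaStream_t stream[kSystems];
    float *A[kSystems], *b[kSystems], *work[kSystems];
    int *ipiv[kSystems], *info[kSystems];

    // One-time setup: upload each system and factor it on its own stream.
    for (int i = 0; i < kSystems; ++i) {
        CHECK(cudaStreamCreate(&stream[i]));
        CHECK(cudaMalloc(&A[i], sizeof(float) * kM * kM));
        CHECK(cudaMalloc(&b[i], sizeof(float) * kM * kNrhs));
        CHECK(cudaMalloc(&ipiv[i], sizeof(int) * kM));
        CHECK(cudaMalloc(&info[i], sizeof(int)));
        CHECK(cudaMemcpy(A[i], hA.data(), sizeof(float) * kM * kM,
                         cudaMemcpyHostToDevice));
        CHECK(cudaMemcpy(b[i], hB.data(), sizeof(float) * kM * kNrhs,
                         cudaMemcpyHostToDevice));

        int lwork = 0;
        CHECK(cusolverDnSetStream(hdl, stream[i]));
        CHECK(cusolverDnSgetrf_bufferSize(hdl, kM, kM, A[i], kM, &lwork));
        CHECK(cudaMalloc(&work[i], sizeof(float) * lwork));
        CHECK(cusolverDnSgetrf(hdl, kM, kM, A[i], kM, work[i], ipiv[i],
                               info[i]));
    }

    // Reuse the stored factors: one solve per system, each on its stream.
    for (int i = 0; i < kSystems; ++i) {
        CHECK(cusolverDnSetStream(hdl, stream[i]));
        CHECK(cusolverDnSgetrs(hdl, CUBLAS_OP_N, kM, kNrhs, A[i], kM,
                               ipiv[i], b[i], kM, info[i]));
    }

    // Wait for each stream, copy the solution back, and compare.
    float maxErr = 0.0f;
    std::vector<float> hX(kM * kNrhs);
    for (int i = 0; i < kSystems; ++i) {
        CHECK(cudaStreamSynchronize(stream[i]));
        CHECK(cudaMemcpy(hX.data(), b[i], sizeof(float) * kM * kNrhs,
                         cudaMemcpyDeviceToHost));
        for (int k = 0; k < kNrhs; ++k)
            for (int r = 0; r < kM; ++r)
                maxErr = std::fmax(maxErr,
                                   std::fabs(hX[k * kM + r] - (k + 1)));
        CHECK(cudaFree(A[i]));
        CHECK(cudaFree(b[i]));
        CHECK(cudaFree(work[i]));
        CHECK(cudaFree(ipiv[i]));
        CHECK(cudaFree(info[i]));
        CHECK(cudaStreamDestroy(stream[i]));
    }
    CHECK(cusolverDnDestroy(hdl));
    return maxErr;
}

int main() {
    std::printf("max error: %g\n", solveAllSystems());
    return 0;
}
```

The sketch shares a single handle across all streams, as in the loop above, so it reproduces the setup in question rather than changing it.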

This is the kind of behavior I am getting:

The first and last batches are regular CUDA kernels implemented here. They run concurrently, as expected.

Between getrs calls, cuSolver seems to be creating an event and checking whether the computation has finished. Since the calls use different streams, I would expect the computations to be independent.

Could someone help me figure out why this is happening?
Thank you.
