Sparse cuSOLVER inside loop: factorization at every call?

Hi,

I am wondering whether there is a cuSOLVER routine that can be used as a replacement for Intel MKL PARDISO.

I am dealing with the problem Ax=b, where “A” is sparse, symmetric and positive definite, and x and b are vectors which can hold multiple right-hand sides/solutions. “A” is constant throughout the program, but “Ax=b” is solved in different parts of the program with different “x”'s and “b”'s.

So far I have used PARDISO, which has separate preparation, factorization and solve steps that can be called individually. Thus the preparation and the most expensive factorization can be done once at the beginning of the program, and the solve calls can be repeated anywhere throughout the program at little cost because the factors are reused.

With regard to CUDA, I have looked at “cusolverSpScsrlsvlu”, “cusolverSpScsrlsvqr” and “cusolverSpScsrlsvchol”, and it appears to me that these functions will (re)do the factorization each time they are called, which will produce a massive overhead. Is that correct? And if so, is there any way to circumvent it?

Thanks

Take a look at the sample codes cuSolverSp_LowlevelQR and cuSolverSp_LowlevelCholesky

Using the first one as an example, the whole solution process is first done using the host API, in steps 1 through 8.

After that, the process is repeated using the device API in steps 9-14. AFAIK steps 9-13 can be done once, and step 14 can be repeated as many times as needed for different RHSs if the A matrix doesn’t change.
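In outline, the factor-once / solve-many pattern in that sample looks roughly like the sketch below, using the low-level Cholesky functions from cusolverSp_LOWLEVEL_PREVIEW.h. This is a condensed sketch, not the full sample: error checking is omitted, and the `d_*` pointers are assumed to already hold the CSR matrix and vectors on the device.

```cpp
#include <cuda_runtime.h>
#include <cusolverSp.h>
#include <cusolverSp_LOWLEVEL_PREVIEW.h>

void factor_once_then_solve(int n, int nnz,
                            const int *d_csrRowPtr, const int *d_csrColInd,
                            const double *d_csrVal,
                            const double *d_b, double *d_x)
{
    cusolverSpHandle_t handle;
    cusparseMatDescr_t descrA;
    csrcholInfo_t      info;
    cusolverSpCreate(&handle);
    cusparseCreateMatDescr(&descrA);
    cusolverSpCreateCsrcholInfo(&info);

    // Done ONCE: symbolic analysis, workspace query, numeric factorization.
    cusolverSpXcsrcholAnalysis(handle, n, nnz, descrA,
                               d_csrRowPtr, d_csrColInd, info);
    size_t internalBytes = 0, workspaceBytes = 0;
    cusolverSpDcsrcholBufferInfo(handle, n, nnz, descrA, d_csrVal,
                                 d_csrRowPtr, d_csrColInd, info,
                                 &internalBytes, &workspaceBytes);
    void *buffer = nullptr;
    cudaMalloc(&buffer, workspaceBytes);
    cusolverSpDcsrcholFactor(handle, n, nnz, descrA, d_csrVal,
                             d_csrRowPtr, d_csrColInd, info, buffer);

    // Repeated as often as needed: triangular solves reusing the factor.
    cusolverSpDcsrcholSolve(handle, n, d_b, d_x, info, buffer);

    cudaFree(buffer);
    cusolverSpDestroyCsrcholInfo(info);
    cusparseDestroyMatDescr(descrA);
    cusolverSpDestroy(handle);
}
```

As long as the `info` structure and `buffer` are kept alive, only the final Solve call needs to be repeated for new right-hand sides.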

Hi, thanks for the quick response. I had a look at the example, and there is a function which I struggle to find in the cuda online documentation : “cusolverSpDcsrcholFactorHost”. I looked here: [url]https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-low-level-function-reference[/url] but I could not find anything. Of course after knowing what to look for I found file “cusolverSP_LOWLEVEL_PREVIEW.h”. So what am I doing wrong when not finding these functionality described in the online documentation (which is the reason for the thread)??

Thanks

Because they (various cuSolverSp functions used in those sample codes) are not in the online documentation, AFAIK.

The fact that these functions are in a special header file with PREVIEW in the name seems to be significant.

I have an internal inquiry about it, but the earliest the documentation could change would be the next CUDA release (and I have no guarantees about it changing then).

Thanks a lot.

Hi Robert,

so I got all that to work, but I noticed that there is no reordering step before the call to “cusolverSpDcsrcholFactor”. According to “cusolverSpDcsrcholBufferInfo”, the matrix I am using requires 6,581,070,336 bytes of memory for factorization. For the same matrix, MKL PARDISO reports a peak memory usage of 3.87 gigabytes, almost half of what CUDA wants. This is presumably due to CUDA not reordering the matrix to reduce fill-in. Is there any option (which I may have overlooked) to request a reordering before the factorization call?
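For reference, what I am hoping for is something like the following: the documented host helpers “cusolverSpXcsrsymamdHost” and “cusolverSpXcsrpermHost” chained to build a reordered matrix B = P·A·Pᵀ before factoring. This is only a sketch of my understanding from the docs (names and signatures as documented, but untested here), and note that csrpermHost expects the full pattern of a general matrix, not just one triangle:

```cpp
#include <vector>
#include <cusolverSp.h>

// Reorder a CSR matrix in place with SYMAMD to reduce fill-in.
// On return, csrRowPtr/csrColInd/csrVal describe B = P*A*P^T and
// perm holds P; one then factors B and solves B*y = P*b, x = P^T*y.
void reorder_symamd(cusolverSpHandle_t handle, cusparseMatDescr_t descrA,
                    int n, int nnz,
                    std::vector<int> &csrRowPtr,
                    std::vector<int> &csrColInd,
                    std::vector<double> &csrVal,
                    std::vector<int> &perm)
{
    perm.resize(n);
    // 1. Compute a fill-reducing symmetric permutation on the host.
    cusolverSpXcsrsymamdHost(handle, n, nnz, descrA,
                             csrRowPtr.data(), csrColInd.data(), perm.data());

    // 2. Apply the permutation to the CSR pattern; 'map' records where
    //    each original entry ends up.
    size_t bufSize = 0;
    cusolverSpXcsrperm_bufferSizeHost(handle, n, n, nnz, descrA,
                                      csrRowPtr.data(), csrColInd.data(),
                                      perm.data(), perm.data(), &bufSize);
    std::vector<char> buffer(bufSize);
    std::vector<int> map(nnz);
    for (int i = 0; i < nnz; ++i) map[i] = i;
    cusolverSpXcsrpermHost(handle, n, n, nnz, descrA,
                           csrRowPtr.data(), csrColInd.data(),
                           perm.data(), perm.data(),
                           map.data(), buffer.data());

    // 3. Permute the numeric values to match the new pattern.
    std::vector<double> valB(nnz);
    for (int i = 0; i < nnz; ++i) valB[i] = csrVal[map[i]];
    csrVal.swap(valB);
}
```

Is this the intended way to get the fill-in reduction, or is there something built into the csrchol path that I am missing?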

Further, is there any way to retrieve the factor from the device once the factorization is done? For a smaller example matrix where everything went successfully, I tried to copy values from the location of “pBuffer”, but all I got was zeros. My guess is that CUDA is not “giving away” the factor (similar to MKL), but I might be wrong.

Finally, is there any option for multiple right-hand sides (RHS) in a single call of “cusolverSpDcsrcholSolve”? My experience with MKL is that calling PARDISO in a loop n times for n RHSs takes much more time than calling it once with all RHSs delivered at once.
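At the moment I am looping over the columns myself, along these lines (a sketch: `d_B` and `d_X` are assumed to be device pointers to column-major n-by-k arrays, with `info` and `buffer` from an earlier cusolverSpDcsrcholFactor call):

```cpp
// One cusolverSpDcsrcholSolve call per right-hand-side column;
// the factorization cost is paid only once, but the solves are serialized.
for (int j = 0; j < k; ++j) {
    cusolverSpDcsrcholSolve(handle, n,
                            d_B + (size_t)j * n,   // j-th RHS column
                            d_X + (size_t)j * n,   // j-th solution column
                            info, buffer);
}
```

Is there a batched or blocked alternative that would amortize the kernel launches the way PARDISO amortizes them on the CPU?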

Thanks.

Hi, Robert! Firstly, thanks for reading my message. I have run into a problem similar to the others in this thread when solving a linear equation system with a CUDA library function. I used “cusolverSpDcsrlsvqr” from the “cuSolverSp_LinearSolver” sample to solve a system whose coefficient matrix is a 50,000 x 50,000 sparse matrix. However, I find that cuSOLVER takes much longer (68 sec per step) than PARDISO does in Fortran (2.3 sec per step). Since GPU parallelism is generally considered more efficient than CPU parallelism, could you provide some help? Thank you so much for your patience.

Hi

Please check out cuDSS, a direct sparse solver for NVIDIA GPUs.
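For the pattern discussed in this thread (factor once, solve many, multiple RHS in one call), cuDSS exposes the phases explicitly. A rough sketch of the flow, based on my reading of the cuDSS docs (check the documentation for the exact signatures in your cuDSS version; error checking and device allocation omitted):

```cpp
#include <cudss.h>

// A is n x n SPD in CSR (upper triangle stored), B/X are n x nrhs dense,
// all arrays already on the device.
void solve_with_cudss(int64_t n, int64_t nnz, int nrhs,
                      int *d_rowPtr, int *d_colInd, double *d_val,
                      double *d_b, double *d_x)
{
    cudssHandle_t handle;  cudssCreate(&handle);
    cudssConfig_t config;  cudssConfigCreate(&config);
    cudssData_t   data;    cudssDataCreate(handle, &data);

    cudssMatrix_t A, x, b;
    cudssMatrixCreateCsr(&A, n, n, nnz, d_rowPtr, nullptr, d_colInd, d_val,
                         CUDA_R_32I, CUDA_R_64F, CUDSS_MTYPE_SPD,
                         CUDSS_MVIEW_UPPER, CUDSS_BASE_ZERO);
    cudssMatrixCreateDn(&b, n, nrhs, n, d_b, CUDA_R_64F,
                        CUDSS_LAYOUT_COL_MAJOR);
    cudssMatrixCreateDn(&x, n, nrhs, n, d_x, CUDA_R_64F,
                        CUDSS_LAYOUT_COL_MAJOR);

    // Analysis (includes reordering) and factorization: done once.
    cudssExecute(handle, CUDSS_PHASE_ANALYSIS,      config, data, A, x, b);
    cudssExecute(handle, CUDSS_PHASE_FACTORIZATION, config, data, A, x, b);
    // Solve: repeatable with new b/x, and handles all nrhs columns at once.
    cudssExecute(handle, CUDSS_PHASE_SOLVE,         config, data, A, x, b);

    cudssMatrixDestroy(A); cudssMatrixDestroy(b); cudssMatrixDestroy(x);
    cudssDataDestroy(handle, data);
    cudssConfigDestroy(config);
    cudssDestroy(handle);
}
```

The analysis and factorization phases can be skipped on subsequent solves as long as A is unchanged, which matches the PARDISO-style workflow asked about at the top of the thread.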


Hi asundaram! Thanks so much for your previous suggestion about solving a sparse linear system. After installing cuDSS, I ran into problems when trying cuDSS Example 1 (CUDALibrarySamples/cuDSS/simple at master · NVIDIA/CUDALibrarySamples · GitHub). Firstly, I set the paths for the cuDSS library and header files according to “Installation and Compilation”, but the cuDSS library functions were not recognized. Secondly, after I copied the cuDSS library and header files into the “include” and “lib” directories of CUDA 12.0, the cuDSS code could run. However, while the documented solution of cuDSS Example 1 is {1, 2, 3, 4, 5}, I find the solution is {0.825, 1.53, 3.7, 4, 6.5} when checking with other tools, such as “cusolverSpDcsrlsvqr” in the “cuSolverSp_LinearSolver” sample. So, is there any problem with how I am using the cuDSS sample and calling the cuDSS library functions? Thank you very much for your patience in reading this message.