What libraries for linear solvers on CUDA can you recommend?

1553628633 · November 14, 2022, 4:58pm

Hello everyone, I am writing a finite element program with CUDA and need a linear solver for large sparse linear systems. I am currently writing a CG solver with cuSPARSE, but I want more speed. Is there any third-party library that provides better solvers?

fbusato · November 14, 2022, 6:52pm

Hi, did you already take a look at the CG sample based on the cuSPARSE APIs?
Could you please provide more details about your requirements? Why is the performance not sufficient?

1553628633 · November 15, 2022, 6:02am

Thank you for your reply! Yes, I have seen it. I am using an incomplete-Cholesky preconditioned CG method now, but it still takes up most of the run time of the whole program, so I want to optimize it. Sorry, I don’t know much about solvers, but I know there are other methods such as AMG. Is there any other solver library for CUDA that would allow me to test the speed of different methods?

1553628633 · November 15, 2022, 7:36am

I compared it with the CG solver from the Eigen library on the CPU and found that the speedup is only about 2x.

srodriguezbe · November 15, 2022, 7:48pm

The CG sample is not meant to be a high-performance implementation of the conjugate gradient algorithm. Depending on the inputs, CG can be memory-bound or latency-bound. For example, if your inputs are not too large, the runtime will be dominated by kernel launch overhead. A high-performance implementation avoids launching unnecessary kernels by fusing operations, checking the residual less frequently, etc.
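To make the fusion and residual-check points concrete, here is a minimal host-side C++ sketch. It is not the cuSPARSE sample's code and not CUDA; it only shows the structure. It assumes the matrix is stored in CSR (the format commonly used with cuSPARSE SpMV) in a hypothetical `Csr` struct, and `cg_solve`, `check_every` and `laplacian_1d` are illustrative names. The `x`/`r` update and the new `r·r` are computed in one loop, which on a GPU would correspond to one fused kernel instead of two axpy kernels plus a dot product. The tolerance test runs only every `check_every` iterations, which on a GPU is where the residual norm would be copied back to the host (a synchronization point).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage: row offsets, column indices, values.
struct Csr {
    int n;
    std::vector<int> row_ptr, col_idx;  // row_ptr has n + 1 entries
    std::vector<double> val;
};

// y = A * x
void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) sum += A.val[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG. The tolerance is only tested every `check_every` iterations;
// on a GPU that test is where r.r would be copied to the host (a synchronization).
// Returns the number of iterations performed.
int cg_solve(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
             double tol, int max_iter, int check_every) {
    assert(check_every > 0);
    const std::size_t n = b.size();
    std::vector<double> r(n), p(n), q(n);
    spmv(A, x, q);
    for (std::size_t i = 0; i < n; ++i) { r[i] = b[i] - q[i]; p[i] = r[i]; }
    double rr = dot(r, r);
    if (rr == 0.0) return 0;                    // initial guess is already exact
    const double stop = tol * tol * dot(b, b);  // stop when ||r|| <= tol * ||b||
    int it = 0;
    while (it < max_iter) {
        spmv(A, p, q);
        const double alpha = rr / dot(p, q);
        // Fused update: x, r and the new r.r in a single pass over the vectors.
        double rr_new = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            rr_new += r[i] * r[i];
        }
        ++it;
        if (rr_new == 0.0) break;  // exact solution reached
        if (it % check_every == 0 && rr_new <= stop) break;
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return it;
}

// Hypothetical test matrix: 1D Laplacian (2 on the diagonal, -1 next to it), which is SPD.
Csr laplacian_1d(int n) {
    Csr A{n, {0}, {}, {}};
    for (int i = 0; i < n; ++i) {
        if (i > 0) { A.col_idx.push_back(i - 1); A.val.push_back(-1.0); }
        A.col_idx.push_back(i);
        A.val.push_back(2.0);
        if (i + 1 < n) { A.col_idx.push_back(i + 1); A.val.push_back(-1.0); }
        A.row_ptr.push_back(static_cast<int>(A.col_idx.size()));
    }
    return A;
}
```

Usage: `Csr A = laplacian_1d(50); std::vector<double> b(50, 1.0), x(50, 0.0); int iters = cg_solve(A, b, x, 1e-10, 1000, 8);`. The trade-off is that up to `check_every - 1` extra iterations may run after the tolerance has already been reached, in exchange for fewer host/device synchronizations.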

There are also algorithmic design choices. There are many variants of CG, and some of them converge faster or avoid computing some of the dot products directly (dot products limit performance). Using the right preconditioner for your application is key, as it can greatly improve the convergence of the solver. Does Eigen use the same preconditioner as the CG sample?
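To show where the preconditioner enters the iteration, here is a minimal preconditioned CG sketch in host-side C++. It uses a Jacobi (diagonal) preconditioner as a simple stand-in, not the incomplete-Cholesky preconditioner discussed in this thread, because applying it is just an elementwise multiply; with IC, that step would instead be two sparse triangular solves. The `Csr` struct, `pcg_solve`, `inverse_diagonal` and the `badly_scaled_tridiag` test matrix are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage: row offsets, column indices, values.
struct Csr {
    int n;
    std::vector<int> row_ptr, col_idx;  // row_ptr has n + 1 entries
    std::vector<double> val;
};

// y = A * x
void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) sum += A.val[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Jacobi preconditioner M = diag(A): store 1 / a_ii so applying M^{-1} is a multiply.
std::vector<double> inverse_diagonal(const Csr& A) {
    std::vector<double> inv(A.n, 1.0);
    for (int i = 0; i < A.n; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            if (A.col_idx[k] == i) inv[i] = 1.0 / A.val[k];
    return inv;
}

// Preconditioned CG with a diagonal preconditioner given as minv = diag(M)^{-1}.
// Passing all ones gives plain CG. Returns the number of iterations performed.
int pcg_solve(const Csr& A, const std::vector<double>& minv, const std::vector<double>& b,
              std::vector<double>& x, double tol, int max_iter) {
    const std::size_t n = b.size();
    std::vector<double> r(n), z(n), p(n), q(n);
    spmv(A, x, q);
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = b[i] - q[i];
        z[i] = minv[i] * r[i];  // z = M^{-1} r
        p[i] = z[i];
    }
    double rz = dot(r, z);
    const double stop = tol * tol * dot(b, b);  // stop when ||r|| <= tol * ||b||
    int it = 0;
    while (it < max_iter && dot(r, r) > stop) {
        spmv(A, p, q);
        const double alpha = rz / dot(p, q);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = minv[i] * r[i];  // preconditioner apply (IC: two triangular solves)
        }
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
        ++it;
    }
    return it;
}

// Hypothetical test matrix: tridiagonal with diagonal 1 + i*i and off-diagonals -0.1.
// Strictly diagonally dominant with a positive diagonal, hence SPD, but badly scaled.
Csr badly_scaled_tridiag(int n) {
    Csr A{n, {0}, {}, {}};
    for (int i = 0; i < n; ++i) {
        if (i > 0) { A.col_idx.push_back(i - 1); A.val.push_back(-0.1); }
        A.col_idx.push_back(i);
        A.val.push_back(1.0 + static_cast<double>(i) * i);
        if (i + 1 < n) { A.col_idx.push_back(i + 1); A.val.push_back(-0.1); }
        A.row_ptr.push_back(static_cast<int>(A.col_idx.size()));
    }
    return A;
}
```

Because passing a vector of ones as `minv` turns this into unpreconditioned CG, the same function can compare iteration counts with and without the preconditioner. On `badly_scaled_tridiag(50)` the Jacobi-preconditioned run needs noticeably fewer iterations, since dividing by the diagonal removes the bad scaling.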

I wouldn’t be surprised if a well-optimized library is much faster than the CG sample.

That being said, it is always useful to profile your application with Nsight Systems (nsys) or nvprof to see where your CG spends most of its time. Is it in the computation of the preconditioner? Also, what about the number of iterations to convergence: is it about the same for Eigen and the CG sample?

Sparse iterative solvers can be quite complex, and it is usually a good decision to rely on well-established libraries (Eigen, Trilinos, etc.) unless you have specific knowledge you can leverage in your solver. If you’re looking for a solution for NVIDIA GPUs, you should give AMGX a try too!

1553628633 · November 16, 2022, 6:43am

Thanks for your answer, it’s very professional. I’ll try to optimize my CG. I also found AMGX when searching yesterday, but it doesn’t seem particularly easy to integrate into my program, and I’m considering whether it is worth taking the time to try it...
