Which linear solver libraries for CUDA can you recommend?

Hello everyone, I am writing a finite element program with CUDA and need a linear solver for large sparse systems. I am currently writing a CG solver with cuSPARSE, but I need more speed. Is there a third-party library that provides faster solvers?

Hi, have you already looked at the CG sample based on the cuSPARSE APIs?
Can you please provide more details about your requirements? Why is the current performance not enough?

Thank you for your reply! Yes, I have seen it. I am now using an incomplete-Cholesky preconditioned CG method, but it still takes up most of the run time of the whole program, so I want to optimize it. Sorry, I don’t know much about solvers, but I know there are other methods such as AMG. Is there another solver library for CUDA that would let me test the speed of different methods?

I compared it with the CG solver from the Eigen library on the CPU and found that the GPU version is only about twice as fast.

The CG sample is not meant to be a high-performance implementation of the conjugate gradient algorithm. Depending on the inputs, CG can be memory-bound or latency-bound. For example, if your inputs are small, its runtime will be dominated by kernel launch overhead. A high-performance implementation avoids launching unnecessary kernels by fusing operations, by not checking the residual too often (each check usually means copying a scalar back to the host and synchronizing), and so on.
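To make the fusion and residual-checking advice concrete, here is a minimal CPU sketch in C++. It is not CUDA and not the cuSPARSE sample's code; the `Csr` struct, the `cg_fused` function and its `check_every` parameter are hypothetical names for illustration. The single loop that updates `x` and `r` also accumulates `r·r`, where a naive GPU version would launch three separate kernels, and convergence is tested only every `check_every` iterations. On a GPU, that test is the point where the residual norm would be copied back to the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) matrix; hypothetical minimal layout.
struct Csr {
    int n;
    std::vector<int> row_ptr, col;
    std::vector<double> val;
};

// y = A * x
static void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG. x holds the initial guess on entry and the solution on exit.
// Returns the iteration count (max_iter if the tolerance was not reached).
int cg_fused(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
             double tol, int max_iter, int check_every) {
    assert(static_cast<int>(b.size()) == A.n && static_cast<int>(x.size()) == A.n);
    assert(check_every > 0);
    const int n = A.n;
    std::vector<double> r(n), p(n), Ap(n);
    spmv(A, x, Ap);
    for (int i = 0; i < n; ++i) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);  // compare squared norms, no sqrt needed
    if (rr <= stop) return 0;
    for (int it = 1; it <= max_iter; ++it) {
        spmv(A, p, Ap);
        const double alpha = rr / dot(p, Ap);
        // Fused pass: update x and r and accumulate r.r in one sweep
        // (three separate kernels in a naive GPU version).
        double rr_new = 0.0;
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rr_new += r[i] * r[i];
        }
        // Test the tolerance only every `check_every` iterations; the exact-zero
        // guard avoids a 0/0 in the next alpha if the residual vanishes.
        if (rr_new == 0.0 || (it % check_every == 0 && rr_new <= stop)) return it;
        const double beta = rr_new / rr;
        rr = rr_new;
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    }
    return max_iter;
}
```

The trade-off of `check_every = k` is that the solver may run up to `k - 1` iterations past the point where the tolerance was first met, in exchange for fewer host round trips.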

There are also algorithmic design choices. There are many variants of CG, and some of them converge faster or avoid computing dot products directly (dot products are a performance limiter). Using the right preconditioner for your application is key, as it can greatly improve the convergence of the solver. Does Eigen use the same preconditioner as the CG sample?
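For context, Eigen's `ConjugateGradient` uses a diagonal (Jacobi) preconditioner by default, while the original poster uses incomplete Cholesky, so the iteration counts of the two solvers are not directly comparable. The sketch below is a generic preconditioned CG in C++ (CPU only) in which the preconditioner is a pluggable function computing `z = M^{-1} r`; `Csr`, `pcg`, `make_jacobi` and `no_precond` are hypothetical names. It uses Jacobi rather than incomplete Cholesky only because Jacobi is the simplest to write; an incomplete-Cholesky preconditioner would plug into the same slot but needs a factorization plus two triangular solves per application.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Csr { int n; std::vector<int> row_ptr, col; std::vector<double> val; };

static void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// A preconditioner applies z = M^{-1} r.
using Precond = std::function<void(const std::vector<double>&, std::vector<double>&)>;

// Jacobi preconditioner: M = diag(A). Rows without a stored diagonal keep 1.0.
Precond make_jacobi(const Csr& A) {
    std::vector<double> inv_diag(A.n, 1.0);
    for (int i = 0; i < A.n; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            if (A.col[k] == i) inv_diag[i] = 1.0 / A.val[k];
    return [inv_diag](const std::vector<double>& r, std::vector<double>& z) {
        for (std::size_t i = 0; i < r.size(); ++i) z[i] = inv_diag[i] * r[i];
    };
}

// Identity preconditioner: turns PCG back into plain CG.
void no_precond(const std::vector<double>& r, std::vector<double>& z) { z = r; }

// Preconditioned CG; stops when ||b - A x|| <= tol * ||b|| (recursive residual).
int pcg(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
        const Precond& apply_M, double tol, int max_iter) {
    assert(static_cast<int>(b.size()) == A.n && static_cast<int>(x.size()) == A.n);
    const int n = A.n;
    std::vector<double> r(n), z(n), p(n), Ap(n);
    spmv(A, x, Ap);
    for (int i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    const double stop = tol * std::sqrt(dot(b, b));
    if (std::sqrt(dot(r, r)) <= stop) return 0;
    apply_M(r, z);
    p = z;
    double rz = dot(r, z);
    for (int it = 1; it <= max_iter; ++it) {
        spmv(A, p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        if (std::sqrt(dot(r, r)) <= stop) return it;
        apply_M(r, z);                 // z = M^{-1} r
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return max_iter;
}
```

Jacobi helps most on badly scaled matrices, for example `S T S` where `T` is a well-conditioned tridiagonal matrix and `S` a widely varying diagonal scaling: PCG with `M = diag(A)` is then equivalent to CG on a rescaled copy of `T`. On a matrix whose diagonal is constant, Jacobi only rescales by a constant and does not change the CG iteration count.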

I wouldn’t be surprised if a well-optimized library is much faster than the CG sample.

That being said, it is always useful to profile your application with Nsight Systems (nsys) or nvprof to see where your CG spends most of its time. Is it in computing the preconditioner? Also, what about the number of iterations to convergence: is it about the same for Eigen and the CG sample?
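Alongside nsys, a coarse host-side timer around each phase can already tell whether preconditioner setup or the iterations dominate. Below is a minimal C++ sketch of a hypothetical `PhaseTimer` helper that accumulates wall-clock time per named phase. It assumes the work being timed has finished when the lambda returns. CUDA kernel launches are asynchronous, so for GPU code each timed lambda should end with `cudaDeviceSynchronize()` (or be timed with CUDA events instead); otherwise the timer mostly measures launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Accumulates wall-clock time per named phase (e.g. "precond setup", "cg iterations").
struct PhaseTimer {
    std::vector<std::pair<std::string, double>> phases;  // (name, milliseconds)

    template <class F>
    void run(const std::string& name, F&& f) {
        assert(!name.empty() && "phase name must not be empty");
        const auto t0 = std::chrono::steady_clock::now();
        f();  // for GPU work, f() should end with cudaDeviceSynchronize()
        const auto t1 = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        for (auto& p : phases) {
            if (p.first == name) { p.second += ms; return; }  // repeated phase: accumulate
        }
        phases.emplace_back(name, ms);
    }

    // Prints each phase's total time and its share of the overall time.
    void report() const {
        double total = 0.0;
        for (const auto& p : phases) total += p.second;
        for (const auto& p : phases)
            std::printf("%-20s %10.3f ms  (%5.1f%%)\n", p.first.c_str(), p.second,
                        total > 0.0 ? 100.0 * p.second / total : 0.0);
    }
};
```

For example, wrap the incomplete-Cholesky factorization in `t.run("precond setup", ...)` and the iteration loop in `t.run("cg iterations", ...)`, then call `t.report()`; printing the iteration count next to it makes the Eigen vs. CG-sample comparison straightforward.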

Sparse iterative solvers can be quite complex, and it is usually a good decision to rely on well-established libraries (Eigen, Trilinos, etc.) unless you have specific knowledge you can leverage in your solver. If you’re looking for a solution for NVIDIA GPUs, you should give AMGx a try too!


Thanks for your answer, it’s very professional. I’ll try to optimize my CG. I also found AMGX while searching yesterday, but it doesn’t look particularly easy to integrate into my program, so I’m still deciding whether it’s worth taking the time to try it.