cuDSS, MG mode and ILU(0)

Coercion · November 11, 2025, 3:16pm

Hello,

Recently, I tried cuDSS’s MG mode, and I was able to make it work for a sparse linear system (LU factorization, complex numbers) using up to 8 GPUs. I saw a reduction in factorization times and solution times. I also observed that the factors are distributed to the GPUs, which is also very good news.

My question is this, though I’m not sure whether it concerns cuSPARSE or cuDSS: if sparse matrix factorization and triangular solves can be parallelized across GPUs (up to a point), would it be possible to do the same for ILU(0) decomposition? I need it for the iterative solution of the system, where the sparse triangular solve is the most critical part and is not straightforward to implement. Are there any plans to add preconditioner-type solvers in the future?

(It could be useful when a linear system gets too large to fit into a single GPU’s memory.)

Regards

Deniz

ariahi · November 13, 2025, 2:11am

Hi Deniz, thanks so much for the feedback. Have you looked into AmgX? It is an open-source library of iterative solvers and preconditioners.

Coercion · November 15, 2025, 12:05pm

Hi Ariahi,

I am aware of the AmgX package. I was able to compile it for Windows too, but I haven’t used it. I have been using the cuBLAS and cuSPARSE libraries to build iterative solvers, and I’m quite happy with them. For some reason, I have never liked algebraic multigrid approaches (or the geometric version, either).

At the same time, I think I’ve found an answer to my question (a workaround). After computing the ILU(0) decomposition of a preconditioner matrix, the two triangular factors can be multiplied together, and the resulting matrix can be passed to cuDSS for factorization and solution in MG mode. This could be a solution for now.
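To make the idea concrete, here is a minimal sketch in plain NumPy (dense arrays, CPU only). It is not cuDSS or cuSPARSE code: `ilu0` and `laplacian_2d` are hypothetical helpers written for illustration, the test matrix is an assumed small complex-shifted Laplacian, and `np.linalg.solve` stands in for the direct solver. It only checks that solving with the product M = L·U gives the same preconditioner application as the two triangular solves.

```python
import numpy as np

def ilu0(A):
    """ILU(0): Gaussian elimination restricted to the nonzero pattern of A.

    Hypothetical dense helper for illustration only; a real code would
    work on CSR storage.
    """
    n = A.shape[0]
    pattern = A != 0
    LU = A.astype(complex).copy()
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]               # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i, j]:              # update only inside the pattern
                    LU[i, j] -= LU[i, k] * LU[k, j]
    L = np.tril(LU, -1) + np.eye(n)            # unit lower triangular factor
    U = np.triu(LU)                            # upper triangular factor
    return L, U

def laplacian_2d(m):
    """5-point Laplacian on an m x m grid (assumed test problem)."""
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    return np.kron(I, T) + np.kron(T, I)

# Small complex test matrix (an assumption, not the matrix from the post).
A = laplacian_2d(3) + 0.5j * np.eye(9)
L, U = ilu0(A)

# The workaround: form the product once and hand it to a direct solver.
M = L @ U

r = np.arange(1, 10, dtype=complex)            # residual to precondition

# (a) Preconditioner applied through a direct solve with M
#     (np.linalg.solve stands in for the multi-GPU direct solver).
z_direct = np.linalg.solve(M, r)

# (b) Preconditioner applied through the two triangular solves.
#     A generic dense solve is used here in place of a triangular kernel.
z_tri = np.linalg.solve(U, np.linalg.solve(L, r))

print(np.allclose(z_direct, z_tri))
```

One thing to keep in mind: M matches A on A's sparsity pattern but generally has extra nonzeros outside it, so the matrix handed to the direct solver is somewhat denser than A. Also, an exact LU factorization of M without pivoting or reordering would simply recover the same L and U; a direct solver that reorders the matrix may end up with different and possibly denser factors.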

Cheers

Deniz
