Hello,
Recently, I tried cuDSS’s MG mode and got it working for a sparse linear system (LU factorization, complex numbers) on up to 8 GPUs. I saw a reduction in both factorization and solve times. I also observed that the factors are distributed across the GPUs, which is very good news.
My question is the following, though I’m not sure whether it belongs to cuSPARSE or cuDSS: if sparse factorization and the triangular solves can be parallelized across GPUs (up to a certain point), would it be possible to do the same for ILU(0) decomposition? I would use it as a preconditioner for the iterative solution of the system. There, the sparse triangular solve is the most critical part, and it is not straightforward to implement. Are there any plans to support preconditioner-type solvers in the future?
(This would be useful when a linear system is too large to fit into a single GPU’s memory.)
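To make the question concrete, here is a minimal CPU-only sketch in plain Python of what I mean by ILU(0) and the two triangular solves that apply it. The names `ilu0` and `apply_preconditioner` are my own, chosen for illustration; they are not cuDSS or cuSPARSE APIs. Dense list-of-lists storage is used only for readability, where a real code would use CSR.

```python
def ilu0(A):
    """ILU(0): LU elimination restricted to the nonzero pattern of A.

    A is a square dense list-of-lists (illustration only).
    Returns (L, U) with unit-diagonal L; no pivoting is performed.
    """
    n = len(A)
    # Entries outside this pattern are never updated, so no fill-in is created.
    pattern = [[A[i][j] != 0 for j in range(n)] for i in range(n)]
    W = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            W[i][k] = W[i][k] / W[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:
                    W[i][j] -= W[i][k] * W[k][j]
    L = [[1 if i == j else (W[i][j] if j < i else 0) for j in range(n)]
         for i in range(n)]
    U = [[W[i][j] if j >= i else 0 for j in range(n)] for i in range(n)]
    return L, U


def apply_preconditioner(L, U, r):
    """Solve L y = r, then U z = y. This runs once per solver iteration."""
    n = len(r)
    y = [0] * n
    for i in range(n):
        # Row i needs every earlier y[j]: this dependency chain is what
        # makes the triangular solve hard to parallelize.
        y[i] = r[i] - sum(L[i][j] * y[j] for j in range(i))
    z = [0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(U[i][j] * z[j] for j in range(i + 1, n))) / U[i][i]
    return z


# Small complex tridiagonal example; the values are made up for illustration.
A = [[4 + 1j, -1, 0],
     [-1, 4 + 1j, -1],
     [0, -1, 4 + 1j]]
L, U = ilu0(A)
z = apply_preconditioner(L, U, [1, 0, 0])
```

The factorization runs once, but `apply_preconditioner` is called in every iteration of the solver, which is why the triangular solve is the part I care about most.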
Regards
Deniz