cuSPARSE Incomplete LU Factorization (level 0)

What is the alternative to the cuSPARSE Incomplete LU Factorization (level 0) functions, since they are marked as deprecated in the CUDA 12 documentation?

I implemented my own incomplete LU factorization on the CPU using CSR format, and it’s very slow. I then wrote a dense ILU(0) on the CPU, and it’s still very slow.
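For readers following along, a serial ILU(0) on a CSR matrix can be sketched roughly as below. This is only an illustration of the technique, not the poster's code and not what cuSPARSE does internally. The function name `ilu0_csr` and the small test matrix are made up for the example, and the sketch assumes column indices are sorted within each row and every row stores its diagonal entry.

```python
def ilu0_csr(n, row_ptr, col_idx, vals):
    """ILU(0) of an n x n CSR matrix (hypothetical helper, serial sketch).

    Returns a new value array on the same sparsity pattern: entries left of
    the diagonal hold L (unit diagonal implied), the rest hold U.
    Assumes sorted column indices per row and a stored diagonal.
    """
    lu = list(vals)

    # Locate the diagonal entry of every row.
    diag = [-1] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[p] == i:
                diag[i] = p
        if diag[i] < 0:
            raise ValueError(f"row {i} has no stored diagonal entry")

    for i in range(1, n):
        # Column -> storage position lookup for row i.
        pos = {col_idx[p]: p for p in range(row_ptr[i], row_ptr[i + 1])}
        for p in range(row_ptr[i], row_ptr[i + 1]):
            k = col_idx[p]
            if k >= i:
                break  # reached the diagonal; the L part of this row is done
            if lu[diag[k]] == 0.0:
                raise ZeroDivisionError(f"zero pivot in row {k}")
            lu[p] /= lu[diag[k]]  # multiplier l_ik
            # Subtract l_ik * (upper part of row k), but only where row i
            # already has an entry. Everything else is discarded (no fill).
            for q in range(diag[k] + 1, row_ptr[k + 1]):
                t = pos.get(col_idx[q])
                if t is not None:
                    lu[t] -= lu[p] * lu[q]
    return lu


# Example matrix [[2, 1, 1], [1, 2, 0], [1, 0, 2]]; exact LU would create
# fill at positions (1, 2) and (2, 1), which ILU(0) drops.
row_ptr = [0, 3, 5, 7]
col_idx = [0, 1, 2, 0, 1, 0, 2]
vals = [2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0]
lu = ilu0_csr(3, row_ptr, col_idx, vals)
# lu == [2.0, 1.0, 1.0, 0.5, 1.5, 0.5, 1.5]
```

The row-by-row dependency is what makes this hard to run fast: row i needs the finished rows above it, so a naive loop is inherently sequential.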

Hi. At the moment we don’t have an alternative for Incomplete LU Factorization.

Btw, would you mind sharing your use case for incomplete LU factorization with us? How much do you use this functionality in your computation?
This information will help us develop the feature in upcoming releases, so we’d appreciate it if you could give us some details. Thanks.

My application simulates transient incompressible fluid flow, and I use the GPU to solve the pressure equation and the diffusion term of the momentum equation. I use an incomplete LU factorization to precondition a BiCGStab solver for the pressure and diffusion systems of equations.
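To make the preconditioning step concrete: inside the iterative solver, applying an ILU preconditioner means solving L U z = r with one forward and one backward substitution. A minimal serial sketch follows. The function name `apply_ilu_csr` is hypothetical, and it assumes the factors are stored together in one CSR value array (strictly lower entries are L with an implied unit diagonal, the rest are U), which is an assumption of this example rather than anything stated in the thread.

```python
def apply_ilu_csr(n, row_ptr, col_idx, lu, r):
    """Solve L U z = r for combined ILU factors in CSR (hypothetical helper)."""
    # Forward substitution: L y = r, with L unit lower triangular.
    y = list(r)
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j < i:
                y[i] -= lu[p] * y[j]

    # Backward substitution: U z = y.
    z = list(y)
    for i in range(n - 1, -1, -1):
        d = None
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j > i:
                z[i] -= lu[p] * z[j]
            elif j == i:
                d = lu[p]
        if d is None or d == 0.0:
            raise ZeroDivisionError(f"missing or zero diagonal in row {i}")
        z[i] /= d
    return z


# Factors of the tridiagonal matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]].
# A tridiagonal matrix has no fill, so ILU(0) equals the exact LU here.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
lu = [4.0, -1.0, -0.25, 3.75, -1.0, -1.0 / 3.75, 4.0 - 1.0 / 3.75]
z = apply_ilu_csr(3, row_ptr, col_idx, lu, [3.0, 2.0, 3.0])
# z is approximately [1.0, 1.0, 1.0], since A @ [1, 1, 1] = [3, 2, 3]
```

Both substitutions run once per preconditioner application, so their cost matters as much as the factorization itself when the solver takes many iterations.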


Thanks for the detailed answer! Is there anywhere else in your computation where you could use, or would like to use, cuSPARSE?

Ideally the Incomplete LU factorization function would allow for varying sparsity. This way the fill or “completeness” of the preconditioner can be adjusted by the programmer in order to find the best sparsity for quickest convergence of an iterative solver for a particular problem.
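One way to picture such a tunable preconditioner is a drop tolerance: fill entries outside the original pattern are kept only if they are large enough. The sketch below is a dense, serial illustration with a simplified absolute threshold. It is not the relative dropping rule used in published ILUT variants and not a cuSPARSE feature, and the name `ilu_drop_dense` and the parameter `tol` are invented for this example.

```python
def ilu_drop_dense(a, tol):
    """Incomplete LU with an absolute drop tolerance (hypothetical sketch).

    a   : dense square matrix as a list of lists (no pivoting is done)
    tol : fill entries (positions where a is zero) with magnitude below tol
          are discarded; tol = 0 keeps all fill, i.e. a full LU, while
          tol = float("inf") drops all fill, i.e. the ILU(0) pattern.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            if lu[i][k] == 0.0:
                continue
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            # Drop a small multiplier that sits on a fill position.
            if a[i][k] == 0.0 and abs(lu[i][k]) < tol:
                lu[i][k] = 0.0
                continue
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
        # Drop small fill in the upper part of the finished row.
        for j in range(i + 1, n):
            if a[i][j] == 0.0 and abs(lu[i][j]) < tol:
                lu[i][j] = 0.0
    return lu


def count_nonzeros(m):
    return sum(1 for row in m for v in row if v != 0.0)


a = [[2.0, 1.0, 1.0],
     [1.0, 2.0, 0.0],
     [1.0, 0.0, 2.0]]
full = ilu_drop_dense(a, 0.0)    # keeps both fill entries
sparse = ilu_drop_dense(a, 1.0)  # drops both fill entries
# count_nonzeros(full) == 9, count_nonzeros(sparse) == 7
```

Sweeping `tol` between these extremes trades factor density (and cost per application) against how closely L U approximates the matrix, which is the kind of tuning described above.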

Thanks for the details. We haven’t made a final decision on how to support this functionality in the CUDA 13 timeframe. Sometimes we mark features as deprecated to get feedback from the community, and your comments are valuable.


Please do not remove ILU0 from cuSPARSE.

We currently use it in our POT3D code (GitHub: predsci/POT3D, High Performance Potential Field Solver) and plan to add it to our MAS code (PSI MAS Model Description).

For POT3D, using ILU0 on the CPU is fine since it is a single solve. For MAS, however, it will be critical to keep all the data on the GPU, as there are several PCG solves per time step.

Please refer to our GTC talk, “Speeding Up a Banded-matrix Solver by 3x using the Updated cuSparse Library” (NVIDIA On-Demand), for details on how we are using the ILU0 feature.

Also see our GTC talk, “Simulating Solar Eruptions on GPUs using Fortran Standard Parallelism” (NVIDIA On-Demand), for more information about our MAS GPU implementations. In that talk, you can see that the ILU0 algorithm is faster than our simple preconditioner, so adding the cuSPARSE ILU0 to MAS will greatly speed up the code.

– Ron
