cuSPARSE Incomplete LU Factorization (level 0)

michael.burnett.0017 · March 5, 2024, 5:55pm

What is the alternative to the cuSPARSE Incomplete LU Factorization (level 0) functions, since they are marked as deprecated in the CUDA 12 documentation?

I implemented my own CSR-formatted incomplete LU factorization on the CPU and it’s very slow. I then made a dense ILU(0) on the CPU and it’s still very slow.
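For readers who want to see what a CSR-based ILU(0) on the CPU involves, here is a minimal sketch in plain Python. It is not the poster's code and not the cuSPARSE implementation; the function name `ilu0_csr` and its argument layout are made up for illustration. It assumes column indices are sorted within each row and that every row stores a nonzero diagonal entry. The scatter map `pos` avoids searching a row for each update, which is a common cause of slow hand-written versions.

```python
def ilu0_csr(n, row_ptr, col_idx, vals):
    """ILU(0) on the sparsity pattern of A (hypothetical helper).

    Returns a new values array: entries left of the diagonal hold L
    (unit diagonal implied), entries on and right of it hold U.
    """
    lu = list(vals)

    # Locate the diagonal entry of every row once.
    diag = [-1] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[p] == i:
                diag[i] = p
        if diag[i] < 0:
            raise ValueError("row %d has no stored diagonal entry" % i)

    pos = [-1] * n  # column -> storage position within the current row
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        for p in range(start, end):
            pos[col_idx[p]] = p

        for p in range(start, end):
            k = col_idx[p]
            if k >= i:
                break  # sorted columns: the lower part of row i is done
            pivot = lu[diag[k]]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot in row %d" % k)
            lu[p] /= pivot          # l_ik = a_ik / u_kk
            factor = lu[p]
            # Subtract l_ik * u_kj only where (i, j) is already in the pattern.
            for q in range(diag[k] + 1, row_ptr[k + 1]):
                t = pos[col_idx[q]]
                if t >= 0:
                    lu[t] -= factor * lu[q]

        for p in range(start, end):
            pos[col_idx[p]] = -1    # reset the scatter map
    return lu


# 3x3 tridiagonal example: [[4,-1,0],[-1,4,-1],[0,-1,4]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
lu = ilu0_csr(3, row_ptr, col_idx, vals)
```

For a tridiagonal matrix no fill-in occurs, so the result here coincides with the exact LU factorization; for general sparse matrices, updates that fall outside the original pattern are simply discarded.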

qanhpham · March 5, 2024, 6:12pm

Hi. At the moment we don’t have an alternative for Incomplete LU Factorization.

qanhpham · March 5, 2024, 8:05pm

Btw, would you mind sharing with us your use case for incomplete LU factorization? How much do you use this functionality in your computation?
This information will help us develop the feature in upcoming releases, so we’d appreciate it if you could give us some details. Thanks.

michael.burnett.0017 · March 5, 2024, 11:34pm

My application simulates transient incompressible fluid flow, and I use the GPU to solve the pressure equation and the diffusion term of the momentum equation. I use an incomplete LU factorization to precondition a BiCGStab solver for the pressure and diffusion systems of equations.
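To make the preconditioning step concrete: inside a preconditioned BiCGStab iteration, the incomplete factors are only ever used to solve M z = r with M = LU, which is one forward and one backward triangular solve. The sketch below is a plain-Python illustration under stated assumptions, not the poster's code or a cuSPARSE call. The name `ilu_solve_csr` is hypothetical, and it assumes L (unit diagonal, not stored) and U share a single CSR values array, which to my knowledge is also how in-place ILU routines typically return their result.

```python
def ilu_solve_csr(n, row_ptr, col_idx, lu, r):
    """Solve (L U) z = r for combined CSR factors (hypothetical helper)."""
    z = list(r)

    # Forward substitution: L y = r, with an implied unit diagonal.
    for i in range(n):
        s = z[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j < i:
                s -= lu[p] * z[j]
        z[i] = s

    # Backward substitution: U z = y.
    for i in range(n - 1, -1, -1):
        s = z[i]
        d = None
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j > i:
                s -= lu[p] * z[j]
            elif j == i:
                d = lu[p]
        if d is None or d == 0.0:
            raise ZeroDivisionError("missing or zero diagonal in row %d" % i)
        z[i] = s / d
    return z


# Factors of the tridiagonal matrix [[4,-1,0],[-1,4,-1],[0,-1,4]];
# no fill-in occurs for this pattern, so these are also its exact LU factors.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
lu = [4.0, -1.0, -0.25, 3.75, -1.0, -1.0 / 3.75, 4.0 - 1.0 / 3.75]

r = [2.0, 4.0, 10.0]            # equals A @ [1, 2, 3]
z = ilu_solve_csr(3, row_ptr, col_idx, lu, r)   # approximately [1.0, 2.0, 3.0]
```

Both loops are inherently sequential across rows, which is why the GPU versions of these triangular solves need a dependency analysis phase and why keeping them on the device matters when they run every iteration.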

qanhpham · March 5, 2024, 11:51pm

Thanks for the detailed answer! Is there any other place that you can/want to utilize cuSparse for your computation?

michael.burnett.0017 · March 6, 2024, 4:56am

Ideally the Incomplete LU factorization function would allow for varying sparsity. This way the fill or “completeness” of the preconditioner can be adjusted by the programmer in order to find the best sparsity for quickest convergence of an iterative solver for a particular problem.
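One common way to get this kind of adjustable fill is a drop tolerance, in the spirit of ILUT. The following is only a rough sketch of the idea on a dense list-of-lists matrix, not a proposal for the cuSPARSE API. The name `ilu_threshold_dense` and the parameter `tau` are made up for illustration, there is no pivoting, and unlike full ILUT there is no cap on the number of entries kept per row.

```python
import math


def ilu_threshold_dense(a, tau):
    """Row-wise LU with threshold dropping (hypothetical helper).

    Entries smaller than tau * ||row i of A||_2 are set to zero.
    tau = 0 reproduces the exact LU factorization without pivoting.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for i in range(n):
        drop = tau * math.sqrt(sum(v * v for v in a[i]))
        row = lu[i]
        for k in range(i):
            if row[k] == 0.0:
                continue
            row[k] /= lu[k][k]              # multiplier l_ik
            if abs(row[k]) < drop:
                row[k] = 0.0                # drop a small multiplier
                continue
            f = row[k]
            for j in range(k + 1, n):
                row[j] -= f * lu[k][j]      # fill-in is allowed here
        # Drop small off-diagonal entries from the finished row.
        for j in range(n):
            if j != i and abs(row[j]) < drop:
                row[j] = 0.0
    return lu


# Arrow matrix: exact LU creates fill at positions (1,2) and (2,1).
a = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
exact = ilu_threshold_dense(a, 0.0)     # keeps all fill
sparser = ilu_threshold_dense(a, 0.05)  # drops the small (2,1) fill entry
```

Raising `tau` gives a sparser, cheaper, but weaker preconditioner; lowering it moves toward the full factorization. That is the trade-off one would tune per problem.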

fbusato · March 7, 2024, 5:50pm

Thanks for the details. We haven’t made a final decision on how to support this functionality in the CUDA 13 timeframe. Sometimes we mark features as deprecated to get feedback from the community, and your comments are valuable.

caplanr · April 16, 2024, 4:26pm

Hi,

Please do not remove ILU0 from cuSparse.

We currently use it in our POT3D code (GitHub - predsci/POT3D: POT3D: High Performance Potential Field Solver) and plan to add it to our MAS code (PSI MAS Model Description).

While for POT3D, using ILU0 on the CPU is fine since it is a single solve, for MAS it will be critical to keep all the data on the GPU, as there are several PCG solves per time step.

Please refer to our GTC talk “Speeding Up a Banded-matrix Solver by 3x using the Updated cuSparse Library” (NVIDIA On-Demand) for details on how we are using the ILU0 feature.

Also see our GTC talk “Simulating Solar Eruptions on GPUs using Fortran Standard Parallelism” (NVIDIA On-Demand) for more information about our MAS GPU implementations. In that talk, you can see how the ILU0 algorithm is faster than our simple preconditioner, so adding cuSparse will greatly speed up the code.

– Ron

Coercion · July 28, 2024, 7:26pm

Hello,

I agree with the other users that ILU(0) is the backbone of iterative solvers, and I use it constantly for geophysical problems. The same is true for the ichol function.

Instead of removing it, CUDA should also support features such as ILU with a threshold. As a user, I would like to be able to use such a function too.

Deniz

fbusato · July 29, 2024, 4:43pm

That’s a great suggestion. We will keep it in mind.

417luke318 · August 11, 2024, 10:19am

Totally agree.

andrea_nes · February 21, 2025, 11:19am

We are a startup developing software that simulates, in real time, several kinds of physics for a number of different surgical procedures. The simulations are used for computerized guidance of those procedures. Our simulation libraries have been licensed to multiple partners and will soon be used in several hundred clinical centers worldwide. ILU is used as a preconditioner in key routines common to all these libraries.

We expect CUDA libraries to grow in functionality, or stay the same, not shrink, as this really puts us in an uncomfortable position.

Best Regards,

Andrea

eedwards · February 24, 2025, 9:59pm

Many items in the documentation are marked deprecated with a notice that they will be removed in the next major release. The next major release is 13.0. In fact, we will be removing almost nothing in 13.0. Incomplete LU and ICHOL are safe and will not be removed until there is a substitute.
