What is the alternative to the cuSPARSE incomplete LU factorization (level 0) functions, now that they are marked as deprecated in the CUDA 12 documentation?
I implemented my own CSR-format incomplete LU factorization on the CPU, and it’s very slow. I then wrote a dense ILU(0) on the CPU, and it’s still very slow.
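For reference, a sequential ILU(0) on CSR is usually written in the row-oriented "IKJ" form. The following is a minimal sketch, not cuSPARSE’s implementation: it assumes column indices are sorted within each row and every diagonal entry is stored, and it overwrites `val` in place with L strictly below the diagonal (unit diagonal implied) and U on and above it. The name `ilu0_csr` is hypothetical. The `pos` array maps a column index to its slot in the current row, so each update is a constant-time lookup rather than a search through the row.

```c
#include <stdlib.h>

/* Sketch: in-place ILU(0) of an n x n CSR matrix (row-oriented IKJ form).
 * Assumes sorted column indices per row and a stored diagonal in every row.
 * On success val holds L strictly below the diagonal (unit diagonal implied)
 * and U on and above it, and 0 is returned. Returns -1 on allocation
 * failure, a missing diagonal, or a zero pivot. */
int ilu0_csr(int n, const int *row_ptr, const int *col_idx, double *val)
{
    int *diag = malloc((size_t)n * sizeof *diag); /* slot of a_ii in row i */
    int *pos = malloc((size_t)n * sizeof *pos);   /* column -> slot in current row, or -1 */
    int status = 0;
    if (!diag || !pos) { free(diag); free(pos); return -1; }
    for (int j = 0; j < n; ++j) pos[j] = -1;

    for (int i = 0; i < n && status == 0; ++i) {
        /* scatter row i's pattern so updates are O(1) lookups */
        diag[i] = -1;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            pos[col_idx[p]] = p;
            if (col_idx[p] == i) diag[i] = p;
        }
        if (diag[i] < 0) { status = -1; break; }

        /* eliminate with each earlier row k that appears in row i */
        for (int p = row_ptr[i]; p < row_ptr[i + 1] && col_idx[p] < i; ++p) {
            int k = col_idx[p];
            val[p] /= val[diag[k]];                  /* l_ik = a_ik / u_kk */
            for (int q = diag[k] + 1; q < row_ptr[k + 1]; ++q) {
                int slot = pos[col_idx[q]];
                if (slot >= 0)                       /* update existing entries only; */
                    val[slot] -= val[p] * val[q];    /* fill outside the pattern is dropped */
            }
        }
        if (val[diag[i]] == 0.0) status = -1;

        /* clear the scatter array for the next row */
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) pos[col_idx[p]] = -1;
    }
    free(diag);
    free(pos);
    return status;
}
```

For example, for the 3×3 matrix with rows [4 1 1], [1 4 0], [1 0 4], a complete LU would create fill at (row 1, column 2) and (row 2, column 1), counting from zero; ILU(0) simply does not store those entries, which is what distinguishes it from a full factorization.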
Hi. At the moment, we don’t have an alternative to incomplete LU factorization.
Btw, would you mind sharing with us your use case for incomplete LU factorization? How much do you use this functionality in your computation?
This information will help us develop the feature in future releases, so we’d appreciate it if you could give us some details. Thanks.
My application simulates transient incompressible fluid flow, and I use the GPU to solve the pressure equation and the diffusion term of the momentum equation. I use an incomplete LU factorization to precondition a BiCGStab solver for the pressure and diffusion systems of equations.
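Inside a preconditioned BiCGStab iteration, the ILU preconditioner is applied (twice per iteration in the usual formulation) as a forward solve with the unit lower-triangular L followed by a backward solve with U. The sketch below assumes the combined L/U storage produced by an in-place CSR ILU(0) with sorted column indices and a stored, nonzero diagonal; `ilu_apply` is a hypothetical name. Each row depends on the rows solved before it, which is why these triangular solves are the part that is hardest to run efficiently on a GPU.

```c
/* Sketch: z = M^{-1} r with M = L U stored in one CSR array (as produced by
 * an in-place ILU(0)): entries left of the diagonal are L (unit diagonal
 * implied), diagonal and right of it are U. r and z must not alias. */
void ilu_apply(int n, const int *row_ptr, const int *col_idx,
               const double *lu, const double *r, double *z)
{
    /* forward substitution: L y = r, y stored in z */
    for (int i = 0; i < n; ++i) {
        double s = r[i];
        for (int p = row_ptr[i]; p < row_ptr[i + 1] && col_idx[p] < i; ++p)
            s -= lu[p] * z[col_idx[p]];
        z[i] = s;
    }
    /* backward substitution: U z = y, overwriting y in place */
    for (int i = n - 1; i >= 0; --i) {
        double s = z[i];
        double d = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int j = col_idx[p];
            if (j > i) s -= lu[p] * z[j];    /* z[j] is already final for j > i */
            else if (j == i) d = lu[p];      /* u_ii */
        }
        z[i] = s / d;
    }
}
```

With the ILU(0) factors of the 3×3 matrix [4 1 1; 1 4 0; 1 0 4], that is `lu = {4, 1, 1, 0.25, 3.75, 0.25, 3.75}`, the preconditioner M = LU has rows [4 1 1], [1 4 0.25], [1 0.25 4], so applying `ilu_apply` to r = (6, 5.25, 5.25) returns z = (1, 1, 1).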
Thanks for the detailed answer! Is there anywhere else in your computation where you could, or would like to, use cuSPARSE?
Ideally, the incomplete LU factorization function would allow a variable sparsity pattern. That way, the programmer could adjust the fill (the "completeness") of the preconditioner to find the sparsity that gives the fastest convergence of the iterative solver for a particular problem.
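One standard way to make the amount of fill tunable is level-of-fill ILU(k): original nonzeros get level 0, a fill entry created by eliminating with pivot m gets level lev(i,m) + lev(m,j) + 1, and entries whose level exceeds k are dropped, so k = 0 reproduces the ILU(0) pattern and larger k approaches a complete LU. The sketch below computes only the symbolic pattern. It uses a dense n×n level array for brevity (a production version would work row by row on CSR), assumes every diagonal entry is in the pattern, and `iluk_symbolic` is a hypothetical name, not a cuSPARSE function.

```c
#include <limits.h>

#define LEV_INF INT_MAX  /* marks an entry that is not kept */

/* Sketch: symbolic ILU(k) on a dense n x n pattern (row-major).
 * pattern[i*n+j] != 0 marks a structural nonzero of A. On return,
 * lev[i*n+j] is the fill level of (i,j) if kept (level <= k), else LEV_INF.
 * Returns the number of kept entries. */
long iluk_symbolic(int n, const unsigned char *pattern, int k, int *lev)
{
    long kept = 0;
    for (int i = 0; i < n; ++i) {
        /* initial levels: 0 for original nonzeros, infinity otherwise */
        for (int j = 0; j < n; ++j)
            lev[i * n + j] = pattern[i * n + j] ? 0 : LEV_INF;

        /* eliminate with pivots m < i in increasing order:
           lev(i,j) = min(lev(i,j), lev(i,m) + lev(m,j) + 1) */
        for (int m = 0; m < i; ++m) {
            int lim = lev[i * n + m];
            if (lim > k) continue;          /* (i,m) is dropped, so it creates no fill */
            for (int j = m + 1; j < n; ++j) {
                int lmj = lev[m * n + j];   /* row m is already final */
                if (lmj > k) continue;
                int cand = lim + lmj + 1;   /* at most 2k + 1, no overflow */
                if (cand < lev[i * n + j]) lev[i * n + j] = cand;
            }
        }
        /* drop entries whose level exceeds k */
        for (int j = 0; j < n; ++j) {
            if (lev[i * n + j] > k) lev[i * n + j] = LEV_INF;
            else ++kept;
        }
    }
    return kept;
}
```

For a 4×4 "arrow" pattern (dense first row and first column plus the diagonal), k = 0 keeps only the 10 original entries, while k = 1 admits the level-1 fill created by eliminating with row 0 and keeps all 16.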
Thanks for the details. We haven’t made a final decision on how to support this functionality in the CUDA 13 timeframe. Sometimes we mark features as deprecated to get feedback from the community, and your comments are valuable.
Hi,
Please do not remove ILU0 from cuSparse.
We currently use it in our POT3D code (GitHub: predsci/POT3D, “POT3D: High Performance Potential Field Solver”) and plan to add it to our MAS code (see the PSI MAS Model Description).
For POT3D, computing ILU0 on the CPU is fine because there is only a single solve, but for MAS it will be critical to keep all the data on the GPU, as there are several PCG solves per time step.
Please refer to our GTC talk “Speeding Up a Banded-matrix Solver by 3x using the Updated cuSparse Library” (NVIDIA On-Demand) for details on how we are using the ILU0 feature.
Also see our GTC talk “Simulating Solar Eruptions on GPUs using Fortran Standard Parallelism” (NVIDIA On-Demand) for more information about our MAS GPU implementation. In that talk, you can see that the ILU0 algorithm is faster than our simple preconditioner, so adding cuSPARSE will greatly speed up the code.
– Ron
Hello,
I agree with the other users that ILU(0) is the backbone of iterative solvers; I use it constantly for geophysical problems. The same is true of the ichol (incomplete Cholesky) function.
Instead of removing it, CUDA should also support features such as ILU with a threshold (ILUT). As a user, I would like to be able to use such a function.
Deniz
That’s a great suggestion. We will keep it in mind.
We are a startup that develops software for real-time simulation of several kinds of physics across a number of different surgical procedures; the simulations are used for computerized guidance of those procedures. Our simulation libraries have been licensed to multiple partners and will soon be used in several hundred clinical centers worldwide. ILU is used as a preconditioner in key routines common to all of these libraries.
We expect CUDA libraries to grow in functionality, or at least stay the same, not shrink; removing features really puts us in an uncomfortable position.
Best Regards,
Andrea
Many items in the documentation are marked deprecated with a notice that they will be removed in the next major release, which is 13.0. In fact, we will be removing almost nothing in 13.0. Incomplete LU and ICHOL are safe and will not be removed until there is a substitute.