Double buffer requirement for SpSV and SpSM operations

Hi,

I have a basic preconditioned conjugate gradient (CG) routine that solves the linear system Ax = b with a preconditioner matrix M. The preconditioner is factored into two triangular matrices by ILU(0). Then, in each CG iteration, those triangular matrices are applied to a vector via the SpSV or SpSM functions.

My question, or rather my feature request, is this: during the analysis phase of SpSV, I have to allocate two buffers, one for the forward sweep and one for the backward sweep. I've realized I cannot use a single buffer for both operations. I think I'm right on this one.
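
For reference, my setup looks roughly like this (a simplified sketch using the cuSPARSE generic API; descriptor and buffer names like `matL`, `dBufL` are placeholders, and error checking is omitted):

```c
#include <cusparse.h>

// Assumed to be created elsewhere: handle, the CSR descriptors for the
// ILU(0) factors (matL, matU), and the dense vector descriptors.
cusparseHandle_t      handle;
cusparseSpMatDescr_t  matL, matU;        // lower/upper triangular factors
cusparseDnVecDescr_t  vecX, vecY, vecZ;  // r, intermediate, M^{-1} r
cusparseSpSVDescr_t   spsvDescrL, spsvDescrU;
double alpha = 1.0;
size_t bufSizeL, bufSizeU;
void  *dBufL, *dBufU;

cusparseSpSV_createDescr(&spsvDescrL);
cusparseSpSV_createDescr(&spsvDescrU);

// Forward sweep (L y = x): query, allocate, analyze.
cusparseSpSV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                        matL, vecX, vecY, CUDA_R_64F,
                        CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrL, &bufSizeL);
cudaMalloc(&dBufL, bufSizeL);
cusparseSpSV_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                      matL, vecX, vecY, CUDA_R_64F,
                      CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrL, dBufL);

// Backward sweep (U z = y): a second buffer is needed, because each
// analysis stores its metadata in the buffer it was given, and that
// buffer must stay alive for every later SpSV_solve on that descriptor.
cusparseSpSV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                        matU, vecY, vecZ, CUDA_R_64F,
                        CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrU, &bufSizeU);
cudaMalloc(&dBufU, bufSizeU);  // cannot reuse dBufL here
cusparseSpSV_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                      matU, vecY, vecZ, CUDA_R_64F,
                      CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrU, dBufU);

// Then, in every CG iteration:
cusparseSpSV_solve(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                   matL, vecX, vecY, CUDA_R_64F,
                   CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrL);
cusparseSpSV_solve(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                   matU, vecY, vecZ, CUDA_R_64F,
                   CUSPARSE_SPSV_ALG_DEFAULT, spsvDescrU);
```

Since both buffers must remain allocated for the lifetime of the solver loop, their sizes add up rather than overlap.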

The problem is this: for small matrices the buffer size is negligible, but let me give you some numbers. I have a matrix A in CSR format that consumes around 4.5 GB of VRAM, and a preconditioner matrix that consumes ~2 GB of VRAM. It turns out that each of those two buffers allocates 2.4 GB of memory, so 4.8 GB in total is tied up just for the sparse triangular solves. That is a lot of memory, and because I cannot share a single buffer, I cannot save at least 2.4 GB of it.

In short, I wish I could share a single buffer between those sparse operations and save that memory in the future.

Regards

Deniz

Hi @Coercion ,

Yes, SpSV and SpSM require separate buffers for the forward and backward sweeps, and these buffers cannot be overlapped. We will keep this request in mind for future consideration. You mentioned using CG; could you please share more details about the overall application?

Thanks,

Mohammad

Hi again,

I’m a geophysicist who models electromagnetic waves. I use Maxwell’s equations in the frequency domain and solve a linear system Ax = b, where A is a complex symmetric but non-Hermitian matrix. I said CG, but I actually coded my own solver; strictly speaking, I’m using the GPBiCG algorithm.

The forward problem is coupled with an inverse problem, but I’ve already coded all of this in CUDA C, so I’m all good there. On the other hand, the immense memory requirement of the triangular solves surprised me, even though those triangular matrices come from ILU(0). Maybe it wasn’t like this in CUDA 11 or 12; I didn’t check.

Regards

Deniz