cuSPARSE <= 11.7 cusparseDcsrsv2 buffer lifetime

jhomola · October 3, 2024, 4:01pm

Hi,

I know that 11.7 is an old version, but the old cuSPARSE works much better for me, especially SpSM, which is much faster.

Anyway, I have a question about how the “temporary workspace buffer” works there, specifically in the cusparseDcsrsv2_analysis + solve and cusparseDcsrsm2_analysis + solve functions.

The documentation says

This function requires a buffer size returned by csrsm2_bufferSize(). …

How does that buffer work? Can I allocate it just before the function and deallocate it right after (after a synchronization of course)?

Unlike the documentation for the generic SpSM, it does not say that I cannot modify the buffer between the analysis and solve phases.

So, in the old csrsm2, can I modify the buffer? Can I pass completely different buffers to the analysis and solve phases? Or does the analysis store something in the buffer which the solve uses?

I tested it, and when I zero-out the buffer between analysis and solve, it still works and produces correct results. But is that guaranteed?

As an aside, the docs for the analysis functions also say that “This function requires temporary extra storage that is allocated internally”. Why is it not a part of the buffer?

Thanks for help,

Jakub

malmasri · October 3, 2024, 8:32pm

Hi @jhomola,
Let me start by addressing your questions about the old csrsm2.

The buffer acts as a local workspace for both the analysis and solve routines, meaning the buffers for these routines are independent. You can adjust or provide a different buffer for solve.
The analysis routine allocates additional storage internally for the csrsm2Info_t data structure, which stores the analysis data passed to solve. The buffer itself is just temporary and is not used for this purpose. (Note: The new cusparseSpSM API does not allocate any memory internally and only uses the buffer provided by the user).

Regarding your observation that the old SpSM is faster than the new one, could you please share the specific use case and the matrix/matrices you’re using?

Thanks

jhomola · October 3, 2024, 10:27pm

Hi,

thanks for the reply. Having the option to use a different buffer really helps me, good to know that I can really do that. Is there an estimate on how much memory is needed for the csrsm2Info_t structure?

Regarding your observation that the old SpSM is faster than the new one, could you please share the specific use case and the matrix/matrices you’re using?

I have been meaning to report this for a while now, never had the time to do it. Now is the time, I guess. I will get back to you later, when I create a simple reproducer.

But basically, even in the same version 11.7, the legacy csrsm2 is faster than the generic SpSM for my use case, and the difference is very significant. Also, according to the documentation, the buffer needed by SpSM has to stay untouched. Combined with the fact that the buffer size is quite large (approx. matrix + rhs, independent of opA/opB/orderB from my observations; the rhs size is my main issue), this is not good for me.
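To give a feel for why the rhs term dominates, here is a rough back-of-the-envelope sketch of the "approx. matrix + rhs" estimate. It is only an illustration of the observation above, not a formula taken from cuSPARSE; it assumes double values, 32-bit CSR indices and a dense rhs, and the function names are made up.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes occupied by a CSR matrix with double values and 32-bit indices.
std::size_t csr_bytes(std::size_t rows, std::size_t nnz) {
    return nnz * (sizeof(double) + sizeof(std::int32_t))   // values + column indices
         + (rows + 1) * sizeof(std::int32_t);              // row pointers
}

// Bytes occupied by a dense right-hand-side block of doubles.
std::size_t dense_rhs_bytes(std::size_t rows, std::size_t nrhs) {
    return rows * nrhs * sizeof(double);
}

// Rough estimate following the observation "buffer is about matrix + rhs".
// This is an assumption for illustration, not a documented cuSPARSE formula.
std::size_t approx_spsm_buffer_bytes(std::size_t rows, std::size_t nnz,
                                     std::size_t nrhs) {
    return csr_bytes(rows, nnz) + dense_rhs_bytes(rows, nrhs);
}
```

With 1000 rows, 10000 nonzeros and 500 right-hand sides, the matrix part is about 124 KB while the rhs part is 4 MB, so under this estimate the buffer is almost entirely a second copy of the rhs.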

malmasri · October 4, 2024, 3:20pm

In the non-transpose case, the memory needed for the csrsm2Info_t structure is on the order of the number of rows. In the transpose case, it is on the order of the matrix size.

A reproducer or just the sparse matrices would be great. What is the sparsity of the matrix in your application? What is your application and use case?

Thanks

jhomola · October 17, 2024, 1:45pm

Hi @malmasri

so I extracted the matrices and created the reproducer. I put it in a new topic, since this one was mainly about the buffer lifetime.

See cuSPARSE generic SpSM much slower than legacy csrsm2

