cuSPARSE <= 11.7 cusparseDcsrsv2 buffer lifetime

Hi,

I know that 11.7 is an old version, but the old cuSPARSE works much better for me, especially SpSM, which is much faster.

Anyway, I have a question about how the “temporary workspace buffer” works there, specifically in the cusparseDcsrsv2_analysis + solve and cusparseDcsrsm2_analysis + solve functions.

The documentation says

This function requires a buffer size returned by csrsm2_bufferSize(). …

How does that buffer work? Can I allocate it just before the function and deallocate it right after (after a synchronization of course)?

Unlike the documentation of the generic SpSM, it does not say that I must not modify the buffer between the analysis and solve phases.

So, in the old csrsm2, can I modify the buffer? Can I pass completely different buffers to the analysis and solve phases? Or does the analysis store something in the buffer which the solve uses?

I tested it, and when I zero out the buffer between analysis and solve, it still works and produces correct results. But is that guaranteed?

As an aside, the docs for the analysis functions also say that “This function requires temporary extra storage that is allocated internally”. Why is that storage not part of the buffer?

Thanks for the help,

Jakub

Hi @jhomola,
Let me start by addressing your questions about the old csrsm2.

  • The buffer acts as a local workspace for both the analysis and solve routines, meaning the buffers for these routines are independent. You can modify the buffer after analysis, or pass a completely different buffer to solve.

  • The analysis routine allocates additional storage internally for the csrsm2Info_t data structure, which stores the analysis data passed to solve. The buffer itself is just temporary and is not used for this purpose. (Note: The new cusparseSpSM API does not allocate any memory internally and only uses the buffer provided by the user).

Regarding your observation that the old SpSM is faster than the new one, could you please share the specific use case and the matrix/matrices you’re using?

Thanks

Hi,

thanks for the reply. Being able to use a different buffer really helps me, so it is good to know that I can actually do that. Is there an estimate of how much memory is needed for the csrsm2Info_t structure?

Regarding your observation that the old SpSM is faster than the new one, could you please share the specific use case and the matrix/matrices you’re using?

I have been meaning to report this for a while now, never had the time to do it. Now is the time, I guess. I will get back to you later, when I create a simple reproducer.

But basically, even in the same version 11.7, the legacy csrsm2 is faster than the generic SpSM for my use case, and the difference is very significant. Also, according to the documentation, the buffer needed by SpSM has to stay untouched, and combined with the fact that the buffer size is quite large (approx. matrix+rhs, independent of opA/opB/orderB from my observations; the rhs size is my main issue), this is not good for me.
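
To put a rough number on "approx. matrix+rhs", here is a back-of-the-envelope sketch. It assumes that "matrix" means the three CSR arrays with 32-bit indices and double values, and that "rhs" means a dense block of double values; the helper names csr_bytes and rhs_bytes are hypothetical, and the real SpSM buffer size should always be taken from the bufferSize query.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes held by a CSR matrix with double values and int indices
// (values + column indices per nonzero, plus rows + 1 row pointers).
std::size_t csr_bytes(std::size_t rows, std::size_t nnz) {
    return nnz * (sizeof(double) + sizeof(int)) + (rows + 1) * sizeof(int);
}

// Hypothetical helper: bytes held by a dense right-hand-side block of doubles.
std::size_t rhs_bytes(std::size_t rows, std::size_t nrhs) {
    return rows * nrhs * sizeof(double);
}
```

With hypothetical sizes of 1,000,000 rows, 10,000,000 nonzeros and 1,000 right-hand sides (and 4-byte int, 8-byte double), the matrix part is about 124 MB while the right-hand-side part is 8 GB, which illustrates why the rhs term dominates.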

In the non-transpose case, the memory needed for csrsm2Info_t is on the order of the number of rows. In the transpose case, it is on the order of the matrix size.

A reproducer, or even just the sparse matrices, would be great. What is the sparsity of the matrix in your application? What is your application and use case?

Thanks

Hi @malmasri

so I extracted the matrices and created the reproducer. I put it in a new topic, since this one was mainly about the buffer lifetime.

See cuSPARSE generic SpSM much slower than legacy csrsm2
