Does anyone have performance experience with CSRSV in CUSPARSE?

I just saw that CSRSV is supported in CUSPARSE as of CUDA 4.0 (the routines are called cusparse{SDCZ}csrsv_analysis and cusparse{SDCZ}csrsv_solve). Does anyone have experience with its performance behavior? Or is there any public report on this? Thanks.

It seems not many people are interested in CSRSV, since no one has followed up on this topic in the last three months.

I am also curious about its performance, and about how large the matrix needs to be for GPU CSRSV to outperform the CPU.

I read a paper by Dr. Maxim Naumov, “Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU”. The paper is very clear about level sorting and row assignment in the analysis phase and the solve phase. However, I am not sure the memory accesses can be coalesced, because the sparse LU factors are really irregular.
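To make the level-sorting idea concrete, here is a minimal CPU-side sketch in Python of how I understand the general technique. It is not the paper's code and not what CUSPARSE does internally. The function names (analyze_levels, solve_lower) and the small 4x4 matrix are my own, made up for illustration, and I assume a lower triangular matrix in CSR format with a nonzero diagonal. The analysis phase groups rows into levels so that every row in a level depends only on rows in earlier levels. The solve phase then walks the levels in order, and the rows inside one level could each go to a separate GPU thread.

```python
def analyze_levels(row_ptr, col_idx):
    """Analysis phase: group the rows of a lower triangular CSR matrix into levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                # Row i needs x[j], so it must sit at least one level after row j.
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        levels[lvl].append(i)
    return levels


def solve_lower(row_ptr, col_idx, vals, b, levels):
    """Solve phase: forward substitution, processed one level at a time."""
    x = [0.0] * len(b)
    for rows in levels:
        # Rows within one level are independent of each other; on a GPU
        # each of them could be handled by its own thread.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                elif j < i:
                    s -= vals[k] * x[j]  # indirect gather through col_idx
            x[i] = s / diag
    return x


# Hypothetical 4x4 example (assumed data, not from the paper):
# L = [[2, 0, 0, 0],
#      [0, 4, 0, 0],
#      [1, 0, 1, 0],
#      [0, 2, 3, 5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]

levels = analyze_levels(row_ptr, col_idx)
x = solve_lower(row_ptr, col_idx, vals, b, levels)
print(levels)  # [[0, 1], [2], [3]]
print(x)       # [1.0, 2.0, 3.0, 4.0]
```

The x[j] reads in the inner loop go through col_idx, and that indirect gather is the access I doubt can be coalesced when the factors are irregular.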

Would anyone here like to join this topic?

Best