Question about triangular solver

Hello, CUDA developers.

I have a simple question about using cuBLAS and cuSPARSE.

My whole matrix consists of thousands of dense triangular block matrices.

In this case, are there any better routines than csrsv or csrsv2 that I can use?

Or will the analysis phase detect the independence between the block matrices?

It seems that there is a batched version of trsm, but my problem is basically trsv.

I could pass the vector to trsm as a single-column RHS matrix, but I presume that would not be efficient.
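
To make the structure concrete, here is a minimal CPU-side sketch of the computation I am after. It is only a reference for the math, not GPU code: the function name `block_trsv_lower`, the lower-triangular orientation, the equal block size `n`, and the back-to-back column-major storage are all assumptions made for illustration. Each block is solved by forward substitution and no block reads another block's data, which is the independence I am hoping a library routine can exploit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuBLAS or cuSPARSE call).
// Solves L_b * x_b = rhs_b for `batch` independent n-by-n lower-triangular blocks.
// Assumed layout: blocks are dense, column-major, stored back to back;
// rhs holds the `batch` right-hand sides back to back and is overwritten with x.
void block_trsv_lower(const std::vector<double>& blocks,
                      std::vector<double>& rhs,
                      std::size_t n, std::size_t batch) {
    for (std::size_t b = 0; b < batch; ++b) {        // blocks never touch each other
        const double* L = blocks.data() + b * n * n; // start of block b
        double* x = rhs.data() + b * n;              // start of its right-hand side
        for (std::size_t i = 0; i < n; ++i) {        // forward substitution
            double s = x[i];
            for (std::size_t j = 0; j < i; ++j)
                s -= L[i + j * n] * x[j];            // column-major: element (i, j)
            x[i] = s / L[i + i * n];                 // divide by the diagonal entry
        }
    }
}
```

The outer loop over `b` is the part that maps onto a batch: it is the same as a batched trsm where every right-hand side has a single column.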

Thank you!