Solve upper blocked triangular system with cusparseDbsrsv2_solve of cusparse

I’ve try to solve a blocked upper triangular linear system with cusparseDbsrsv2_solve with no success, and I can not figure out what does this function do. I’ve set up all the internal structures and set the accelerator memory region with the sparse matrix and the obtained result does not match with the correct answer. Is there any particular trick with this function to solve properly a blocked upper triangular matrix?.

My matrix is a sparse blocked (3x3) upper triangular type with the follow distribution.

0 Q R
0 0 P

Where the letters are 3x3 dense blocks and 0 are the empty blocks. So I do pass the follow connectivity matrix to the cusparseDbsrsv2_solve function. ia={0,3,5,6}, ja={0,1,2,1,2,2}, with the values of the blocks using row wise disposition.

All the needed routines are called previous to the call to cusparseDbsrsv2_solve. To sum up I list here:

cusparseSolvePolicy_t policy_U = CUSPARSE_SOLVE_POLICY_NO_LEVEL
cusparseOperation_t trans_U  = CUSPARSE_OPERATION_NON_TRANSPOSE
cusparseDirection_t dir = CUSPARSE_DIRECTION_ROW

//The Matrix description is:
cusparseSetMatIndexBase(descr_U, CUSPARSE_INDEX_BASE_ZERO)
cusparseSetMatFillMode(descr_U, CUSPARSE_FILL_MODE_UPPER)
cusparseSetMatDiagType(descr_U, CUSPARSE_DIAG_TYPE_NON_UNIT)

//And the call to cusparseDbsrsv2_solve is made with the follows parameters.
//number of block rows and block columns 
mb = 3
blockDim = 3
//number of non zero blocks
nnzb = 6

When getting the output result I do expect that the three last entries of my solution vector to be:

y = P^-1*x

Where y is a 3x1 column vector of the last 3 entries of the total result, P^-1 is the matrix inverse of the dense block P and x are the three last entries of the rhs vector. So, suppose x is a vector with all entries equal to 1.0, I can easily get y by inverting P and computing the product P^-1*x.

The problem, is that the cuSparse result does not match my expected result and instead it seams to me (make the math) that cuSparse obtain the answer to the last three entries by doing a backward substitution with the entries of the block P. i.e it seams cuSparse solve the system

P y = x.

By assuming that P is upper triangular block and therefore solving y with a backward substitution of P. Leading to completely wrong results, I’m missing a fundamental key to properly use this function?.