Hey,
I’m using cusolverSpScsrlsqvqrHost routine from cuda 11.1 to solve an over-determined sparse linear problem. My code passes my tests in most cases, but if the problem matrix is rank-deficient, then that routine returns a bunch of NaNs in the result vector. What’s interesting is that the routine correctly determines the rank of the matrix, produces reasonable permutation vector (i.e. permutes the variable that has only zeros in the matrix to the last position) and even reports the correct norm of the least-squares solution (as checked by octave).
Let me reiterate: my tests pass in the “normal” (not rank-deficient) cases, so I have at least some level of confidence in my code.
Here’s the reproduction case:
Matrix dimension 20x7 (tall)
Non-zero elements, csrValA parameter (floats, initialized by integers, for simplicity, 20 total): 2 1001 2005 2007 3002 4001 4005 5002 5007 7002 9002 10007 11006 12002 12005 13005 14005 15002 16002 18004
column indices of non-zero elements, csrColIndA parameter: 1 0 4 6 1 0 4 1 6 1 1 6 5 1 4 4 4 1 1 3
row offsets, terminated by the total number of non-zeros, csrRowPtrA parameter: 0 1 2 4 5 7 9 9 10 10 11 12 13 15 16 17 18 19 19 20 20
right-hand-side: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Note: some equations define zeros-only rows in the matrix, but the right-hand-sides for those equations are not zero, which, as far as I understand, is a valid use case.
Octave code determines the least-squares solution x=M\rhs as: 5.2606e-04 9.0647e-04 4.1200e-18 1.0553e-03 8.1494e-04 1.0903e-03 9.2974e-04. Note: the 3rd element of this solution is effectively zero. It corresponds to a zeros-only column in the matrix and also drives the index of that column (#2) to the last position of the permutation, as returned by the routine.
Both octave code and cusolver routine return the same (up to the single-precision accuracy of the routine) least-squares min norm of 31.200939. I use the following octave code to produce the norm: norm(M*x-rhs), where M is the matrix, x is the octave solution and rhs is the right-hand-side.
I would appreciate it if someone could please reproduce that use case, ideally on cuda-11.1 and let me know if that’s my bug or something else.
Thank you!
Full matrix, easy to paste into octave/matlab:
0 2 0 0 0 0 0
1001 0 0 0 0 0 0
0 0 0 0 2005 0 2007
0 3002 0 0 0 0 0
4001 0 0 0 4005 0 0
0 5002 0 0 0 0 5007
0 0 0 0 0 0 0
0 7002 0 0 0 0 0
0 0 0 0 0 0 0
0 9002 0 0 0 0 0
0 0 0 0 0 0 10007
0 0 0 0 0 11006 0
0 12002 0 0 12005 0 0
0 0 0 0 13005 0 0
0 0 0 0 14005 0 0
0 15002 0 0 0 0 0
0 16002 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 18004 0 0 0
0 0 0 0 0 0 0