Cusolver sparse least-squares produces NaN result on rank-def input

cudesnick · December 1, 2024, 6:42pm

Hey,

I’m using cusolverSpScsrlsqvqrHost routine from cuda 11.1 to solve an over-determined sparse linear problem. My code passes my tests in most cases, but if the problem matrix is rank-deficient, then that routine returns a bunch of NaNs in the result vector. What’s interesting is that the routine correctly determines the rank of the matrix, produces reasonable permutation vector (i.e. permutes the variable that has only zeros in the matrix to the last position) and even reports the correct norm of the least-squares solution (as checked by octave).

Let me reiterate: my tests pass in the “normal” (not rank-deficient) cases, so I have at least some level of confidence in my code.

Here’s the reproduction case:
Matrix dimension 20x7 (tall)
Non-zero elements, csrValA parameter (floats, initialized by integers, for simplicity, 20 total): 2 1001 2005 2007 3002 4001 4005 5002 5007 7002 9002 10007 11006 12002 12005 13005 14005 15002 16002 18004
column indices of non-zero elements, csrColIndA parameter: 1 0 4 6 1 0 4 1 6 1 1 6 5 1 4 4 4 1 1 3
row offsets, terminated by the total number of non-zeros, csrRowPtrA parameter: 0 1 2 4 5 7 9 9 10 10 11 12 13 15 16 17 18 19 19 20 20
right-hand-side: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Note: some equations define zeros-only rows in the matrix, but the right-hand-sides for those equations are not zero, which, as far as I understand, is a valid use case.

Octave code determines the least-squares solution x=M\rhs as: 5.2606e-04 9.0647e-04 4.1200e-18 1.0553e-03 8.1494e-04 1.0903e-03 9.2974e-04. Note: the 3rd element of this solution is effectively zero. It corresponds to a zeros-only column in the matrix and also drives the index of that column (#2) to the last position of the permutation, as returned by the routine.

Both octave code and cusolver routine return the same (up to the single-precision accuracy of the routine) least-squares min norm of 31.200939. I use the following octave code to produce the norm: norm(M*x-rhs), where M is the matrix, x is the octave solution and rhs is the right-hand-side.

I would appreciate it if someone could please reproduce that use case, ideally on cuda-11.1 and let me know if that’s my bug or something else.

Thank you!

Full matrix, easy to paste into octave/matlab:

0	2	0	0	0	0	0
1001	0	0	0	0	0	0
0	0	0	0	2005	0	2007
0	3002	0	0	0	0	0
4001	0	0	0	4005	0	0
0	5002	0	0	0	0	5007
0	0	0	0	0	0	0
0	7002	0	0	0	0	0
0	0	0	0	0	0	0
0	9002	0	0	0	0	0
0	0	0	0	0	0	10007
0	0	0	0	0	11006	0
0	12002	0	0	12005	0	0
0	0	0	0	13005	0	0
0	0	0	0	14005	0	0
0	15002	0	0	0	0	0
0	16002	0	0	0	0	0
0	0	0	0	0	0	0
0	0	0	18004	0	0	0
0	0	0	0	0	0	0

Topic		Replies	Views
Cusolver solve sparse Ax=b wrong GPU-Accelerated Libraries cusolver	7	308	August 19, 2024
cuSOLVER sparse: cusolverSpDcsrlsvqr() error GPU-Accelerated Libraries	4	3915	June 7, 2015
Problem with "cusparseDcsrsv" GPU-Accelerated Libraries	5	2182	May 23, 2013
In the following csrqr code, all values come out as Nan GPU-Accelerated Libraries cuda	0	440	February 14, 2023
cuSOLVER solving a overdetermined linear equation system GPU-Accelerated Libraries cusolver	7	1420	July 21, 2021
GPU accelerated least-squares solutions with cusolverSpScsrlsqvqr GPU-Accelerated Libraries	4	2552	June 8, 2019
Performance cusparseScsrsm_solve GPU-Accelerated Libraries	0	788	December 1, 2014
cuSolver work improperly with cusolverDnSsyevd. GPU-Accelerated Libraries	4	871	June 2, 2019
Cuda Least Squares GPU-Accelerated Libraries cusolver	0	596	September 27, 2023
Adding cusolverSp CUDA NVCC Compiler	0	512	January 4, 2023

Cusolver sparse least-squares produces NaN result on rank-def input

Related topics