CUDA Cholesky factorization gives NaN when the input matrix is positive definite matrix

I am solving a non-linear square function f by using Gauss-Newton method.
The normal equation of Gauss-Newton is

J.T * J * x = -J.T*r, where J, J.T, r, and x are Jacobian matrix of f, transposed Jacobian matrix, residual and the solution respectively.

I first calculate the Jacobian matrix and then calculate J.T * J via cusparseSpGEMM_compute.
I would call J.T * J a Hessian matrix as described in the Gauss-Newton method.
When I input this Hessian matrix into the cusolver function cusolverSpScsrlsvchol, I got the NaN result in the vector x.
But, the variable singularity is -1 which means my Hessian matrix is positive definite. Here is the cusolver doc description.
Can you tell me what are the possible reasons?

More information about my spec:
CUDA Toolkit version: 11.2
GPU: NVIDIA GeForce RTX 3070
Jacobian matrix dimension: about 25k x 12k
Hessian matrix dimension: about 12k x 12k

A pseudo code of my routine:

// Malloc device memory
cudaMallocManaged((void**)&jacCsrVal,      nnz * sizeof(float)); // length = nnz
cudaMallocManaged((void**)&jacCsrRowPtr,   (row+ 1) * sizeof(int)); // rows +1
cudaMallocManaged((void**)&jacCsrColInd,   nnz * sizeof(int));// length = nnz
cudaMallocManaged((void**)&jacCscVal_T,    nnz * sizeof(float)); // Jacobain.T: length = nnz
cudaMallocManaged((void**)&jacCscColPtr_T, (col+ 1) * sizeof(int)); // cols +1
cudaMallocManaged((void**)&jacCscRowInd_T, nnz * sizeof(int));// nnz
cudaMallocManaged((void**)&residual,       row * sizeof(float)); // vector: data_row + reg_row
cudaMallocManaged((void**)&b_val,          col * sizeof(float)); // vector: -J.T * residual 
cudaMallocManaged((void**)&x,              col * sizeof(float)); //  solution

// 1. calculate the jacobian matrix
CalcJacobian(jacCsrVal, jacCsrRowPtr, jacCsrColInd );

//2.  calculate the transposed jacobian matrix
size_t buffer_temp_size;
void* buffer_temp = NULL;
 		cusparseCsr2cscEx2_bufferSize(handle, row, col, nnz,
 		jacCsrVal, jacCsrRowPtr, jacCsrColInd,
		jacCscVal_T, jacCscColPtr_T, jacCscRowInd_T, 
 		CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
		CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, &buffer_temp_size);	cudaMallocManaged(&buffer_temp, buffer_temp_size);
cusparseCsr2cscEx2(
		handle, row, col, nnz, 
 		jacCsrVal, jacCsrRowPtr, jacCsrColInd,
		jacCscVal_T, jacCscColPtr_T, jacCscRowInd_T, 
 		CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
		CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, buffer_temp)

//3. calculate the Hessian matrix J.T*J
cusparseSpGEMM_workEstimation(handle, op, op,
		&alpha, jacobian_T, jacobian, &beta, Hessian,
		CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
		spgemmDesc, &bufferSize1, NULL);
cudaMallocManaged((void**)&dBuffer1, bufferSize1);
cusparseSpGEMM_workEstimation(handle, op, op,
	&alpha, jacobian_T, jacobian, &beta, Hessian,
	CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
	spgemmDesc, &bufferSize1, dBuffer1);
cusparseSpGEMM_compute(handle, op, op,
	&alpha, jacobian_T, jacobian, &beta, Hessian,
	CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
	spgemmDesc, &bufferSize2, NULL);
cudaMallocManaged((void**)&dBuffer2, bufferSize2);
cusparseSpGEMM_compute(handle, op, op,
	&alpha, jacobian_T, jacobian, &beta, Hessian,
	CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
	spgemmDesc, &bufferSize2, dBuffer2);

//4. Retrieve Hessian info
cusparseSpMatGetSize(Hessian, &rowHessian, &colHessian, &nnzHessian);
cudaMallocManaged(&HesCsrVal, nnzHessian * sizeof(float));
cudaMallocManaged(&HesCsrRowPtr, (col + 1) * sizeof(int));
cudaMallocManaged(&HesCsrColInd, nnzHessian * sizeof(int));
cusparseCsrSetPointers(Hessian, HesCsrRowPtr, HesCsrColInd, HesCsrVal);
cusparseSpGEMM_copy(handle, op, op,
	&alpha, jacobian_T, jacobian, &beta, Hessian,
	CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc);

//5. Cholesky factorization and solve
//     J.T*J  * x = -J.T*r 
// -> Hessian * x = b
cusolverSpScsrlsvchol(
		cusolverSpH, JtRow, nnzHessian, GeneralHessian,
		HesCsrVal, HesCsrRowPtr, HesCsrColInd, b_val,
		0, 0, x, &singularity);
// singularity is -1, but solution x is [nan, nan, nan, nan...]

I have checked all of my values of J, J.T and r are not nan.
cusolverSpScsrlsvchol stills return nan values.

I found the reasons. The variables alpha and beta are int. They should be float.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.