# CUDA Cholesky factorization gives NaN when the input matrix is positive definite matrix

I am solving a non-linear square function f by using Gauss-Newton method.
The normal equation of Gauss-Newton is

J.T * J * x = -J.T*r, where J, J.T, r, and x are Jacobian matrix of f, transposed Jacobian matrix, residual and the solution respectively.

I first calculate the Jacobian matrix and then calculate J.T * J via `cusparseSpGEMM_compute`.
I would call J.T * J a Hessian matrix as described in the Gauss-Newton method.
When I input this Hessian matrix into the cusolver function `cusolverSpScsrlsvchol`, I got the NaN result in the vector x.
But, the variable `singularity` is -1 which means my Hessian matrix is positive definite. Here is the cusolver doc description.
Can you tell me what are the possible reasons?

CUDA Toolkit version: 11.2
GPU: NVIDIA GeForce RTX 3070
Jacobian matrix dimension: about 25k x 12k
Hessian matrix dimension: about 12k x 12k

A pseudo code of my routine:

``````// Malloc device memory
cudaMallocManaged((void**)&jacCsrVal,      nnz * sizeof(float)); // length = nnz
cudaMallocManaged((void**)&jacCsrRowPtr,   (row+ 1) * sizeof(int)); // rows +1
cudaMallocManaged((void**)&jacCsrColInd,   nnz * sizeof(int));// length = nnz
cudaMallocManaged((void**)&jacCscVal_T,    nnz * sizeof(float)); // Jacobain.T: length = nnz
cudaMallocManaged((void**)&jacCscColPtr_T, (col+ 1) * sizeof(int)); // cols +1
cudaMallocManaged((void**)&jacCscRowInd_T, nnz * sizeof(int));// nnz
cudaMallocManaged((void**)&residual,       row * sizeof(float)); // vector: data_row + reg_row
cudaMallocManaged((void**)&b_val,          col * sizeof(float)); // vector: -J.T * residual
cudaMallocManaged((void**)&x,              col * sizeof(float)); //  solution

// 1. calculate the jacobian matrix
CalcJacobian(jacCsrVal, jacCsrRowPtr, jacCsrColInd );

//2.  calculate the transposed jacobian matrix
size_t buffer_temp_size;
void* buffer_temp = NULL;
cusparseCsr2cscEx2_bufferSize(handle, row, col, nnz,
jacCsrVal, jacCsrRowPtr, jacCsrColInd,
jacCscVal_T, jacCscColPtr_T, jacCscRowInd_T,
CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, &buffer_temp_size);	cudaMallocManaged(&buffer_temp, buffer_temp_size);
cusparseCsr2cscEx2(
handle, row, col, nnz,
jacCsrVal, jacCsrRowPtr, jacCsrColInd,
jacCscVal_T, jacCscColPtr_T, jacCscRowInd_T,
CUDA_R_32F, CUSPARSE_ACTION_NUMERIC,
CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, buffer_temp)

//3. calculate the Hessian matrix J.T*J
cusparseSpGEMM_workEstimation(handle, op, op,
&alpha, jacobian_T, jacobian, &beta, Hessian,
CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
spgemmDesc, &bufferSize1, NULL);
cudaMallocManaged((void**)&dBuffer1, bufferSize1);
cusparseSpGEMM_workEstimation(handle, op, op,
&alpha, jacobian_T, jacobian, &beta, Hessian,
CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
spgemmDesc, &bufferSize1, dBuffer1);
cusparseSpGEMM_compute(handle, op, op,
&alpha, jacobian_T, jacobian, &beta, Hessian,
CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
spgemmDesc, &bufferSize2, NULL);
cudaMallocManaged((void**)&dBuffer2, bufferSize2);
cusparseSpGEMM_compute(handle, op, op,
&alpha, jacobian_T, jacobian, &beta, Hessian,
CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT,
spgemmDesc, &bufferSize2, dBuffer2);

//4. Retrieve Hessian info
cusparseSpMatGetSize(Hessian, &rowHessian, &colHessian, &nnzHessian);
cudaMallocManaged(&HesCsrVal, nnzHessian * sizeof(float));
cudaMallocManaged(&HesCsrRowPtr, (col + 1) * sizeof(int));
cudaMallocManaged(&HesCsrColInd, nnzHessian * sizeof(int));
cusparseCsrSetPointers(Hessian, HesCsrRowPtr, HesCsrColInd, HesCsrVal);
cusparseSpGEMM_copy(handle, op, op,
&alpha, jacobian_T, jacobian, &beta, Hessian,
CUDA_R_32F, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc);

//5. Cholesky factorization and solve
//     J.T*J  * x = -J.T*r
// -> Hessian * x = b
cusolverSpScsrlsvchol(
cusolverSpH, JtRow, nnzHessian, GeneralHessian,
HesCsrVal, HesCsrRowPtr, HesCsrColInd, b_val,
0, 0, x, &singularity);
// singularity is -1, but solution x is [nan, nan, nan, nan...]
``````

I have checked all of my values of J, J.T and r are not nan.
`cusolverSpScsrlsvchol` stills return nan values.

I found the reasons. The variables `alpha` and `beta` are `int`. They should be `float`.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.