The problem for the function <cusolverDnXgetrs>

Hello, everyone
I want to solve the linear system of equations by functiion . The example proposed works well, the code follows

!===================The test Region_1===================! Solve dense linear systems
subroutine ITERATION_CDM_CUSPARSE_LU
USE CONSTANTPARAMETERS
USE ELECTROMAGNETIC_VARIABLES
USE RES_MODEL_PARAMETER
USE TIME_PARAMETERS
use openacc
use cusolverDn
use CUDAFOR
IMPLICIT NONE
!============= CUSPARSE APIs =============
type(cusolverDnHandle):: handle_1 ! A handle that handle to the cuSolver library context
type(cusolverDnParams) :: params ! A handle that handle to the cuSolver library context
INTEGER(KIND=8)::rows ! The number of rows of the dense matrix
INTEGER(KIND=8)::cols ! The number of columns of the dense matrix
INTEGER(KIND=4):: istat ! This data type represents the status returned by the library functions and it can have the values from 0 to 31
REAL(KIND=8), DIMENSION(:), device, ALLOCATABLE :: dX ! Values of the dense vector, i.e. unknown term. Array of size cols
REAL(KIND=8), DIMENSION(:), ALLOCATABLE :: dY ! Values of the dense vector, i.e. RHS. Array of size rows
REAL(KIND=8),DEVICE :: d_dense(9) ! Values of the dense coefficient vector, i.e. A
REAL(KIND=8) :: d_dense_temp(9) ! Used to Check the A
INTEGER(KIND=8):: i,j,ld,ldb
integer(8) :: workspaceInBytesOnDevice, workspaceInBytesOnHost
integer(4), device :: devinfo
INTEGER(4) ::devinfo_temp
INTEGER(KIND=1), DIMENSION(:), device, ALLOCATABLE :: bufferOnDevice
INTEGER(KIND=1), DIMENSION(:), ALLOCATABLE :: bufferOnHost
INTEGER(KIND=8), device:: ipiv(3) ! Array of size at least min(rows,cols), containing pivot indices
INTEGER(KIND=8) :: ipiv_temp(3) ! Used to Check the ipiv

!  *       | 1 2 3  |
!  *   A = | 4 5 6  |
!  *       | 7 8 10 |
!  *
!  * without pivoting: A = L*U
!  *       | 1 0 0 |      | 1  2  3 |
!  *   L = | 4 1 0 |, U = | 0 -3 -6 |
!  *       | 7 2 1 |      | 0  0  1 |
!  *
!  * with pivoting: P*A = L*U
!  *       | 0 0 1 |
!  *   P = | 1 0 0 |
!  *       | 0 1 0 |
!  *
!  *       | 1       0     0 |      | 7  8       10     |
!  *   L = | 0.1429  1     0 |, U = | 0  0.8571  1.5714 |
!  *       | 0.5714  0.5   1 |      | 0  0       -0.5   |
!  *

!============= Assigning the value of A =============
! d_dense =(/ 1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 10.0 /)
d_dense =(/ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0 /)

!============= Host problem definition =============
rows = 3
cols = 3
ld = 3
ldb = 3
!============= Allocating and assigning the value of RHS =============
ALLOCATE(dX(cols),dY(rows))
dX	= (/1.0, 2.0, 3.0/)

!============= Device memory management =============
!$acc data create(dY,devinfo_temp,ipiv_temp,d_dense_temp)

istat = cusolverDnCreate(handle_1)
istat = cusolverDnSetStream(handle_1,acc_get_cuda_stream(44)) 
istat = cusolverDnCreateParams(params)
istat = cusolverDnSetAdvOptions(params, CUSOLVERDN_GETRF, CUSOLVER_ALG_0)

!============= Calculate the sizes needed for pre-allocated buffer =============
istat = cusolverDnXgetrf_bufferSize( handle_1, params, rows, cols, cudaDataType(CUDA_R_64F), d_dense, ld,cudaDataType(CUDA_R_64F), workspaceInBytesOnDevice, workspaceInBytesOnHost )
allocate(bufferOnDevice(workspaceInBytesOnDevice) , bufferOnHost(workspaceInBytesOnHost)  )

!============= Calculate the LU factorization of matrix A =============
istat = cusolverDnXgetrf( handle_1, params, rows, cols, cudaDataType(CUDA_R_64F), d_dense, ld,  ipiv, cudaDataType(CUDA_R_64F), bufferOnDevice,workspaceInBytesOnDevice,bufferOnHost, workspaceInBytesOnHost,devinfo )

!============= Printing the relust of LU factorization =============
write(*,*)"==================="
do j=1,9
	d_dense_temp(j) = d_dense(j)
enddo
write(*,*)d_dense_temp
write(*,*)"==================="

!============= Sloving the linear system of equations : AX=B =============

!  *       | 1 |       | -0.3333 |
!  *   B = | 2 |,  X = |  0.6667 |
!  *       | 3 |       |  0      |

istat = cusolverDnXgetrs( handle_1, params,CUBLAS_OP_T, rows, 1, cudaDataType(CUDA_R_64F), d_dense, ld,  ipiv, cudaDataType(CUDA_R_64F), dX, ldb, devinfo )
devinfo_temp = devinfo

!$acc kernels async(44)
!$acc loop collapse(1) 
	do j=1,3
		dY(j) = dX(j)
	enddo
!$acc end kernels
!$acc update host(dY)

!$acc end data

!============= Printing the relust of X =============
do i=1, 3
	write(*,*)dY(i)
enddo

end subroutine ITERATION_CDM_CUSPARSE_LU

! =========================The test Region_1=========================!

When I put these into my program, a strage thing happend. If I solve a 1872*1872 linear system where is a tri-diagonal linear system, I can obtain correct results. If the matrix has several more entries than tri-diagonal matrix, I obtain wrong results. I ensure that the coefficient matrix is assembled prefectly.

I don not know what caused its. I can offer the source code if needed
Thanks in advanced.

To clarify, are you meaning when you use sizes larger than 1872 you’re getting incorrect results?

I can offer the source code if needed

Please. Having a minimal example so we can reproduce the error would be appreciated.

The size of linear systems is not changed, instead the number of non-zero entries will be changed. For two linear systems with a same size, the one with a tri-diagonal matrix can get correct results, however, the other one with a general matrix, which has more several non-zero entries than the tri-diagonal matrix, will get incorrect results

Sorry, I realize that the part solving the linear system of equations is related to other subroutine. So, I try to offer a contruction that illustrates the evolution of coefficient matrix. Although the code doesn’t work, maybe them show how I intend to use it.

cuSolver_Dense_LU.zip (2.3 KB)