How to solve a tridiagonal system using the cusparse<t>gtsv2_nopivot() functions in the cuSPARSE library

Hello,
I am a cuSPARSE beginner and want to call functions from the cuSPARSE library to solve a tridiagonal system. Here is a program I wrote with reference to other forum users’ code. The output of the program is not the solution of the system, but the values originally assigned to the B vector. How do I solve this problem?
Thank you very much!

PROGRAM TDMA
	
  use iso_C_binding
  use cudafor
  use cusparse
  implicit none

  ! atri - sub-diagonal (means it is the diagonal below the main diagonal)
  ! btri - the main diagonal
  ! ctri - sup-diagonal (means it is the diagonal above the main diagonal)
  ! dtri - right part
  ! npts - number of equations
 
  integer(c_int), parameter :: npts = 31
  real(c_double), dimension(npts) :: atri, btri, ctri, dtri
  integer(c_int) :: cusparseCreate_status
  type(cusparseHandle) :: handle
  integer(c_int) :: m, n, ldb
  real(c_double), dimension(npts) :: dl, d, du
  real(c_double), dimension(npts, 1) :: B
  integer(c_int) :: i
  integer(c_size_t), target :: bufferSizeInBytes
  integer(c_int) :: istat
  integer(c_int), dimension(:), allocatable :: buffer

  atri = 1.0
  atri(1) = 0.0
  btri = 2.0
  ctri = 1.0
  ctri(npts) = 0.0

  do i = 1, 16          ! symmetric RHS: 1,2,...,16,...,2,1 (matches the expected solution below)
    dtri(i) = i
    dtri(32 - i) = i
  end do

  do i = 1, npts
    B(i, 1) = dtri(i)
  end do

  dl  = atri
  d   = btri
  du  = ctri
  m   = npts
  n   = 1
  ldb = 1

  cusparseCreate_status = cusparseCreate(handle)

  print *, 'CREATE cusparseCreate_status: '
  if (cusparseCreate_status == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  end if

  istat = cusparseDgtsv2_nopivot_bufferSizeExt(handle, m, n, dl, d, du, B, ldb, bufferSizeInBytes)
  allocate(buffer(bufferSizeInBytes / c_sizeof(buffer(1))))

  istat = cusparseDgtsv2_nopivot(handle, m, n, dl, d, du, B, ldb, buffer)

  print *, 'Dgtsv STATUS: '
  if (istat == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
  elseif (istat == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (istat == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (istat == CUSPARSE_STATUS_INVALID_VALUE) then
    print *, 'CUSPARSE_STATUS_INVALID_VALUE'
  elseif (istat == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  elseif (istat == CUSPARSE_STATUS_EXECUTION_FAILED) then
    print *, 'CUSPARSE_STATUS_EXECUTION_FAILED'
  elseif (istat == CUSPARSE_STATUS_INTERNAL_ERROR) then
    print *, 'CUSPARSE_STATUS_INTERNAL_ERROR'
  end if

  do i = 1, npts
    dtri(i) = B(i, 1)
  end do

  ! Printing solution
  print *, 'The solution is: '
  ! Expected Solution = transpose of (0,1,0,2,0,3,...,0,8,0,7,0,6,...,0,1,0)
  do i = 1, npts
    print *, 'SOL(', i, '):', dtri(i)
  end do
END PROGRAM TDMA

Hello @lxyonline2887 and welcome to the NVIDIA developer forums!

I took the liberty of moving your post to the CUDA Programming category, I am sure you will find some support there.

Thanks!

@MarkusHoHo Why would this sub-forum be more suitable than the sub-forum dedicated to accelerated libraries including cuSPARSE?

Thank you so much!

This may be of interest.

I haven’t studied your code carefully, but it seems fairly evident that you are unaware of the distinction between host and device memory when using a GPU. The cusparse function calls generally require the data to already be resident in device memory, which you can learn by reading the documentation. Furthermore, when starting with host data, it’s not sufficient simply to use device allocations: there is usually some copying of data from host to device, and copying of results from device to host, in CUDA and with typical CUDA library usage, including cusparse.

In Fortran/cudafor, such a device allocation would be evidenced by the device attribute on the declaration. That attribute doesn’t appear anywhere in your code, so all of your data is in host memory. That won’t work, and it is not how cusparse is intended to be used. You can find other examples of using CUDA library routines in Fortran here on these forums that include proper use of the device attribute for the necessary allocations; here is one example. (Yes, I am aware that that is not an example of cusparse gtsv2.) I don’t have an example of the use of gtsv2 in CUDA Fortran to point you to.
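
The general pattern, though, would look something like the untested sketch below. It assumes the cusparse module that ships with the NVIDIA HPC SDK, and that the file is compiled as CUDA Fortran (e.g. nvfortran -cuda -cudalib=cusparse, or use a .cuf file extension); check the interface declarations in the documentation before relying on it. Note that in CUDA Fortran a plain assignment between a host array and a device array performs the copy:

program gtsv2_sketch
  use cudafor
  use cusparse
  implicit none

  integer, parameter :: npts = 31
  real(8), dimension(npts) :: dl, d, du, B                  ! host arrays
  real(8), device, dimension(npts) :: dl_d, d_d, du_d, B_d  ! device arrays
  integer(1), device, allocatable :: buffer_d(:)            ! gtsv2 device workspace
  integer(8) :: bufferSize
  type(cusparseHandle) :: handle
  integer :: istat

  ! Build the tridiagonal system on the host
  dl = 1.0d0; dl(1) = 0.0d0
  d  = 2.0d0
  du = 1.0d0; du(npts) = 0.0d0
  B  = 1.0d0

  istat = cusparseCreate(handle)

  ! Host-to-device copies (assignment performs the transfer)
  dl_d = dl; d_d = d; du_d = du; B_d = B

  ! Query the workspace size, allocate it on the device, then solve.
  ! Note that ldb must be >= m, so pass npts here, not 1.
  istat = cusparseDgtsv2_nopivot_bufferSizeExt(handle, npts, 1, &
                                               dl_d, d_d, du_d, B_d, npts, bufferSize)
  allocate(buffer_d(bufferSize))
  istat = cusparseDgtsv2_nopivot(handle, npts, 1, &
                                 dl_d, d_d, du_d, B_d, npts, buffer_d)

  ! Device-to-host copy of the result (gtsv2 overwrites B in place)
  B = B_d
  print *, B

  deallocate(buffer_d)
  istat = cusparseDestroy(handle)
end program gtsv2_sketch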

Thanks for your replies! I’ve figured out what my problem was. I will try to read those examples and correct my program.

Hi Robert!
I did some reading about host and device arrays, and I modified my program according to the example you provided. Here is my corrected program:

program TDMA
  use cudafor
  use cusparse
  implicit none

  integer, parameter :: npts = 31
  integer :: m=31
  integer :: n=1
  integer :: ldb=31
  integer :: m_d
  integer :: n_d
  integer :: ldb_d
  real(8), dimension(:),  allocatable :: dl, d, du
  real(8), allocatable, device, dimension(:) :: dl_d
  real(8), allocatable, device, dimension(:) :: d_d
  real(8), allocatable, device, dimension(:) :: du_d
  real(8), dimension(:), allocatable :: B
  real(8), allocatable,  device, dimension(:) :: B_d
  integer(8) :: i
  integer(8) :: istat
  integer(8) :: bufferSize
  integer(1), pointer, device :: buffer(:)

  integer :: cusparseCreate_status
  type(cusparseHandle) :: handle

  istat = cudaMalloc(dl_d, n * c_sizeof(1))
  istat = cudaMalloc(d_d, n * c_sizeof(1))
  istat = cudaMalloc(du_d, n * c_sizeof(1))
  istat = cudaMalloc(B_d, n * c_sizeof(1))
  
  ! Initialize your arrays and values
  ! Allocate host arrays
  allocate(dl(npts))
  allocate(d(npts))
  allocate(du(npts))
  allocate(B(npts))
  
  dl = 1.0
  dl(1) = 0.0
  d = 2.0
  du = 1.0
  du(npts) = 0.0

  do i = 1, 16
    B(i) = i
    B(32 - i) = i
  end do
  
  istat = cudaMemcpy(dl_d, dl, n * c_sizeof(1), cudaMemcpyHostToDevice)
  istat = cudaMemcpy(d_d, d, n * c_sizeof(1), cudaMemcpyHostToDevice)
  istat = cudaMemcpy(du_d, du, n * c_sizeof(1), cudaMemcpyHostToDevice)
  istat = cudaMemcpy(B_d, B, n * c_sizeof(1), cudaMemcpyHostToDevice)
  
  ! cusparse_create and check
  cusparseCreate_status = cusparseCreate(handle)

  print *, 'CREATE cusparseCreate_status: '
  if (cusparseCreate_status == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  end if

  ! Calling Dgtsv and checking the output Status
  istat = cusparseDgtsv2_nopivot_bufferSizeExt(handle, m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d, bufferSize)
  allocate(buffer(bufferSize / c_sizeof(buffer(1))))

  istat = cusparseDgtsv2_nopivot(handle, m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d, buffer)
  
  write(*,*) m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d
  print *, 'Dgtsv STATUS: '
  write(*,*) istat
  
  if (istat == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
    
  elseif (istat == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (istat == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (istat == CUSPARSE_STATUS_INVALID_VALUE) then
    print *, 'CUSPARSE_STATUS_INVALID_VALUE'
  elseif (istat == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  elseif (istat == CUSPARSE_STATUS_EXECUTION_FAILED) then
    print *, 'CUSPARSE_STATUS_EXECUTION_FAILED'
  elseif (istat == CUSPARSE_STATUS_INTERNAL_ERROR) then
    print *, 'CUSPARSE_STATUS_INTERNAL_ERROR'
  end if

  istat = cudaMemcpy(B, B_d, n * c_sizeof(1), cudaMemcpyDeviceToHost)

  print *, 'The solution is: '
  do i = 1, npts
    print *, 'SOL(', i, '):', B(i)
  end do
END PROGRAM TDMA

When “device” is present in the code, the compiler reports the following errors:
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 10)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 11)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 12)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 14)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 15)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 16)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 18)

When I delete “device”, the compiler errors are:
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90:27)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 28)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 29)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 30)

It’s beyond my understanding QwQ. How can I solve this problem?
Looking forward to your reply. Thank you so much!