# How to solve a tridiagonal matrix using the cusparse<t>gtsv2_nopivot() functions in the cusparse library

Hello，
I am a cusparse beginner and want to call the functions in the cusparse library to solve the tridiagonal matrix problem. Here is a program I wrote with reference to forum users’ code, The output of the program is not the solution of the matrix, but the value originally assigned to the B vector. How do I solve this problem?
Thank you very much!

``````PROGRAM TDMA

use iso_C_binding
use cudafor
use cusparse
implicit none

! atri - sub-diagonal (means it is the diagonal below the main diagonal)
! btri - the main diagonal
! ctri - sup-diagonal (means it is the diagonal above the main diagonal)
! dtri - right part
! npts - number of equations

integer(c_int), parameter :: npts = 31
real(c_double), dimension(npts) :: atri, btri, ctri, dtri
integer(c_int) :: cusparseCreate_status
type(cusparseHandle) :: handle
integer(c_int) :: m, n, ldb
real(c_double), dimension(npts) :: dl, d, du
real(c_double), dimension(npts, 1) :: B
integer(c_int) :: i
integer(c_size_t), target :: bufferSizeInBytes
integer(c_int) :: istat
integer(c_int), dimension(:), allocatable :: buffer

atri = 1.0
atri(1) = 0.0
btri = 2.0
ctri = 1.0
ctri(npts) = 0.0

do i = 1, 31
dtri(i) = i
end do

do i = 1, npts
B(i, 1) = dtri(i)
end do

dl  = atri
d   = btri
du  = ctri
m   = npts
n   = 1
ldb = 1

cusparseCreate_status = cusparseCreate(handle)

print *, 'CREATE cusparseCreate_status: '
if (cusparseCreate_status == CUSPARSE_STATUS_SUCCESS) then
print *, 'CUSPARSE_STATUS_SUCCESS'
elseif (cusparseCreate_status == CUSPARSE_STATUS_NOT_INITIALIZED) then
print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
elseif (cusparseCreate_status == CUSPARSE_STATUS_ALLOC_FAILED) then
print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
elseif (cusparseCreate_status == CUSPARSE_STATUS_ARCH_MISMATCH) then
print *, 'CUSPARSE_STATUS_ARCH_MISMATCHED'
end if

istat = cusparseDgtsv2_nopivot_buffersizeext(handle, m, n, dl, d, du, B, ldb, bufferSizeInBytes)
allocate(buffer(bufferSizeInBytes / c_sizeof(buffer(1))))

istat = cusparseDgtsv2_nopivot(handle, m, n, dl, d, du, B, ldb, buffer)

print *, 'Dgtsv STATUS: '
if (istat == CUSPARSE_STATUS_SUCCESS) then
print *, 'CUSPARSE_STATUS_SUCCESS'
elseif (istat == CUSPARSE_STATUS_NOT_INITIALIZED) then
print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
elseif (istat == CUSPARSE_STATUS_ALLOC_FAILED) then
print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
elseif (istat == CUSPARSE_STATUS_INVALID_VALUE) then
print *, 'CUSPARSE_STATUS_INVALID_VALUE'
elseif (istat == CUSPARSE_STATUS_ARCH_MISMATCH) then
print *, 'CUSPARSE_STATUS_ARCH_MISMATCHED'
elseif (istat == CUSPARSE_STATUS_EXECUTION_FAILED) then
print *, 'CUSPARSE_STATUS_EXECUTION_FAILED'
elseif (istat == CUSPARSE_STATUS_INTERNAL_ERROR) then
print *, 'CUSPARSE_STATUS_INTERNAL_ERROR'
end if

do i = 1, npts
dtri(i) = B(i, 1)
end do

! Printing solution
print *, 'The solution is: '
! Expected Solution = transpose of (0,1,0,2,0,3,...,0,8,0,7,0,6,...,0,1,0)
do i = 1, npts
print *, 'SOL(', i, '):', dtri(i)
end do
END PROGRAM TDMA

``````

Hello @lxyonline2887 and welcome to the NVIDIA developer forums!

I took the liberty of moving your post to the CUDA Programming category, I am sure you will find some support there.

Thanks!

@MarkusHoHo Why would this sub-forum be more suitable than the sub-forum dedicated to accelerated libraries including cuSPARSE?

Thank you so much!

this may be of interest

I haven’t studied your code carefully, but it seems fairly evident that you are unaware of the distinction between host and device memory when using a GPU. The cusparse function calls generally require the data to be already deposited in device memory, which you can learn by reading the documentation. Furthermore, when starting with host data, its not sufficient simply to use device allocations. There is usually some copying of data from host to device, and copying of results from device to host, in CUDA, and with typical CUDA library usage, including cusparse.

In Fortran/cudafor, such a device allocation would be evidenced by the `device` attribute on an allocation. That attribute doesn’t appear anywhere in your code and so all your data is in host memory. That won’t work, and is not how cusparse is intended to be used. You can find other examples of using CUDA library routines (in Fortran) here on these forums, that include proper use of the `device` attribute for the necessary allocations, here is one example. (Yes, I am aware that that is not an example of cusparse gtsv2.) I don’t have an example of the use of gtsv2 in CUDA fortran to point you to.

Thanks for your replies! I’ve figured out what my problem was. I will try to read those examples and correct my program.

Hi Robert!
I did some learning about array calls on host and device. I modified my program according to the example you provided. Here is my corrected procedure:

``````program TDMA
use cudafor
use cusparse
implicit none

integer,parameter :: npts = 31
integer :: m=31
integer :: n=1
integer :: ldb=31
integer :: m_d
integer :: n_d
integer :: ldb_d
real(8), dimension(:),  allocatable :: dl, d, du
real(8), allocatable, device, dimension(:) :: dl_d
real(8), allocatable, device, dimension(:) :: d_d
real(8), allocatable, device, dimension(:) :: du_d
real(8), dimension(:), allocatable :: B
real(8), allocatable,  device, dimension(:) :: B_d
integer(8) :: i
integer(8) :: istat
Integer(8) :: bufferSize
Integer(1), pointer, device :: buffer(:)

integer :: cusparseCreate_status
type(cusparseHandle) :: handle

istat = cudaMalloc(dl_d, n * c_sizeof(1))
istat = cudaMalloc(d_d, n * c_sizeof(1))
istat = cudaMalloc(du_d, n * c_sizeof(1))
istat = cudaMalloc(B_d, n * c_sizeof(1))

! Initialize your arrays and values
! Allocate host arrays
allocate(dl(npts))
allocate(d(npts))
allocate(du(npts))
allocate(B(npts))

dl = 1.0
dl(1) = 0.0
d = 2.0
du = 1.0
du(npts) = 0.0

do i = 1, 16
B(i) = i
B(32 - i) = i
end do

istat = cudaMemcpy(dl_d, dl, n * c_sizeof(1), cudaMemcpyHostToDevice)
istat = cudaMemcpy(d_d, d, n * c_sizeof(1), cudaMemcpyHostToDevice)
istat = cudaMemcpy(du_d, du, n * c_sizeof(1), cudaMemcpyHostToDevice)
istat = cudaMemcpy(B_d, B, n * c_sizeof(1), cudaMemcpyHostToDevice)

! cusparse_create and check
cusparseCreate_status = cusparseCreate(handle)

print *, 'CREATE cusparseCreate_status: '
if (cusparseCreate_status == CUSPARSE_STATUS_SUCCESS) then
print *, 'CUSPARSE_STATUS_SUCCESS'
elseif (cusparseCreate_status == CUSPARSE_STATUS_NOT_INITIALIZED) then
print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
elseif (cusparseCreate_status == CUSPARSE_STATUS_ALLOC_FAILED) then
print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
elseif (cusparseCreate_status == CUSPARSE_STATUS_ARCH_MISMATCH) then
print *, 'CUSPARSE_STATUS_ARCH_MISMATCHED'
end if

! Calling Dgtsv and checking the output Status
istat = cusparseDgtsv2_nopivot_buffersizeext(handle, m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d, bufferSize)
allocate(buffer(bufferSize / c_sizeof(buffer(1))))

istat = cusparseDgtsv2_nopivot(handle, m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d, buffer)

write(*,*) m_d, n_d, dl_d, d_d, du_d, B_d, ldb_d
print *, 'Dgtsv STATUS: '
write(*,*) istat

if (istat == CUSPARSE_STATUS_SUCCESS) then
print *, 'CUSPARSE_STATUS_SUCCESS'

elseif (istat == CUSPARSE_STATUS_NOT_INITIALIZED) then
print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
elseif (istat == CUSPARSE_STATUS_ALLOC_FAILED) then
print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
elseif (istat == CUSPARSE_STATUS_INVALID_VALUE) then
print *, 'CUSPARSE_STATUS_INVALID_VALUE'
elseif (istat == CUSPARSE_STATUS_ARCH_MISMATCH) then
print *, 'CUSPARSE_STATUS_ARCH_MISMATCHED'
elseif (istat == CUSPARSE_STATUS_EXECUTION_FAILED) then
print *, 'CUSPARSE_STATUS_EXECUTION_FAILED'
elseif (istat == CUSPARSE_STATUS_INTERNAL_ERROR) then
print *, 'CUSPARSE_STATUS_INTERNAL_ERROR'
end if

istat = cudaMemcpy(B, B_d, n * c_sizeof(1), cudaMemcpyDeviceToHost)

print *, 'The solution is: '
do i = 1, npts
print *, 'SOL(', i, '):', B(i)
end do
END PROGRAM TDMA

``````

When “device” is present in the code, the program reports the following error:
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 10)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 11)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 12)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 14)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 15)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 16)
NVFORTRAN-S-0034-Syntax error at or near device (cuda_tdma.f90: 18)

When I delete “device”, the program error is:
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90:27)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 28)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 29)
NVFORTRAN-S-0155-Could not resolve generic procedure cudamalloc (cuda_tdma.f90: 30)

It’s beyond my understanding QwQ. How can I solve this problem?