Help with cuSPARSE: why is my cusparseDgtsv subroutine not working?

I am trying to use cusparseDgtsv to solve a tridiagonal system of equations. A standalone test program that calls this function worked well, but I am now trying to use the function inside a larger code. Below is the subroutine that calls it:

! subroutine cutridiag
! Solves a tridiagonal system using the cuSPARSE library
subroutine cutridiag(l, d, cons, r, x, n, n_eqn)
! l is the lower diagonal
! d is the main diagonal
! cons is the upper diagonal
! r is the right-hand side (one column per equation)
! x is a temporary array used to send the output back to a different
! subroutine
! n is the number of elements in each array
! n_eqn is the number of equations. This is a 2-D Euler equation CFD code,
! so the arrays for all 4 equations are sent at once.
type(cusparseHandle) :: handle
integer :: cusparseCreate_status
integer :: istat
integer :: i, j, m
integer, intent(in) :: n
integer, intent(in) :: n_eqn
real(dp), dimension(n), intent(in) :: l, d, cons
real(dp), dimension(n,n_eqn), intent(inout) :: r
real(dp), dimension(n,n_eqn), intent(inout) :: x
real(dp), device, dimension(n) :: l_d, d_d, cons_d
real(dp), device, dimension(n,1) :: r_d, x_d
real(dp), dimension(n) :: temp

cusparseCreate_status = cusparseCreate(handle)

l_d = l
d_d = d
cons_d = cons
temp(:) = r(:,1)

do m = 1,n_eqn
   r_d(:,1) = r(:,m)
   istat = cusparseDgtsv(handle, n, 1, l_d, d_d, cons_d, r_d, n)
   x(:,m) = r_d(:,1)
end do

end subroutine cutridiag
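
For discussion, here is a minimal sketch of a variant that checks every status code and hands all right-hand sides to cusparseDgtsv in a single call, since the third argument of that function is the number of right-hand sides. It assumes the same legacy cusparseDgtsv interface used above is provided by the cusparse module, and that dp is double precision. The name cutridiag_all is hypothetical and this is not tested inside the larger code.

```fortran
! Sketch only: cutridiag_all is a hypothetical name, and dp is assumed
! to be double precision (it is defined elsewhere in the real code).
subroutine cutridiag_all(l, d, cons, r, x, n, n_eqn)
   use cudafor
   use cusparse
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, intent(in) :: n, n_eqn
   real(dp), dimension(n), intent(in) :: l, d, cons
   real(dp), dimension(n,n_eqn), intent(in)  :: r
   real(dp), dimension(n,n_eqn), intent(out) :: x
   type(cusparseHandle) :: handle
   integer :: istat
   real(dp), device, dimension(n) :: l_d, d_d, cons_d
   real(dp), device, dimension(n,n_eqn) :: r_d

   istat = cusparseCreate(handle)
   if (istat /= CUSPARSE_STATUS_SUCCESS) then
      print *, 'cusparseCreate failed, status = ', istat
      return
   end if

   ! Host-to-device copies of the three diagonals and all right-hand sides
   l_d = l
   d_d = d
   cons_d = cons
   r_d = r

   ! n unknowns, n_eqn right-hand sides, leading dimension n.
   ! The solution overwrites r_d on the device.
   istat = cusparseDgtsv(handle, n, n_eqn, l_d, d_d, cons_d, r_d, n)
   if (istat /= CUSPARSE_STATUS_SUCCESS) then
      print *, 'cusparseDgtsv failed, status = ', istat
   end if

   ! Device-to-host copy of the solution, then release the handle
   x = r_d
   istat = cusparseDestroy(handle)
end subroutine cutridiag_all
```

Printing the status values at least shows whether the library call itself reports an error, and destroying the handle avoids creating a new one on every call without releasing it.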

The input that the subroutine receives looks like this:
do k = 1,n_eqn
   r(0,k) = (1.d0/18.d0)*un(-1,k)+(19.d0/18.d0)*un(0,k)+(10.d0/18.d0)*un(1,k)-(1.d0/2.d0)*ulnew_weno(-1,k)
   r(NS,k) = (1.d0/18.d0)*un(NS-1,k)+(19.d0/18.d0)*un(NS,k)+(10.d0/18.d0)*un(NS+1,k)-(1.d0/6.d0)*ulnew_weno(NS+1,k)

   do i=1,NS-1
      r(i,k) = (1.d0/18.d0)*un(i-1,k)+(19.d0/18.d0)*un(i,k)+(10.d0/18.d0)*un(i+1,k)
   end do
end do

do i = 0, NS
   a (i) = 1.d0/2.d0
   b (i) = 1.d0
   c1(i) = 1.d0/6.d0
end do
a (0)  = 0.d0
c1(NS) = 0.d0
total  = NS+1

call cutridiag(a, b, c1, r, temp, total, n_eqn)

I have tested this code with a normal serial TDMA solver and it works fine. I wanted a speedup from the GPU, so I decided to use a function from the cuSPARSE library. If anyone reading this would be so kind, please let me know if there is anything obviously wrong with my code. I really appreciate it! If any other information is needed, just let me know.
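
For comparison, a serial TDMA (Thomas algorithm) reference can be sketched as below. This is a minimal standalone illustration, not the routine from the CFD code: the program name thomas_check, the system size, and the right-hand side values are all made up. Only the diagonal coefficients (1/2, 1, 1/6) match the setup above. It solves one right-hand side and prints the largest residual, which gives a baseline to compare a GPU result against.

```fortran
! Reference sketch: serial Thomas algorithm (TDMA) for one right-hand side.
! thomas_check, n = 8 and the values in r are hypothetical test inputs.
program thomas_check
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, parameter :: n = 8
   real(dp) :: a(n), b(n), c(n), r(n), x(n), cp(n), rp(n)
   real(dp) :: denom, res
   integer :: i

   ! Same coefficients as in the setup above
   a = 0.5_dp
   b = 1.0_dp
   c = 1.0_dp/6.0_dp
   a(1) = 0.0_dp        ! first lower-diagonal entry is unused
   c(n) = 0.0_dp        ! last upper-diagonal entry is unused
   do i = 1, n
      r(i) = real(i, dp)
   end do

   ! Forward elimination
   cp(1) = c(1)/b(1)
   rp(1) = r(1)/b(1)
   do i = 2, n
      denom = b(i) - a(i)*cp(i-1)
      cp(i) = c(i)/denom
      rp(i) = (r(i) - a(i)*rp(i-1))/denom
   end do

   ! Back substitution
   x(n) = rp(n)
   do i = n-1, 1, -1
      x(i) = rp(i) - cp(i)*x(i+1)
   end do

   ! Largest residual |A x - r| over all rows
   res = abs(b(1)*x(1) + c(1)*x(2) - r(1))
   do i = 2, n-1
      res = max(res, abs(a(i)*x(i-1) + b(i)*x(i) + c(i)*x(i+1) - r(i)))
   end do
   res = max(res, abs(a(n)*x(n-1) + b(n)*x(n) - r(n)))
   print *, 'max residual = ', res
end program thomas_check
```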