Using cusparseDgtsv2_nopivot() with OpenACC in Fortran code

Hello,
I have written a test code that calls the cusparseDgtsv2_nopivot function in the cuSPARSE library to solve a tridiagonal system, using OpenACC for the data transfers, mainly following tcusparse3.f90.
Here is my program:

PROGRAM TDMA
  use openacc
  use cusparse
  implicit none

  integer, parameter :: npts = 31
  integer :: cusparseCreate_status
  type(cusparseHandle) :: handle
  integer :: m, n, ldb
  real(8) :: dl(npts), d(npts), du(npts)
  real(8) :: B(npts)
  integer :: i
  integer :: istat
  integer(8) :: bufferSizeInBytes
  integer(1), pointer:: buffer(:)
  
  cusparseCreate_status = cusparseCreate(handle)
  !$acc data create(dl,d,du,B)
  m = npts
  n = 1
  ldb = npts
  dl = 1.0
  dl(1) = 0.0
  d = 2.0
  du = 1.0
  du(npts) = 0.0
  do i = 1, 16
    B(i) = i
    B(32 - i) = i
  end do
  !%acc update device(dl,d,du,B)
    
  print *, 'CREATE cusparseCreate_status: '
  if (cusparseCreate_status == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (cusparseCreate_status == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  end if

  istat = cusparseDgtsv2_nopivot_bufferSizeExt(handle, m, n, dl, d, du, B, ldb, bufferSizeInBytes)
  allocate(buffer(bufferSizeInBytes))
  istat = cusparseDgtsv2_nopivot(handle, m, n, dl, d, du, B, ldb, buffer)
  print *, 'Dgtsv STATUS: '
  if (istat == CUSPARSE_STATUS_SUCCESS) then
    print *, 'CUSPARSE_STATUS_SUCCESS'
  elseif (istat == CUSPARSE_STATUS_NOT_INITIALIZED) then
    print *, 'CUSPARSE_STATUS_NOT_INITIALIZED'
  elseif (istat == CUSPARSE_STATUS_ALLOC_FAILED) then
    print *, 'CUSPARSE_STATUS_ALLOC_FAILED'
  elseif (istat == CUSPARSE_STATUS_INVALID_VALUE) then
    print *, 'CUSPARSE_STATUS_INVALID_VALUE'
  elseif (istat == CUSPARSE_STATUS_ARCH_MISMATCH) then
    print *, 'CUSPARSE_STATUS_ARCH_MISMATCH'
  elseif (istat == CUSPARSE_STATUS_EXECUTION_FAILED) then
    print *, 'CUSPARSE_STATUS_EXECUTION_FAILED'
  elseif (istat == CUSPARSE_STATUS_INTERNAL_ERROR) then
    print *, 'CUSPARSE_STATUS_INTERNAL_ERROR'
  end if
  
  !$acc update host(dl,d,du,B)
  !$acc end data
  
  print *, 'The solution is: '
  do i = 1, npts
    print *, 'SOL(', i, '):', B(i)
  end do
END PROGRAM TDMA

After running, it displays:

nvfortran -Mpreprocess -fast -acc=gpu -cudalib=cusparse -o gtsv2acc.exe gtsv2acc.f90
./gtsv2acc.exe
 CREATE cusparseCreate_status:
 CUSPARSE_STATUS_SUCCESS
 Dgtsv STATUS:
 CUSPARSE_STATUS_SUCCESS
Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution
 File: /home/lixinyu/5555/5555/gtsv2acc.f90
 Function: tdma:1
 Line: 64

make: *** [makefile:15: run] Error 1

The output suggests that the cusparseDgtsv2_nopivot call itself succeeded, but I am still not sure how to transfer its result back to the host and print it.
Any other recommendations for the code would also be helpful.
Looking forward to your replies, thanks!

I believe the problem here is that “buffer” needs to be a device array: the gtsv2 work buffer must live in device memory, so passing a host-allocated array makes the solver kernel dereference an invalid address, which is what the error 700 above is reporting.

To fix, create a device copy of buffer:

  allocate(buffer(bufferSizeInBytes))
!$acc data create(buffer)
  istat = cusparseDgtsv2_nopivot(handle, m, n, dl, d, du, B, ldb, buffer)
!$acc end data
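With this version a host copy of buffer still exists (from the allocate); the data region just creates the device copy that the solver actually uses.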

Or you can add the CUDA Fortran “device” attribute to the declaration of “buffer” and then add “-cuda” to your compilation flags. I prefer this method since there is no need to have a host copy of the array. For example:

% grep "device " tdma.f90
  integer(1), pointer, device :: buffer(:)
% nvfortran -acc -cudalib=cusparse -fast tdma.f90 -cuda ; a.out
 CREATE cusparseCreate_status:
 CUSPARSE_STATUS_SUCCESS
 Dgtsv STATUS:
 CUSPARSE_STATUS_SUCCESS
 The solution is:
 SOL(            1 ):    0.000000000000000
 SOL(            2 ):    1.000000000000000
 SOL(            3 ):    0.000000000000000
 SOL(            4 ):    2.000000000000000
 SOL(            5 ):    0.000000000000000
 SOL(            6 ):    3.000000000000000
 SOL(            7 ):    0.000000000000000
 SOL(            8 ):    4.000000000000000
 SOL(            9 ):    0.000000000000000
 SOL(           10 ):    5.000000000000000
 SOL(           11 ):    0.000000000000000
 SOL(           12 ):    5.999999999999999
 SOL(           13 ):    0.000000000000000
 SOL(           14 ):    7.000000000000000
 SOL(           15 ):    0.000000000000000
 SOL(           16 ):    8.000000000000000
 SOL(           17 ):    0.000000000000000
 SOL(           18 ):    7.000000000000000
 SOL(           19 ):    0.000000000000000
 SOL(           20 ):    5.999999999999999
 SOL(           21 ):    0.000000000000000
 SOL(           22 ):    5.000000000000000
 SOL(           23 ):    0.000000000000000
 SOL(           24 ):    4.000000000000000
 SOL(           25 ):    0.000000000000000
 SOL(           26 ):    3.000000000000000
 SOL(           27 ):    0.000000000000000
 SOL(           28 ):    2.000000000000000
 SOL(           29 ):    0.000000000000000
 SOL(           30 ):    1.000000000000000
 SOL(           31 ):    0.000000000000000
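Note that bufferSizeInBytes itself stays an ordinary host integer, since the bufferSizeExt query returns the size to the host; only buffer needs the device attribute.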

Hope this helps,
Mat

Hi Mat! @MatColgrove
Thanks for your reply. I have corrected my program in both of the ways you mentioned. It now runs, but it does not produce the correct result, only “NaN”.

nvfortran -Mpreprocess -fast -acc=gpu -cudalib=cusparse -cuda -o gtsv2acc.exe gtsv2acc.f90
./gtsv2acc.exe
 CREATE cusparseCreate_status:
 CUSPARSE_STATUS_SUCCESS
 Dgtsv STATUS:
 CUSPARSE_STATUS_SUCCESS
 The solution is:
 SOL(            1 ):                       NaN
 SOL(            2 ):                       NaN
 SOL(            3 ):                       NaN
 SOL(            4 ):                       NaN
 SOL(            5 ):                       NaN
 SOL(            6 ):                       NaN
 SOL(            7 ):                       NaN
 SOL(            8 ):                       NaN
 SOL(            9 ):                       NaN
 SOL(           10 ):                       NaN
 SOL(           11 ):                       NaN
 SOL(           12 ):                       NaN
 SOL(           13 ):                       NaN
 SOL(           14 ):                       NaN
 SOL(           15 ):                       NaN
 SOL(           16 ):                       NaN
 SOL(           17 ):                       NaN
 SOL(           18 ):                       NaN
 SOL(           19 ):                       NaN
 SOL(           20 ):                       NaN
 SOL(           21 ):                       NaN
 SOL(           22 ):                       NaN
 SOL(           23 ):                       NaN
 SOL(           24 ):                       NaN
 SOL(           25 ):                       NaN
 SOL(           26 ):                       NaN
 SOL(           27 ):                       NaN
 SOL(           28 ):                       NaN
 SOL(           29 ):                       NaN
 SOL(           30 ):                       NaN
 SOL(           31 ):                       NaN

Could you point out the mistake for me? Thank you so much!

Hmm, I went back and tried the code on a variety of devices and compiler versions, but they all give what appear to be valid results, i.e. no NaNs.

What nvfortran version are you using? What device and CUDA driver? What OS?

@MatColgrove Thanks for your reply.
My nvfortran version is 23.3, the CUDA driver version is 525.85.05, and the OS is Ubuntu 20.04.2 LTS. The GPU is a Tesla V100-SXM2-16GB. Thanks.

lixinyu@featurize:~$ nvfortran -V
nvfortran 23.3-0 64-bit target on x86-64 Linux -tp skylake-avx512
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

lixinyu@featurize:~$ nvidia-smi
Sat Nov 11 09:29:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |

lixinyu@featurize:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

@MatColgrove Oh, so sorry Mat, I have found my mistake. I incorrectly wrote one ‘!$acc’ directive as ‘!%acc’ (the update device line), so the compiler treated it as a plain comment. When I corrected it, I got the correct result.
How careless of me. I’m sorry to have wasted your time. Thank you so much!
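For anyone hitting the same NaN issue, the whole fix is that one character in the update directive; here is the relevant line from the program above, mistyped and then corrected:

  !%acc update device(dl,d,du,B)   ! mistyped: '!%acc' is an ordinary Fortran comment, so nothing is copied to the device
  !$acc update device(dl,d,du,B)   ! corrected: copies the initialized host arrays to the device before the solve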

No worries, and my apologies: I had fixed this early on but was then focused on the issue with buffer, so I missed letting you know about it.