How to use cusparseSgtsv2StridedBatch in CUDA Fortran

Hi,
I would like to use the new version of the tridiagonal solver, cusparseSgtsv2StridedBatch, in cuSPARSE. I have CUDA 10.1 and the PGI 19.5 compiler installed. I compiled one of my Fortran codes as follows:
"pgf90 -c -Mcuda=cuda10.1 -Mcudalib=cusparse etauv_solver_gpu.f90"
However, the compiler said ‘cusparsesgtsv2stridedbatch, has not been explicitly declared (etauv_solver_gpu.f90)’. Using cusparseSgtsvStridedBatch still compiled fine, so it seems the PGI Fortran compiler does not yet recognize this part of the CUDA 10.1 cuSPARSE library.
Do I need some C-Fortran binding code to handle this?
Thank you.

Ye Yuan, UDel

Hi Ye,

Since the gtsv2 routines are new, we haven’t added them to our cuSPARSE interface module yet. We’ll work on that for a future release.

In the meantime, you’ll need to provide your own interface. You can follow the one we use for cusparseSgtsvStridedBatch as a template: https://www.pgroup.com/resources/docs/19.4/x86/pgi-cuda-interfaces/index.htm#cusp-cusparsesgtsvstridedbatch
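
Paraphrasing the docs, the v1 interface looks something like the following; the gtsv2 version needs one extra pBuffer argument on top of this:

interface
    integer(4) function cusparseSgtsvStridedBatch(handle, m, dl, d, du, x, batchCount, batchStride) bind(C,name='cusparseSgtsvStridedBatch')
        use cusparse
        implicit none
        type(cusparseHandle), value :: handle
        integer(4), value :: m, batchCount, batchStride
        real(4), dimension(*), device :: dl, d, du, x
    end function cusparseSgtsvStridedBatch
end interface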

-Mat

Hi Mat,
Thanks for your advice.
I made an interface declaration in my MODULE code as follows,

interface
    integer(4) function cusparseSgtsv2StridedBatch(handle, m, dl, d, du, x, batchCount, batchStride, pBuffer) bind(C,name='cusparseSgtsv2StridedBatch')
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        integer(4),value :: pBuffer
    end function cusparseSgtsv2StridedBatch
end interface

However, I then got the following compile errors:

“parse pointers to void are invalid - use i8* instead
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages).”

I think the problem is pBuffer, the extra work buffer that gtsv2 requires. I took it for granted that it is an integer. For now I have no idea how to deal with it.

Try changing it to a 64-bit integer, since pBuffer is actually a pointer and device addresses are 64 bits:

integer(8),value :: pBuffer

Hi Mat,
I defined pBuffer as integer(8) in the interface. Compilation now succeeds, but running under cuda-memcheck gives me:

========= Invalid global write of size 4
========= at 0x000004d0 in void pcrGtsvBatchFirstPass(pcrGtsvBatchGlobalMemParams_t)
========= by thread (160,0,0) in block (0,181,0)
========= Address 0x00a2a400 is out of bounds
========= Device Frame:void pcrGtsvBatchFirstPass(pcrGtsvBatchGlobalMemParams_t) (void pcrGtsvBatchFirstPass(pcrGtsvBatchGlobalMemParams_t) : 0x4d0)

I did not get any errors when using gtsv; of course, gtsv did not need an interface to interpret the variables. The only difference between the gtsv and gtsv2 calls is the pBuffer argument.
The interfaces for gtsv2 are as follows:

interface
    integer(4) function cusparseSgtsv2StridedBatch(handle, m, dl, d, du, x, batchCount, batchStride, pBuffer) bind(C,name='cusparseSgtsv2StridedBatch')
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        integer(8),value :: pBuffer
    end function cusparseSgtsv2StridedBatch
    integer(4) function cusparseSgtsv2StridedBatch_bufferSizeExt(handle, m, dl, d, du, x, batchCount, batchStride, pBuffer) bind(C,name='cusparseSgtsv2StridedBatch_bufferSizeExt')
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        integer(8),value :: pBuffer
    end function cusparseSgtsv2StridedBatch_bufferSizeExt
end interface

Part of the calling code is as follows:

real(SP), dimension(:), ALLOCATABLE, device :: Arow, Brow, Crow, Drow
integer(8) :: buf
integer :: MGlob, NGlob
type(cusparseHandle), save :: cusparseh
status = cusparseSgtsv2StridedBatch_bufferSizeExt(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, buf)
status = cusparseSgtsv2StridedBatch(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, buf)

Hi Ye,

While I have not used gtsv2 myself, I think you’re missing the step to allocate the buffer. bufferSizeExt just returns the size of the buffer, but does not allocate the buffer.

Something like:

real(SP), dimension(:), ALLOCATABLE, device :: Arow, Brow, Crow, Drow
type(C_DEVPTR) :: buf
integer :: MGlob, NGlob
integer(8) :: bufsize
type(cusparseHandle), save :: cusparseh
status = cusparseSgtsv2StridedBatch_bufferSizeExt(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, bufsize)
istat = cudaMalloc(buf, bufsize)
status = cusparseSgtsv2StridedBatch(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, buf)
...
istat = cudaFree(buf)
...
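
Also note that if you go this route, the pBuffer declaration in your gtsv2 interface has to match what you actually pass. With a type(C_DEVPTR) buffer it would be something like this (untested, and it needs use cudafor in the interface body):

type(c_devptr), value :: pBuffer   ! matches the void* in the C prototype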

-Mat

Hi Mat,
By studying several posts on this forum and Stack Overflow, I realized that, and made the following modifications to the calling code:

integer(c_size_t) :: bufsize
character(kind=c_char), device, allocatable, target :: buf(:)
status = cusparseSgtsv2StridedBatch_bufferSizeExt(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, bufsize)
allocate(buf(bufsize))
status = cusparseSgtsv2StridedBatch(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, c_loc(buf))

The interfaces are:

interface
    integer(4) function cusparseSgtsv2StridedBatch(handle, m, dl, d, du, x, batchCount, batchStride, pBuffer) bind(C,name='cusparseSgtsv2StridedBatch')
        use iso_c_binding
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        type(c_ptr), value :: pBuffer
        !character(c_char),device :: pBuffer(*)
    end function cusparseSgtsv2StridedBatch
    integer(4) function cusparseSgtsv2StridedBatch_bufferSizeExt(handle, m, dl, d, du, x, batchCount, batchStride, pBufferSize) bind(C,name='cusparseSgtsv2StridedBatch_bufferSizeExt')
        use iso_c_binding
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        integer(c_size_t),value :: pBufferSize
    end function cusparseSgtsv2StridedBatch_bufferSizeExt
end interface

Now I get a segmentation fault, and cuda-memcheck gives no useful information. I have decided to give up for now and wait for the next PGI update. Hopefully all the tridiagonal solvers will be included.
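
A likely culprit for the segmentation fault is the value attribute on pBufferSize in the bufferSizeExt interface: the last argument of cusparseSgtsv2StridedBatch_bufferSizeExt in the C API is size_t *bufferSizeInBytes, an output that the library writes through, so on the Fortran side it must be passed by reference (no value attribute). An untested sketch of the corrected interfaces and call, using the commented-out device-array form of pBuffer to sidestep the c_loc question entirely:

interface
    integer(4) function cusparseSgtsv2StridedBatch(handle, m, dl, d, du, x, batchCount, batchStride, pBuffer) bind(C,name='cusparseSgtsv2StridedBatch')
        use iso_c_binding
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        ! void* work buffer in C; a device assumed-size array passes the device address directly
        character(c_char), device :: pBuffer(*)
    end function cusparseSgtsv2StridedBatch
    integer(4) function cusparseSgtsv2StridedBatch_bufferSizeExt(handle, m, dl, d, du, x, batchCount, batchStride, pBufferSize) bind(C,name='cusparseSgtsv2StridedBatch_bufferSizeExt')
        use iso_c_binding
        use cusparse
        implicit none
        type(cusparseHandle),value :: handle
        integer(4),value :: m, batchCount, batchStride
        real(4),dimension(*), device :: dl, d, du, x
        ! size_t* output in C: no value attribute, so it is passed by reference
        integer(c_size_t) :: pBufferSize
    end function cusparseSgtsv2StridedBatch_bufferSizeExt
end interface

And at the call site:

integer(c_size_t) :: bufsize
character(kind=c_char), device, allocatable :: buf(:)
status = cusparseSgtsv2StridedBatch_bufferSizeExt(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, bufsize)
allocate(buf(bufsize))
status = cusparseSgtsv2StridedBatch(cusparseh, MGlob, Arow, Brow, Crow, Drow, NGlob, MGlob, buf)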