Issues using cuSOLVER with Fortran

Hello everyone,
I am trying to use cuSOLVER with CUDA Fortran to perform an LU decomposition and then solve a system of equations. The relevant part of the code follows.

subroutine gpu_LU_solver_with_pivot(AA,BB,XX)
    implicit none
    ! proc, stream, and cusolverhndl are module-level variables (not shown)
    double precision, pinned, intent(in) :: AA(proc%nijk(1),proc%nijk(2)), BB(proc%nijk(2))
    double precision, pinned, intent(out) :: XX(proc%nijk(2))
    double precision, device :: AA_d(proc%nijk(1),proc%nijk(2)), BB_d(proc%nijk(2))
    double precision, device :: XX_d(proc%nijk(2))
    double precision, allocatable, device :: workspace_d(:)
    integer :: istat, work, i, j, idx
    integer, device :: info_d, ny_d
    integer, device :: pivot_d(proc%nijk(2))

    ! Copy A and B to the device asynchronously on the module-level stream
    istat = cudaMemcpyAsync(AA_d, AA, proc%nijk(1)*proc%nijk(2), cudaMemcpyHostToDevice, stream)

    istat = cudaMemcpyAsync(BB_d, BB, proc%nijk(2), cudaMemcpyHostToDevice, stream)
    if (istat.ne.0) write(*,*) 'problem with copying B'

    ! Query the workspace size needed by the LU factorization
    istat = cusolverDnDgetrf_bufferSize(cusolverhndl, proc%nijk(1), proc%nijk(2), AA_d, proc%nijk(1), work)
    if (istat.ne.0) write(*,*) 'problem with getting buffer size'

    allocate(workspace_d(work))

    ! LU factorization with partial pivoting (this is the call that segfaults)
    istat = cusolverDnDgetrf(cusolverhndl, proc%nijk(1), proc%nijk(2), AA_d, proc%nijk(1), workspace_d, pivot_d, info_d)
    !if (istat.ne.0) write(*,*) 'problem with dndgetrf--LU factorization'

  end subroutine gpu_LU_solver_with_pivot

Here are the interfaces for the cuSOLVER functions:

interface cusolverDnDgetrf_bufferSize
    integer function cusolverDnDgetrf_bufferSize(cusolverhndl,rows,columns,A,lda,work) bind(C,name='cusolverDnDgetrf_bufferSize')
      use iso_c_binding
      type(c_ptr), value :: cusolverhndl
      integer(c_int), value :: rows, columns, lda
      integer(c_int) :: work
      double precision, device :: A(*)
    end function cusolverDnDgetrf_bufferSize
  end interface cusolverDnDgetrf_bufferSize
  !
  interface cusolverDnDgetrf
    integer function cusolverDnDgetrf(cusolverhndl,rows,columns,A,lda,workspace,pivot,info) bind(C,name='cusolverDnDgetrf')
      use iso_c_binding
      type(c_ptr), value :: cusolverhndl
      integer(c_int), value :: rows, columns, lda
      double precision, device :: A(*), workspace(*)
      integer(c_int), device :: pivot(*), info
    end function cusolverDnDgetrf
  end interface cusolverDnDgetrf

Everything works fine except the call to cusolverDnDgetrf, which gives a segmentation fault. I suspect there is an issue with the arguments of that call, but I am not sure how to track it down since the code compiles without any problems. I would appreciate your help.

Thank you in advance
vtsakag

I don’t really see an obvious problem here. Your interfaces are a little different from what we provide in our cusolverDn module, but I suspect they should work fine. Specifically, you don’t show how the handle is created or how the stream is set in this code snippet, but those are probably not causing the problem. What sizes are you solving, i.e. what are proc%nijk(1) and proc%nijk(2)? It would be interesting to know that, and what buffer requirement you get from the bufferSize call. Also, the pivot array is sized based on proc%nijk(2); I usually assume the pivot array should match the leading dimension of A, but that might not be the problem here either. Sometimes the libraries have environment variables that provide more debug info. I’ll try to see if there is something like that you can set.

bleblack, thank you for your response.

proc%nijk(1) = 100 and proc%nijk(2) = 50 for this problem, leading to a buffer requirement of 4512. Could you please let me know where I can find the cusolverDn module, so I can rule out the possibility that my interfaces are the problem? I would also appreciate it if you could tell me about the environment variables so I can debug my application.

Thank you for your time
vtsakag

Our cusolverDn module is part of the NVIDIA HPC SDK package; the compilers will just find it, and it is pre-compiled. It is documented in NVIDIA Fortran CUDA Library Interfaces Version 21.5 for ARM, OpenPower, x86; see chapter 6. It turns out the environment variables are not available yet; they are working on that for a future release.
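
For reference, the factor step using that module might look something like the following. This is only a minimal sketch: m, n, and the local handle are illustrative names rather than your code, and the pivot array is sized min(m,n) per the cuSOLVER getrf documentation. See chapter 6 for the exact signatures.

subroutine getrf_sketch(m, n, AA_d)
    use cudafor
    use cusolverdn
    implicit none
    integer, intent(in) :: m, n
    double precision, device :: AA_d(m,n)
    type(cusolverDnHandle) :: handle
    double precision, allocatable, device :: workspace_d(:)
    integer, device :: pivot_d(min(m,n)), info_d
    integer :: istat, lwork, info

    istat = cusolverDnCreate(handle)

    ! Query the workspace size, allocate it, then factor
    istat = cusolverDnDgetrf_bufferSize(handle, m, n, AA_d, m, lwork)
    allocate(workspace_d(lwork))
    istat = cusolverDnDgetrf(handle, m, n, AA_d, m, workspace_d, pivot_d, info_d)

    ! info_d lives on the device; copy it back before checking it
    info = info_d
    if (info /= 0) write(*,*) 'getrf returned info = ', info

    deallocate(workspace_d)
    istat = cusolverDnDestroy(handle)
  end subroutine getrf_sketch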

Thank you bleblack, that was helpful, but I have one last question. I assign a stream to cusolverhndl so that my code runs in parallel with MPI and each rank has its own stream on its GPU. Will this run each call to cusolverDnDgetrf on that stream, or should I specify the stream as an argument in the call?

Thank you
vtsakag

There is a cusolverDnSetStream() function that you should use to get cuSOLVER to run on the correct stream.
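
For example, something like this (a minimal sketch; mystream stands in for the per-rank stream you already create elsewhere):

subroutine set_cusolver_stream_sketch()
    use cudafor
    use cusolverdn
    implicit none
    integer(kind=cuda_stream_kind) :: mystream
    type(cusolverDnHandle) :: cusolverhndl
    integer :: istat

    istat = cudaStreamCreate(mystream)
    istat = cusolverDnCreate(cusolverhndl)
    istat = cusolverDnSetStream(cusolverhndl, mystream)
    ! All subsequent cusolverDn calls on this handle run on mystream
  end subroutine set_cusolver_stream_sketch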