cufftSetStream interface for CUDA Fortran

Hello everyone,

I am trying to use cufftSetStream(plan,stream) in a hybrid MPI/CUDA Fortran code. On the host I declare the variables as

integer :: plan
integer :: stream

and my interface is

interface cufftSetStream
       integer function cufftSetStream(plan,stream) bind(C,name='cufftSetStream')
         use iso_c_binding
         integer(c_int), value:: plan
         integer(c_int)        :: stream
       end function cufftSetStream
     end interface cufftSetStream

My approach (I don’t know if it’s right) is for each CPU to create a CUDA stream, copy its data to the device, create the plan, associate the plan with the stream, and run the FFT. Here is the block that each CPU executes:

 istat=cudastreamcreate(stream)

  !send data to device for each cpu

  istat=cudamemcpyasync(val_d,val,proc%nxyz(1)*proc%nxyz(2)*nz,cudaMemcpyHostToDevice,stream)

  ! allocate arrays where fft and ifft will be performed 1-d
   allocate(a_d(proc%nxyz(1)*proc%nxyz(2)*nz),b_d(proc%nxyz(1)*proc%nxyz(2)*nz))


    !specify threads on each dimension of the block
    block=dim3(32,32,1)
    
    !specify numbers of blocks on each dimension of the grid
    grid=dim3(ceiling(real(proc%nxyz(1))/block%x),ceiling(real(proc%nxyz(2))/block%y),ceiling(real(nz)/block%z))

    !map on 1-d array for the fft                
    call threedim_2_onedim<<<grid,block,0,stream>>>(val_d,a_d,proc%nxyz(1),proc%nxyz(2),nz) 

    !initialize plan
    istat=cufftPlan1D(plan,nz,CUFFT_D2Z,proc%nxyz(1)*proc%nxyz(2))
    istat=cufftSetStream(plan,stream)
    
    istat=cufftPlan1D(plan2,nz,CUFFT_Z2D,proc%nxyz(1)*proc%nxyz(2))
    istat=cufftSetStream(plan2,stream)
    
   
    ! Execute FFTs
    istat=cufftExecD2Z(plan,a_d,b_d,CUFFT_FORWARD)
    if(istat.ne.0) write(*,*) 'problem with forward fft'
    
    istat=cufftExecZ2D(plan2,b_d,a_d,CUFFT_INVERSE)
    if(istat.ne.0) write(*,*) 'problem with inverse fft'


    istat=cufftDestroy(plan)

    istat=cufftDestroy(plan2)

    !bring processed data back to 3-d form
    call onedim_2_threedim <<<grid,block,0,stream>>>(val_d,a_d,proc%nxyz(1),proc%nxyz(2),nz)

  !get data back to host
istat=cudamemcpyasync(val,val_d,proc%nxyz(1)*proc%nxyz(2)*nz,cudaMemcpyDeviceToHost,stream)

The problem is that I get the following errors, and I can’t understand why, as I am new to CUDA.

[r5u25n1:56095] *** Process received signal ***
[r5u25n1:56095] Signal: Segmentation fault (11)
[r5u25n1:56095] Signal code: (128)
[r5u25n1:56095] Failing at address: (nil)
[r5u25n1:56095] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f103579c630]
[r5u25n1:56095] [ 1] /usr/lib64/libcuda.so.1(+0x1cb9cd)[0x7f1029ca79cd]
[r5u25n1:56095] [ 2] /usr/lib64/libcuda.so.1(+0x17c0a1)[0x7f1029c580a1]
[r5u25n1:56095] [ 3] /usr/lib64/libcuda.so.1(cuLaunchKernel+0x83)[0x7f1029cfeb93]
[r5u25n1:56095] [ 4] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x24853b)[0x7f1050f4753b]
[r5u25n1:56095] [ 5] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x28a306)[0x7f1050f89306]
[r5u25n1:56095] [ 6] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x22797f)[0x7f1050f2697f]
[r5u25n1:56095] [ 7] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0xd9a2d)[0x7f1050dd8a2d]
[r5u25n1:56095] [ 8] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0xedef7)[0x7f1050decef7]
[r5u25n1:56095] [ 9] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0xee24d)[0x7f1050ded24d]
[r5u25n1:56095] [10] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x54a1f)[0x7f1050d53a1f]
[r5u25n1:56095] [11] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x54aba)[0x7f1050d53aba]
[r5u25n1:56095] [12] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x5400c)[0x7f1050d5300c]
[r5u25n1:56095] [13] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x40a4a)[0x7f1050d3fa4a]
[r5u25n1:56095] [14] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(+0x40bf0)[0x7f1050d3fbf0]
[r5u25n1:56095] [15] /opt/ohpc/pub/apps/cuda/11.0/Linux_x86_64/20.7/math_libs/11.0/lib64/libcufft.so.10(cufftExecD2Z+0x72)[0x7f1050d4ed52]
[r5u25n1:56095] [16] ./test[0x403cc3]
[r5u25n1:56095] [17] ./test[0x402613]
[r5u25n1:56095] *** End of error message ***

Could you please help me with this issue?
Thank you in advance
VT

Hi VT,

I’m thinking that the problem may be because a stream is a pointer, not an int, so the value you’re passing in to cufftSetStream is then causing the segfault.
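
To make the size mismatch concrete, here is a minimal sketch in standard Fortran (no CUDA needed). It only prints how many bits a C int and a pointer-sized integer occupy on the machine it runs on. The program and variable names are made up for illustration, and the actual numbers depend on the platform; on a typical 64-bit Linux system the two differ, which is why a stream handle does not fit in an integer(c_int).

```fortran
! Minimal sketch: compare the width of a C int with a pointer-sized integer.
! Names are hypothetical; this is not part of the cufft or cudafor modules.
program handle_sizes
  use iso_c_binding, only: c_int, c_intptr_t
  implicit none
  integer(c_int)      :: int_handle      ! what "integer(c_int) :: stream" can hold
  integer(c_intptr_t) :: pointer_handle  ! wide enough to hold a C pointer

  ! storage_size is an inquiry intrinsic, so the variables need no value.
  print '(a,i0)', 'bits in integer(c_int):      ', storage_size(int_handle)
  print '(a,i0)', 'bits in integer(c_intptr_t): ', storage_size(pointer_handle)
end program handle_sizes
```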

Here’s the interface that’s used when including the provided “use cufft” module:

 integer(4) function cufftSetStream(plan, stream)
   integer :: plan
   integer(kind=cuda_stream_kind()) :: stream 

https://docs.nvidia.com/hpc-sdk/compilers/fortran-cuda-interfaces/index.html#fft-cufftsetstream

While this example is for cuTensor, it does show how streams are created and used:

https://docs.nvidia.com/hpc-sdk/compilers/fortran-cuda-interfaces/index.html#cflib-tensor-cuf2-host

-Mat


Thank you Mat for your response. Now I am declaring the stream as

integer(kind=cuda_stream_kind()) :: stream

and my cufft interface is

interface cufftSetStream
       integer function cufftSetStream(plan,stream) bind(C,name='cufftSetStream')
         use iso_c_binding
         integer(c_int), value:: plan
         integer(c_ptr) :: stream
       end function cufftSetStream
     end interface cufftSetStream

but when I am trying to compile my code I am getting the following errors:

NVFORTRAN-S-0087-Non-constant expression where constant expression required (cufft.cuf: 78)
0 inform, 0 warnings, 1 severes, 0 fatal for cufft
NVFORTRAN-S-0075-Subscript, substring, or argument illegal in this context for ‘constant’ (main.f90: 30)
0 inform, 0 warnings, 1 severes, 0 fatal for main2

I also checked the CUDA Fortran cufft module that comes with the compiler, but it does not provide an interface for cufftSetStream, which is why I have to write it myself. Could you please either provide a cufft module that includes the cufftSetStream interface, or help me declare the stream correctly, both on the host and in the interface, in case I have to write it myself?

I really appreciate your help here
VT

Hi VT,

It might be a typo in our docs. “cuda_stream_kind” shouldn’t have the parens “()” at the end. Does it compile if you remove those?
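
In case the interface does have to be written by hand, here is a minimal sketch of what it might look like. It assumes the C prototype cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream), where the plan handle is a C int and the stream is an opaque pointer, so both are passed by value and the stream uses the cuda_stream_kind constant from cudafor. The module name cufft_stream_iface is made up for illustration, and I have not tested this against your setup, so treat it as a starting point rather than the interface the shipped cufft module uses.

```fortran
! Minimal sketch of a hand-written interface (hypothetical module name).
! Assumption: both arguments are passed by value, matching the C prototype
!   cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream);
module cufft_stream_iface
  implicit none
  interface
    integer(c_int) function cufftSetStream(plan, stream) &
        bind(C, name='cufftSetStream')
      use iso_c_binding, only: c_int
      use cudafor, only: cuda_stream_kind
      ! cufftHandle is a plain C int
      integer(c_int), value :: plan
      ! cuda_stream_kind is a named constant, so it is written without "()"
      integer(kind=cuda_stream_kind), value :: stream
    end function cufftSetStream
  end interface
end module cufft_stream_iface
```

On the host side the stream would then be declared as integer(kind=cuda_stream_kind) :: stream and created with cudaStreamCreate(stream) before it is passed to cufftSetStream.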

Note, I double checked, but it appears to me that cufftSetStream is included in the module:

% grep -i setstream cufft.mod
S 939 19 0 0 0 9 1 624 6532 4000 0 A 0 0 0 0 B 0 413 0 0 0 0 0 0 0 940 0 0 0 0 0 0 29 1 0 0 0 0 0 624 0 0 0 0 cufftsetstream
S 940 14 5 0 0 6 1 624 6532 4 18000 A 1000000 0 0 0 B 0 0 0 0 0 0 0 164 2 939 0 944 0 0 0 0 0 0 0 0 0 414 0 624 0 0 943 0 cufftsetstream cufftsetstream cufftsetstream
S 944 1 3 0 0 6 1 940 6532 2004 1002000 A 0 0 0 0 B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cufftsetstream

-Mat

Hi Mat,

When I compile without the parentheses it compiles properly, but I still have issues with the module file. I only have the source code of the module, which I compiled myself and tried to add the interface to, since it is missing there. Is there a way to get the complete source code of the cufft module?

Thank you
VT

Hi VT,

The cufft module is documented in the Fortran CUDA interfaces guide: NVIDIA Fortran CUDA Library Interfaces, Version 21.2 for ARM, OpenPower, x86.

Here’s a small example that I modified from the docs. If you’re still having issues, please post a small reproducer and we can determine what the issue is.

% cat cufft.cuf
program cufft2dTest
  use cufft
  use cudafor
  integer, parameter :: m=768, n=512
  complex, allocatable  :: a(:,:), b(:,:), c(:,:)
  complex, device, allocatable  :: a_d(:,:),b_d(:,:),c_d(:,:)
  integer :: iplan1, ierr
  integer(kind=cuda_stream_kind) :: mystream

  allocate(a(m,n),b(m,n),c(m,n))
  allocate(a_d(m,n),b_d(m,n),c_d(m,n))

  a=1
  a_d=a
  ierr = cudaStreamCreate(mystream)
  ierr = cufftPlan2D(iplan1,m,n,CUFFT_C2C)
  ierr = ierr + cufftSetStream(iplan1,mystream)
  ierr = ierr + cufftExecC2C(iplan1,a_d,b_d,CUFFT_FORWARD)
  ierr = ierr + cufftExecC2C(iplan1,b_d,c_d,CUFFT_INVERSE)
  b=b_d
  c=c_d
  ! scale c
  c = c / (m*n)

  ! Check forward answer
  write(*,*) 'Max error C2C FWD: ', cmplx(maxval(real(b)) - sum(real(b)), &
                                          maxval(imag(b)))
  ! Check inverse answer
  write(*,*) 'Max error C2C INV: ', maxval(abs(a-c))

  ierr = ierr + cufftDestroy(iplan1)

  if (ierr.eq.0) then
    print *,"test PASSED"
  else
    print *,"test FAILED"
  endif

end program cufft2dTest
% nvfortran cufft.cuf -cudalib=cufft; a.out
 Max error C2C FWD:   (0.000000,0.000000)
 Max error C2C INV:     0.000000
 test PASSED

-Mat

Hi Mat,

After some trial and error I was able to make my code work.

Thank you
VT

That’s good VT. Can you describe what the issues were in case it might help others?

-Mat

On the host code I set

integer:: stream

and on the interface

integer (c_int):: stream

as before. The issue was with the compiler on an old HPC system.