How to switch devices to run cuFFT on different devices

I am trying to run the following Fortran code:

      do thrdnm=0,ngpus-1
        call acc_set_device_num(mod(thrdnm,ngpus),acc_device_nvidia)
        !$acc enter data create(guv)
      enddo

      do iv=1,natv 
        thrdnm = mod(iv-1,ngpus)
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc update device(guv(1:lgk,iv))
        call  rlft3i (guv(1:lgk,iv), ng3, 1)
        !$acc update self(guv(1:lgk,iv))
      enddo

      do thrdnm=0,ngpus-1
        call acc_set_device_num(mod(thrdnm,ngpus),acc_device_nvidia)
        !$acc exit data delete(guv)
      enddo

where the rlft3i subroutine calls the cuFFT library:


      subroutine  rlft3i (fdata, ng3, key)
      use cufft
      use openacc
      implicit none
      integer :: ng3(3)
      integer :: key
      real(4), dimension ((ng3(1)+2)*ng3(2)*ng3(3)) :: fdata

      integer :: ig,ig2, i,j,k, ngr,ngk, nx,ny,nz
      integer(4) ::  ierr
      real, dimension ((ng3(1)+2)*ng3(2)*ng3(3)) :: work

      ngr =  ng3(1)*ng3(2)*ng3(3)
      ngk =  (ng3(1)+2)*ng3(2)*ng3(3)

      nx = ng3(1)
      ny = ng3(2)
      nz = ng3(3)

      if (key == 1) then
          !$acc data create(work) present(fdata)
          !$acc host_data use_device(fdata,work)
          ierr = cufftExecR2C(plan_forward,fdata,work)
          !$acc end host_data

          !$acc kernels present(fdata,work)
          fdata(1:ngk) = work(1:ngk)/ngr
          !$acc end kernels
          !$acc end data
          return 
      endif

      if (key == -1) then
          !$acc data create(work) present(fdata)
          !$acc host_data use_device(fdata,work)
          ierr = cufftExecC2R(plan_backward,fdata,work)
          !$acc end host_data
          !$acc kernels present(fdata,work)
          fdata(1:ngk) = work(1:ngk)
          !$acc end kernels
          !$acc end data
          return 
      endif
      end subroutine

The problem is that acc_set_device_num does not switch devices for the cuFFT library, and the program fails. However, when I comment out the calls to acc_set_device_num, the code works. Can I somehow switch devices for the cuFFT library from within the cuFFT Fortran interface? If not, could I switch them using some CUDA function?

Hi sstepanhlushak68578,

You’ll need to add calls to “cudaSetDevice” in addition to calling “acc_set_device_num” to get the CUDA runtime and cuFFT to use the same device. Currently, “acc_set_device_num” only sets the device for the OpenACC runtime.

You’ll want to add “use cudafor” to get the cudaSetDevice Fortran interface and compile with “-Mcuda” as well.
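
As a minimal sketch of the suggestion above, the two calls can be wrapped in one small helper so that the OpenACC runtime and the CUDA runtime are always switched together. The subroutine name `set_gpu` is hypothetical and not part of any library; the sketch assumes compilation with "-acc -Mcuda".

```fortran
! Hypothetical helper (name is illustrative): selects the same GPU
! for both the OpenACC runtime and the CUDA runtime used by cuFFT.
subroutine set_gpu(idev)
  use openacc
  use cudafor
  implicit none
  integer, intent(in) :: idev   ! zero-based device number
  integer :: istat

  ! Device used by OpenACC data and compute directives
  call acc_set_device_num(idev, acc_device_nvidia)

  ! Device used by CUDA runtime calls, including cuFFT
  istat = cudaSetDevice(idev)
  if (istat /= cudaSuccess) then
    print *, "cudaSetDevice failed for device", idev, ":", &
             cudaGetErrorString(istat)
  end if
end subroutine set_gpu
```

Calling `set_gpu(thrdnm)` at the top of each loop iteration would then replace the separate `acc_set_device_num` call.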

Let me know how it goes!

Thanks,
Mat

Hi Mat,

Unfortunately, it failed again.
So, in detail, I have changed my code to

      do thrdnm=0,ngpus-1
        call acc_set_device_num(mod(thrdnm,ngpus),acc_device_nvidia)
        !$acc enter data create(guv)
      enddo
      do iv=1,natv 
        thrdnm = mod(iv-1,ngpus)
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        istat = cudaSetDevice(thrdnm)
        if(.not.istat==CUDASUCCESS)then
          print *,"thrdnm:",thrdnm," cudaSetDevice ERROR: istat=",istat
        endif
        !$acc update device(guv(1:lgk,iv))
        call  rlft3i (guv(1:lgk,iv), ng3, 1)
        !$acc update self(guv(1:lgk,iv))
      enddo
      do thrdnm=0,ngpus-1
        call acc_set_device_num(mod(thrdnm,ngpus),acc_device_nvidia)
        !$acc exit data delete(guv)
      enddo

and it failed with the following errors:

Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

I am compiling this with version 17.3 of the PGI compiler (pgi/17.3).

I am running this on a single node of a supercomputer with 4 Tesla cards and 24 CPUs.

Any ideas what else I could try? Or maybe I have an error somewhere else? On the other hand, if I comment out the acc_set_device_num calls, then it seems to work correctly.

Thanks

I think something else is going on.

Where do you create the cuFFT plan? Are you setting the device at this point as well?

I personally use MPI for multiple GPU programming so have not tried toggling back and forth between multiple GPUs in the same code. Hence, if this is not the issue, can you send a reproducing example to PGI Customer Service (trs@pgroup.com) and ask them to send it to me so I can take a look?

Thanks,
Mat

Hi Mat.

The code is rather large, so I will have to prepare a separate example.

The plan is set up only once, at the initialization of the code, and I do not set any device there.

So, as I understand it, the information about which device to use might be stored in the plan? In that case, I should probably prepare several plans, one for each device, and store them separately?

Thanks

When you create a cuFFT plan, you’re actually allocating configuration data on the device, and the plan handle points to that data. So yes, you will need a separate plan for each device, and you must make sure the device number is set before creating each plan.
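
A rough sketch of what per-device plans could look like is shown below. The module name `fft_plans` and the arrays `plans_fwd` and `plans_bwd` are hypothetical and not taken from the original code. The reversed dimension order passed to `cufftPlan3d` is an assumption about how a column-major Fortran array maps onto cuFFT's row-major layout; check it against your own data layout.

```fortran
! Hypothetical module: one forward and one backward plan per GPU.
module fft_plans
  use cufft
  use cudafor
  use openacc
  implicit none
  ! Plan handles indexed by zero-based device number
  integer, allocatable :: plans_fwd(:), plans_bwd(:)
contains

  subroutine create_plans(ng3, ngpus)
    integer, intent(in) :: ng3(3), ngpus
    integer :: idev, ierr

    allocate(plans_fwd(0:ngpus-1), plans_bwd(0:ngpus-1))
    do idev = 0, ngpus-1
      ! Select the device BEFORE creating its plans, since the plan
      ! data is allocated on whichever device is current.
      call acc_set_device_num(idev, acc_device_nvidia)
      ierr = cudaSetDevice(idev)
      if (ierr /= cudaSuccess) print *, "cudaSetDevice error:", ierr

      ! Assumption: dimensions are reversed because cuFFT treats the
      ! last dimension as fastest-varying, while Fortran uses the first.
      ierr = cufftPlan3d(plans_fwd(idev), ng3(3), ng3(2), ng3(1), CUFFT_R2C)
      if (ierr /= 0) print *, "forward plan error:", ierr    ! 0 means success
      ierr = cufftPlan3d(plans_bwd(idev), ng3(3), ng3(2), ng3(1), CUFFT_C2R)
      if (ierr /= 0) print *, "backward plan error:", ierr
    end do
  end subroutine create_plans

  subroutine destroy_plans()
    integer :: idev, ierr

    do idev = lbound(plans_fwd, 1), ubound(plans_fwd, 1)
      ! Destroy each plan with its own device selected
      call acc_set_device_num(idev, acc_device_nvidia)
      ierr = cudaSetDevice(idev)
      ierr = cufftDestroy(plans_fwd(idev))
      ierr = cufftDestroy(plans_bwd(idev))
    end do
    deallocate(plans_fwd, plans_bwd)
  end subroutine destroy_plans

end module fft_plans
```

With something like this in place, the FFT routine would take the device number as an extra argument and pass `plans_fwd(idev)` or `plans_bwd(idev)` to `cufftExecR2C` or `cufftExecC2R`, instead of using a single global plan.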

Dear Mat and mkcolg,

I have created a separate cuFFT plan for every GPU and slightly modified the code, and now it seems to work fine.

Thanks