How to switch devices to run different cuFFT transforms on different devices

sstepanhlushak68578 · December 8, 2017, 1:43am

I am trying to run the following Fortran code:

      ! Create a device copy of guv on every GPU
      do thrdnm=0,ngpus-1
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc enter data create(guv)
      enddo

      ! Distribute the columns of guv over the GPUs round-robin
      do iv=1,natv
        thrdnm = mod(iv-1,ngpus)
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc update device(guv(1:lgk,iv))
        call  rlft3i (guv(1:lgk,iv), ng3, 1)   ! key = 1: forward transform
        !$acc update self(guv(1:lgk,iv))
      enddo

      ! Release the device copies
      do thrdnm=0,ngpus-1
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc exit data delete(guv)
      enddo

where the rlft3i subroutine calls the cuFFT library:

      subroutine  rlft3i (fdata, ng3, key)
      use cufft
      use openacc
      implicit none
      ! plan_forward and plan_backward are cuFFT plans created once during
      ! initialization; they come from a module or host scope not shown here.
      integer :: ng3(3)
      integer :: key          ! 1 = forward (R2C), -1 = backward (C2R)
      real(4), dimension ((ng3(1)+2)*ng3(2)*ng3(3)) :: fdata

      integer :: ig,ig2, i,j,k, ngr,ngk, nx,ny,nz
      integer(4) ::  ierr
      real, dimension ((ng3(1)+2)*ng3(2)*ng3(3)) :: work

      ngr =  ng3(1)*ng3(2)*ng3(3)
      ngk =  (ng3(1)+2)*ng3(2)*ng3(3)

      nx = ng3(1)
      ny = ng3(2)
      nz = ng3(3)

      if (key == 1) then
          ! Forward transform on the device, then normalize by the grid size
          !$acc data create(work) present(fdata)
          !$acc host_data use_device(fdata,work)
          ierr = cufftExecR2C(plan_forward,fdata,work)
          !$acc end host_data
          if (ierr /= CUFFT_SUCCESS) print *, "cufftExecR2C failed: ierr=", ierr

          !$acc kernels present(fdata,work)
          fdata(1:ngk) = work(1:ngk)/ngr
          !$acc end kernels
          !$acc end data
          return
      endif

      if (key == -1) then
          ! Backward transform on the device (no normalization)
          !$acc data create(work) present(fdata)
          !$acc host_data use_device(fdata,work)
          ierr = cufftExecC2R(plan_backward,fdata,work)
          !$acc end host_data
          if (ierr /= CUFFT_SUCCESS) print *, "cufftExecC2R failed: ierr=", ierr

          !$acc kernels present(fdata,work)
          fdata(1:ngk) = work(1:ngk)
          !$acc end kernels
          !$acc end data
          return
      endif
      end subroutine

The problem is that acc_set_device_num does not switch devices for the cuFFT library, so the program fails. However, when I comment out the calls to acc_set_device_num, the code works. Can I somehow switch devices for the cuFFT library from within the cuFFT Fortran interface? If not, could I switch them using some CUDA function?

MatColgrove · December 8, 2017, 4:26pm

Hi sstepanhlushak68578,

You’ll need to add calls to “cudaSetDevice” in addition to calling “acc_set_device_num” to get the CUDA runtime and cuFFT to use the same device. Currently, “acc_set_device_num” only sets the device for the OpenACC runtime.

You’ll want to add “use cudafor” to get the cudaSetDevice Fortran interface and compile with “-Mcuda” as well.
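A minimal sketch of that suggestion, assuming the PGI `openacc` and `cudafor` modules and compiling with `-acc -Mcuda`: a small helper (the name `select_gpu` is hypothetical) that points both the OpenACC runtime and the CUDA runtime, which cuFFT uses, at the same device. It assumes both runtimes number the GPUs the same way, which is how the thread uses them.

```fortran
! Hypothetical helper: make the OpenACC runtime and the CUDA runtime
! (used by cuFFT) agree on the current device.
subroutine select_gpu(devnum)
  use openacc     ! acc_set_device_num, acc_device_nvidia
  use cudafor     ! cudaSetDevice, cudaGetErrorString, cudaSuccess
  implicit none
  integer, intent(in) :: devnum
  integer :: istat

  ! Device used by subsequent !$acc directives on this host thread
  call acc_set_device_num(devnum, acc_device_nvidia)

  ! Device used by CUDA runtime calls, including cuFFT plan creation
  ! and execution
  istat = cudaSetDevice(devnum)
  if (istat /= cudaSuccess) then
    print *, 'cudaSetDevice(', devnum, ') failed: ', trim(cudaGetErrorString(istat))
  end if
end subroutine select_gpu
```

Calling `select_gpu(thrdnm)` in place of the bare `acc_set_device_num` call keeps the two runtimes from drifting apart.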

Let me know how it goes!

Thanks,
Mat

sstepanhlushak68578 · December 9, 2017, 4:57am

Hi Mat,

Unfortunately, it failed again.
In detail, I changed my code to:

      do thrdnm=0,ngpus-1
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc enter data create(guv)
      enddo
      do iv=1,natv
        thrdnm = mod(iv-1,ngpus)
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        istat = cudaSetDevice(thrdnm)
        if (istat /= cudaSuccess) then
          print *,"thrdnm:",thrdnm," cudaSetDevice ERROR: istat=",istat
        endif
        !$acc update device(guv(1:lgk,iv))
        call  rlft3i (guv(1:lgk,iv), ng3, 1)
        !$acc update self(guv(1:lgk,iv))
      enddo
      do thrdnm=0,ngpus-1
        call acc_set_device_num(thrdnm,acc_device_nvidia)
        !$acc exit data delete(guv)
      enddo

and it failed with:

Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

I am compiling this with version 17.3 of the PGI compiler (pgi/17.3).

I am running this on a single node of a supercomputer with four Tesla cards and 24 CPUs.

Any ideas what else I could try, or do I have an error somewhere else? If I comment out the acc_set_device_num calls, it seems to work correctly.

Thanks

MatColgrove · December 11, 2017, 4:36pm

I think something else is going on.

Where do you create the cuFFT plan? Are you setting the device at this point as well?

I personally use MPI for multi-GPU programming, so I have not tried toggling back and forth between multiple GPUs in the same process. If this is not the issue, could you send a reproducing example to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me so I can take a look?
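For comparison, a minimal sketch of the MPI approach, assuming one MPI rank per GPU and an MPI-3 library (for `MPI_Comm_split_type`). Each rank picks a device once, right after `MPI_Init`, and never switches, so its OpenACC data and cuFFT plans all live on that one GPU. The helper name `bind_rank_to_gpu` is hypothetical.

```fortran
! Hypothetical helper: bind the calling MPI rank to one GPU on its node.
! Call after MPI_Init and before any OpenACC data or cuFFT plans are created.
subroutine bind_rank_to_gpu(mydev)
  use mpi
  use openacc
  use cudafor
  implicit none
  integer, intent(out) :: mydev
  integer :: ierr, istat, local_comm, local_rank, ngpus

  ! Ranks that share a node get local ranks 0, 1, 2, ...
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, local_comm, ierr)
  call MPI_Comm_rank(local_comm, local_rank, ierr)
  call MPI_Comm_free(local_comm, ierr)

  ngpus = acc_get_num_devices(acc_device_nvidia)
  if (ngpus < 1) then
    print *, 'no NVIDIA GPUs visible to this rank'
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Round-robin ranks over the node's GPUs; the rank keeps this device
  ! for its whole lifetime, so no switching is needed later
  mydev = mod(local_rank, ngpus)
  call acc_set_device_num(mydev, acc_device_nvidia)
  istat = cudaSetDevice(mydev)
  if (istat /= cudaSuccess) print *, 'cudaSetDevice failed: ', istat
end subroutine bind_rank_to_gpu
```

With four Tesla cards on one node, launching four ranks would give each rank its own GPU.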

Thanks,
Mat

sstepanhlushak68578 · December 11, 2017, 6:51pm

Hi Mat.

The full code is rather large, so I will have to prepare a separate example.

The plan is set up once, during the initialization of the code, and I do not set any device there.

So, as I understand it, the information about which device to use might be stored in the plan? Should I then prepare a separate plan for each device and store them separately?

Thanks

MatColgrove · December 11, 2017, 7:12pm

When you create a cuFFT plan, you’re actually allocating configuration data on the current device, and the plan handle points to that data. So yes, you will need a separate plan for each device, and you must make sure the device number is set before creating each plan.
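A minimal sketch of creating one forward and one backward plan per GPU, assuming the PGI `cufft` module's `cufftPlan3d`/`cufftDestroy` interfaces. The module and names (`fft_plans`, `plan_fwd`, `plan_bwd`, `create_plans`, `destroy_plans`) are hypothetical. The thread does not show how the original plans were built; this sketch assumes the data is a Fortran array dimensioned (nx, ny, nz) and passes the dimensions reversed, because cuFFT expects row-major (C) ordering. Keep whatever sizes and transform types the original setup used.

```fortran
! Hypothetical module: one R2C and one C2R cuFFT plan per GPU.
module fft_plans
  use cufft
  use cudafor
  use openacc
  implicit none
  integer, allocatable :: plan_fwd(:), plan_bwd(:)   ! indexed by device 0..ngpus-1
contains
  subroutine create_plans(ng3, ngpus)
    integer, intent(in) :: ng3(3), ngpus
    integer :: dev, istat
    allocate(plan_fwd(0:ngpus-1), plan_bwd(0:ngpus-1))
    do dev = 0, ngpus-1
      ! The plan's data is allocated on the current CUDA device,
      ! so select the device before creating the plan
      call acc_set_device_num(dev, acc_device_nvidia)
      istat = cudaSetDevice(dev)
      ! Row-major dimensions: (nz, ny, nx) for a Fortran (nx, ny, nz) array
      istat = cufftPlan3d(plan_fwd(dev), ng3(3), ng3(2), ng3(1), CUFFT_R2C)
      if (istat /= CUFFT_SUCCESS) print *, 'R2C plan failed on device', dev, istat
      istat = cufftPlan3d(plan_bwd(dev), ng3(3), ng3(2), ng3(1), CUFFT_C2R)
      if (istat /= CUFFT_SUCCESS) print *, 'C2R plan failed on device', dev, istat
    end do
  end subroutine create_plans

  subroutine destroy_plans()
    integer :: dev, istat
    do dev = 0, size(plan_fwd) - 1
      ! Destroy each plan with its own device current
      istat = cudaSetDevice(dev)
      istat = cufftDestroy(plan_fwd(dev))
      istat = cufftDestroy(plan_bwd(dev))
    end do
    deallocate(plan_fwd, plan_bwd)
  end subroutine destroy_plans
end module fft_plans
```

At transform time, the plan passed to `cufftExecR2C`/`cufftExecC2R` must be the one belonging to the device that is current at that moment.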

sstepanhlushak68578 · December 13, 2017, 7:43pm

Dear Mat and mkcolg,

I created a separate cuFFT plan for every GPU and slightly modified the code, and now it seems to work fine.
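The actual modification is not shown in the thread. Below is a minimal sketch of what the driver loop could look like, assuming `plan_fwd(dev)` holds an R2C plan that was created while device `dev` was current, as Mat describes above. The routine names `rlft3i_fwd` and `fft_all_columns` are hypothetical; `rlft3i_fwd` is a forward-only version of `rlft3i` that takes the plan as an argument instead of using a single global one.

```fortran
! Hypothetical forward-only variant of rlft3i that takes the plan explicitly.
subroutine rlft3i_fwd(fdata, ngk, ngr, plan)
  use cufft
  implicit none
  integer, intent(in) :: ngk, ngr, plan
  real(4) :: fdata(ngk)
  real(4) :: work(ngk)
  integer :: istat
  !$acc data create(work) present(fdata)
  !$acc host_data use_device(fdata, work)
  istat = cufftExecR2C(plan, fdata, work)
  !$acc end host_data
  if (istat /= CUFFT_SUCCESS) print *, 'cufftExecR2C failed: ', istat
  !$acc kernels
  fdata(1:ngk) = work(1:ngk) / ngr      ! normalize, as in the original
  !$acc end kernels
  !$acc end data
end subroutine rlft3i_fwd

! Hypothetical driver: round-robin the columns of guv over the GPUs,
! always pairing the current device with its own plan.
subroutine fft_all_columns(guv, lgk, natv, ngr, ngpus, plan_fwd)
  use openacc
  use cudafor
  implicit none
  integer, intent(in) :: lgk, natv, ngr, ngpus
  integer, intent(in) :: plan_fwd(0:ngpus-1)
  real(4) :: guv(lgk, natv)
  integer :: iv, dev, istat

  do dev = 0, ngpus-1
    call acc_set_device_num(dev, acc_device_nvidia)
    !$acc enter data create(guv)
  end do

  do iv = 1, natv
    dev = mod(iv-1, ngpus)
    ! Switch both runtimes, then use the plan that lives on this device
    call acc_set_device_num(dev, acc_device_nvidia)
    istat = cudaSetDevice(dev)
    !$acc update device(guv(1:lgk,iv))
    call rlft3i_fwd(guv(1:lgk,iv), lgk, ngr, plan_fwd(dev))
    !$acc update self(guv(1:lgk,iv))
  end do

  do dev = 0, ngpus-1
    call acc_set_device_num(dev, acc_device_nvidia)
    !$acc exit data delete(guv)
  end do
end subroutine fft_all_columns
```

The only structural change from the original loop is that the plan is chosen by device number, so each `cufftExec*` call uses configuration data that lives on the GPU currently selected.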

Thanks
