error 709: Context is destroyed or not yet created - pgi13.3

Hi,

We have an MPI code that mixes Fortran and CUDA. Before MPI init we need to set the device by calling cudaSetDevice.
We then also call acc_set_device_num to set the device for the directives (it seems to be necessary to call both).

This used to work fine with PGI 12.10; however, with 13.3 I get an error at runtime.

I was able to reproduce the problem in a simple code (without MPI):

!test setdevice

program main 
  use openacc
  implicit none 
  integer :: ndev, mydev,ierr
  enum, bind(C) !:: cudaError
     enumerator :: cudaSuccess=0
  end enum ! cudaError

interface ! [['cudaError_t', None], 'cudaSetDevice', [['int', None, 'device']]]
  function cudaSetDevice(device) result( res ) bind(C, name="cudaSetDevice")
    use, intrinsic :: ISO_C_BINDING
    import cudaSuccess
    implicit none
    integer(c_int), value :: device
    integer (KIND(cudaSuccess)) :: res
  end function cudaSetDevice
end interface

  mydev=0
  ierr = cudaSetDevice(mydev)
  if (ierr>0) print*, 'Error with cudaSetDevice'

  ndev=acc_get_num_devices(acc_device_nvidia)
  print*, 'ndev=',ndev
  call acc_set_device_num(mydev,acc_device_nvidia)

  print*, 'devid=', mydev

end program main

It is compiled as follows:

pgf90 -ta=nvidia -acc -o test_setdevice test_setdevice.f90 -L$CUDALIB -lcudart -lcuda

With 12.10 I get:

mpiexec -n 1 ./test_setdevice
 ndev=            2
 devid=            0

With 13.3:

 mpiexec -n 1 ./test_setdevice
call to cuMemAlloc returned error 709: Context is destroyed or not yet created

Any idea why this works with 12.10 but no longer with 13.3?

Thanks,

Xavier

Hi Xavier,

I’m wondering if it’s a mismatch between CUDA versions. It seems to work for me when using 13.1 and CUDA 5.0 (12.10 uses CUDA 4.1 by default). I’ll need to install an earlier version of CUDA to see if I can reproduce your error but am short on time today. Which CUDA version are you using?

% pgfortran -ta=nvidia,5.0 -acc -o test_setdevice test_setdevice.f90 -V13.1 -L/opt/cuda-5.0/lib64 -lcudart -lcuda
% mpirun -n 1 a.out
 ndev=            2
 devid=            0
- Mat

Hi Mat,

I’ve just tried with CUDA 5:

pgf90 -ta=nvidia,5.0 -acc -o test_setdevice test_setdevice.f90 -L/apps/castor/CUDA-5.0/lib64 -lcudart -lcuda

and I still get the error. Note that I am using 13.3.

I’ve tried with 13.2 and 13.1 and indeed it works for these older versions.

Xavier

Hi Xavier,

I was able to reproduce the issue in 13.3. I’m not entirely sure what’s going on here, since it works fine if you link with “-Mcuda”. I’ve added a problem report (TPR#19236) and sent it on for investigation.

- Mat
% pgfortran -ta=nvidia,5.0 -acc -o test_setdevice test_setdevice.f90 -V13.3 -L/opt/cuda-5.0/lib64 -lcudart -lcuda 
% test_setdevice
 call to cuMemAlloc returned error 709: Context is destroyed or not yet created
% pgfortran -ta=nvidia,5.0 -acc -o test_setdevice test_setdevice.f90 -V13.3 -L/opt/cuda-5.0/lib64 -lcudart -lcuda -Mcuda
% test_setdevice
  ndev=            2
 devid=            0
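
For anyone taking the “-Mcuda” route mentioned above, the test program could probably also drop the hand-written interface block and take cudaSetDevice from the CUDA Fortran module instead. The following is only a sketch under that assumption: it relies on the cudafor module that PGI makes available when compiling with -Mcuda, it has not been verified against 13.3, and it is not the fix tracked in the problem report.

```fortran
! Sketch only: same test as above, but cudaSetDevice comes from the
! CUDA Fortran module rather than a hand-written bind(C) interface.
! Assumption: the file is compiled and linked with -Mcuda.
program main
  use cudafor   ! CUDA Fortran runtime API (cudaSetDevice, cudaSuccess, ...)
  use openacc   ! OpenACC runtime API (acc_set_device_num, ...)
  implicit none
  integer :: ndev, mydev, ierr

  mydev = 0

  ! Select the device for the CUDA runtime.
  ierr = cudaSetDevice(mydev)
  if (ierr /= cudaSuccess) then
     print *, 'Error with cudaSetDevice: ', cudaGetErrorString(ierr)
  end if

  ! Select the same device for the OpenACC runtime.
  ndev = acc_get_num_devices(acc_device_nvidia)
  print *, 'ndev=', ndev
  call acc_set_device_num(mydev, acc_device_nvidia)

  print *, 'devid=', mydev

end program main
```

With -Mcuda on the command line the compiler driver should pull in the CUDA runtime itself, so a build along the lines of `pgfortran -acc -ta=nvidia,5.0 -Mcuda -o test_setdevice test_setdevice.f90` would be the thing to try; whether the explicit -lcudart -lcuda can then be dropped is an assumption worth checking on your installation.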