I have the following code, which repeats a large number of calculations for every element in vector y and returns the results in vector z. The main program makes numerous calls to this subroutine, which compiles without error and appears to execute without a problem.
subroutine SingleGPU(nd, nx, ny, x, y, z)
  use accel_lib
  integer :: nd, nx, ny, i, j
  real :: x(nd,nx), y(nd,ny), z(ny), v(nx), p(nx)
  !$acc region do private(j, v, p)
  do i = 1, ny
    p = 1.0
    do j = 1, nd
      v = y(j,i) - x(j,1:nx)
      p = p * ( .9375 * (1.0 - v**2)**2 * (abs(v) < 1.0) )
    end do
    z(i) = sum(p)
  end do
  !$acc end region
  return
end subroutine SingleGPU
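For comparing accelerator output against a known-good answer, a host-only version of the same loop nest can be handy. The following is a minimal sketch in standard Fortran, not part of my actual program: the names host_reference, SingleCPU, and demo are hypothetical, and it assumes the (abs(v) < 1.0) factor is meant as a 0/1 indicator, which is written here with merge instead of multiplying by a logical.

```fortran
module host_reference
  implicit none
contains
  ! Hypothetical CPU-only counterpart of SingleGPU (no accelerator directives).
  subroutine SingleCPU(nd, nx, ny, x, y, z)
    integer, intent(in)  :: nd, nx, ny
    real,    intent(in)  :: x(nd,nx), y(nd,ny)
    real,    intent(out) :: z(ny)
    real    :: v(nx), p(nx)
    integer :: i, j
    do i = 1, ny
      p = 1.0
      do j = 1, nd
        v = y(j,i) - x(j,:)
        ! Assumption: the logical factor acts as an indicator (1 inside, 0 outside).
        p = p * merge(0.9375 * (1.0 - v**2)**2, 0.0, abs(v) < 1.0)
      end do
      z(i) = sum(p)
    end do
  end subroutine SingleCPU
end module host_reference

program demo
  use host_reference
  implicit none
  real :: x(1,2), y(1,1), z(1)
  x(1,:) = [0.0, 2.0]   ! one point at distance 0 from y, one at distance 2
  y(1,1) = 0.0
  call SingleCPU(1, 2, 1, x, y, z)
  print *, z(1)         ! only the point at distance 0 contributes: 0.9375
end program demo
```

Running both versions on the same small inputs and comparing z element by element should show whether the accelerator region itself is computing what I expect.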
However, I have three C2050s and would like to use all of them. To spread the workload among multiple accelerators, I modified the code as follows.
subroutine MultiGPU(nd, nx, ny, x, y, z)
  use accel_lib
  use omp_lib
  integer :: nd, nx, ny, i, ilo, ihi, j, ndevices
  real :: x(nd,nx), y(nd,ny), z(ny), v(nx), p(nx)
  ndevices = acc_get_num_devices(acc_device_nvidia)
  ! One OpenMP thread per device
  !$omp parallel private(i, ilo, ihi, j, v, p, y, x) num_threads(ndevices)
  call acc_set_device_num(omp_get_thread_num(), acc_device_nvidia)
  ! Each thread handles a contiguous block of at most ny/ndevices + 1 indices
  ilo = omp_get_thread_num() * (ny/ndevices + 1) + 1
  ihi = min(ny, ilo + (ny/ndevices + 1) - 1)
  !$acc region do private(j, v, p)
  do i = ilo, ihi
    p = 1.0
    do j = 1, nd
      v = y(j,i) - x(j,1:nx)
      p = p * ( .9375 * (1.0 - v**2)**2 * (abs(v) < 1.0) )
    end do
    z(i) = sum(p)
  end do
  !$acc end region
  !$omp end parallel
  return
end subroutine MultiGPU
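The index split that ilo and ihi are meant to produce can be checked on the host without any accelerator or OpenMP involvement. Below is a minimal sketch, assuming each thread takes a contiguous block of ny/ndevices + 1 indices; the program name check_partition and the sizes ny = 10 and ndevices = 3 are hypothetical values chosen for illustration, and the loop variable tid stands in for omp_get_thread_num().

```fortran
program check_partition
  implicit none
  integer, parameter :: ny = 10, ndevices = 3   ! hypothetical sizes
  integer :: tid, ilo, ihi, chunk
  integer :: covered(ny)

  covered = 0
  chunk = ny/ndevices + 1                       ! block size per thread
  do tid = 0, ndevices - 1                      ! stands in for omp_get_thread_num()
    ilo = tid*chunk + 1
    ihi = min(ny, ilo + chunk - 1)
    print '(a,i0,a,i0,a,i0)', 'thread ', tid, ': i = ', ilo, ' .. ', ihi
    ! Count how many threads own each index; skip empty ranges.
    if (ilo <= ihi) covered(ilo:ihi) = covered(ilo:ihi) + 1
  end do

  if (all(covered == 1)) then
    print '(a)', 'every index is assigned exactly once'
  else
    error stop 'partition has gaps or overlaps'
  end if
end program check_partition
```

With these sizes the three blocks are 1 .. 4, 5 .. 8, and 9 .. 10, so the split itself covers every element of z once and does not look like the source of the failure.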
Within the accelerator region, the only difference between MultiGPU and SingleGPU is the use of ilo and ihi as loop bounds to divide the workload among the available devices. I’ve checked omp_get_thread_num(), ilo, and ihi just before entering the accelerator region, and all return the expected values. The code compiles and appears to run correctly the first time the subroutine is called, but the second call fails with the following message:
call to cuModuleGetFunction returned error 201: Invalid context
CUDA driver version: 3010
I’m at a loss. Can someone please help me understand what’s going on here?