Multiple GPUs with mirror and update clauses


I am having issues when attempting to use the mirror and update device clauses with multiple GPUs. It seems as though only the first GPU is aware of the data that was mirrored in a previous routine. The same is true of the initialisation of the second GPU.

More details:

I was recently getting the error “Fatal Usage Error: __pgi_cu_mirrordealloc called before __pgi_cu_init” at execution time. When I removed the deallocation (which isn’t strictly needed), I got the similar error “Fatal Usage Error: __pgi_cu_mirroralloc called before __pgi_cu_init”. This is associated with the allocation of an array that is updated using an !$acc update(passed2) clause after being declared as mirrored in a separate module. The problem only occurs now that I am trying to run the code across two OpenMP threads.

Further tweaking showed that, despite acc_init being called in an OpenMP region within an earlier subroutine, the initialisation doesn’t seem to have carried over to this routine. Adding an acc_init call to this routine removed the error described above but replaced it with the following error at compile time:

PGF90-S-0155-UPDATE clause requires a visible device copy for symbol passed2 (intega.f: 27998)

This error actually seems to be related to the passed2 array being specified as private for the OpenMP region.

Thanks for taking a look,


Hi Karl,

I’ve used mirrored in an OpenMP program, but the variable needs to be private and must only be allocated after the program has entered the parallel region. A mirrored shared variable isn’t yet supported.
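For what it’s worth, the shape that has worked for me is roughly the following (just a sketch, not your code; the routine and array names are placeholders, and I’m assuming the PGI Accelerator model with accel_lib available):

```fortran
subroutine work_on_device(n)
  use omp_lib
#ifdef _ACCEL
  use accel_lib
#endif
  implicit none
  integer :: n, i
  ! Local (hence thread-private) allocatable, mirrored on the device.
  real, dimension(:), allocatable :: tmp
!$acc mirror(tmp)
#ifdef _ACCEL
  ! Bind this OpenMP thread to its own GPU before touching device data.
  call acc_set_device_num(omp_get_thread_num(), ACC_DEVICE_NVIDIA)
#endif
  ! Allocate only after we are inside the parallel region and the
  ! device is selected; this creates both the host and device copies.
  allocate(tmp(n))
!$acc region
  do i = 1, n
    tmp(i) = real(i)
  end do
!$acc end region
!$acc update host(tmp)
  print *, sum(tmp)
  deallocate(tmp)
end subroutine work_on_device
```

The key points are that tmp is a subroutine local (so each thread gets its own copy when the routine is called from a parallel region) and that the allocation happens after acc_set_device_num.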

That said, I’ve never seen the specific errors you’re getting. Can you write a small reproducing example?


Hi Mat,

I haven’t been able to replicate the issue within a smaller piece of sample code I’m afraid.

I recently tried to bypass the issue by moving the code into a separate routine that is called from within the OpenMP region.

However, this results in some behaviour I would consider quite strange. My understanding is that the variables within a subroutine that is called from an OpenMP region are intrinsically private (unless specified otherwise). Unfortunately, this does not seem to be the case, as I am getting errors that can only be corrected by specifying the relevant variables as private.

Am I missing something simple here?



Hi Karl,

> My understanding is that the variables within a subroutine that is called from an OpenMP region are intrinsically private

Correct. Variables declared locally within a subroutine are implicitly private if the subroutine is called within an OpenMP parallel region. Hence, I suspect something else is going on, so I’d need more details.

Below is a small example program. Can you modify it so that it replicates the behavior you are seeing?

% cat mirror.f90 

program test
  use omp_lib
  implicit none
  integer i
!$omp parallel do
  do i=1,32
     call testme(i)
  end do
!$omp end parallel do
end program test

subroutine testme (i)
  use omp_lib
#ifdef _ACCEL
  use accel_lib
#endif
  implicit none
  integer :: i, ii
  integer :: thd
  real, dimension(:), allocatable :: arr
!$acc mirror(arr)
  thd = omp_get_thread_num()
#ifdef _ACCEL
  ! each OpenMP thread drives its own GPU
  call acc_set_device_num(thd, ACC_DEVICE_NVIDIA)
#endif
  ! allocate after device selection: creates host and device copies
  allocate(arr(32))
!$acc region
  do ii=1,32
    arr(ii) = real(i) / (thd+ii)
  end do
!$acc end region
!$acc update host (arr)
  print *, thd, i, sum(arr)
  deallocate(arr)
end subroutine testme

% pgf90 -mp -Mpreprocess -Minfo=mp,accel mirror.f90 -ta=nvidia
      7, Parallel region activated
      8, Parallel loop activated with static block schedule
     10, Parallel region terminated
     23, Generating local(arr(:))
     30, Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     31, Loop is parallelizable
         Accelerator kernel generated
         31, !$acc do parallel, vector(32) ! blockidx%x threadidx%x
             CC 1.0 : 11 registers; 48 shared, 40 constant, 0 local memory bytes; 33% occupancy
             CC 2.0 : 11 registers; 8 shared, 68 constant, 0 local memory bytes; 16% occupancy
     35, Generating !$acc update host(arr(:))
% setenv OMP_NUM_THREADS 4
% a.out
            3           25    57.83620    
            0            1    4.058496    
            2           17    44.50957    
            1            9    27.79918    
            3           26    60.14965    
            0            2    8.116991    
            2           18    47.12778    
            1           10    30.88798    
            3           27    62.46309    
            0            3    12.17549    
            2           19    49.74599    
            1           11    33.97678    
            3           28    64.77655    
            0            4    16.23398    
            2           20    52.36419    
            1           12    37.06558    
            3           29    67.09000    
            0            5    20.29248    
            2           21    54.98241    
            1           13    40.15438    
            3           30    69.40344    
            0            6    24.35097    
            2           22    57.60062    
            1           14    43.24318    
            3           31    71.71688    
            0            7    28.40947    
            2           23    60.21883    
            1           15    46.33197    
            3           32    74.03034    
            0            8    32.46796    
            2           24    62.83704    
            1           16    49.42077
- Mat