Offloading using standard language features is a major leap forward for writing portable programs; many codes have lifetimes measured in decades.
While do concurrent works nicely for loops where variables are indexed element by element, it would be beneficial to be able to use array syntax such as A = B + exp(C), where A, B, and C are matrices, or expressions using where, such as where (Z > 0) A = sqrt(Z).
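For reference, both expressions above are already valid standard Fortran on the host; the request is only about offloading them. A minimal CPU-only sketch (the array names are taken from the examples above, the values are arbitrary):

```fortran
program array_syntax
  implicit none
  real :: A(4), B(4), C(4), Z(4)
  B = 1.0
  C = 2.0
  Z = [ -1.0, 0.0, 4.0, 9.0 ]
  A = B + exp(C)             ! whole-array expression, no explicit loop
  where (Z > 0) A = sqrt(Z)  ! masked assignment: only positive elements updated
  print *, A
end program array_syntax
```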
While I don’t have any insight into the future direction of the Fortran standard, nvfortran can auto-parallelize array syntax within a do concurrent loop, so the way to do this now is something like the following. Though there’s no guarantee other compilers would follow suit; they may just offload this sequentially.
% cat test.f90
program foo
  integer :: i
  real, dimension(:), allocatable :: A, B, C
  allocate(A(1024), B(1024), C(1024))
  B = 1
  C = 2
  do concurrent (i=1:1)
     A = B + exp(C)
  end do
  print *, A(1:5)
  deallocate(A, B, C)
end program foo
% nvfortran test.f90 -stdpar -Minfo; a.out
foo:
8, Memory set idiom, loop replaced by call to __c_mset4
9, Memory set idiom, loop replaced by call to __c_mset4
11, Generating NVIDIA GPU code
10, Loop parallelized across CUDA thread blocks, CUDA threads(128) collapse(2) ! blockidx%x threadidx%x
11, ! blockidx%x threadidx%x auto-collapsed
8.389056 8.389056 8.389056 8.389056
8.389056
This is roughly equivalent to using OpenACC’s “kernels” directive with managed memory.
% cat test_acc.f90
program foo
  integer :: i
  real, dimension(:), allocatable :: A, B, C
  allocate(A(1024), B(1024), C(1024))
  B = 1
  C = 2
  !$acc kernels
  A = B + exp(C)
  !$acc end kernels
  print *, A(1:5)
  deallocate(A, B, C)
end program foo
% nvfortran test_acc.f90 -acc -Minfo -gpu=managed; a.out
foo:
8, Memory set idiom, loop replaced by call to __c_mset4
9, Memory set idiom, loop replaced by call to __c_mset4
10, Generating implicit copyout(a(1:1024)) [if not already present]
Generating implicit copyin(c(1:1024),b(1:1024)) [if not already present]
11, Loop is parallelizable
Generating NVIDIA GPU code
11, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
8.389056 8.389056 8.389056 8.389056
8.389056
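Following the same pattern, a where construct (being array syntax) should in principle also be wrappable in a one-trip do concurrent for offload. This is an untested sketch along the lines of the example above; I have not verified that nvfortran parallelizes the masked assignment:

```fortran
program foo_where
  integer :: i
  real, dimension(:), allocatable :: A, Z
  allocate(A(1024), Z(1024))
  A = 0
  Z = -1
  Z(1:512) = 4
  ! Same trick as above: a trivial do concurrent so -stdpar can
  ! offload the masked array assignment inside it (assumed, not verified).
  do concurrent (i=1:1)
     where (Z > 0) A = sqrt(Z)
  end do
  print *, A(1:3), A(1023:1024)
  deallocate(A, Z)
end program foo_where
```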