Unexpected error when accessing an optional scalar argument inside an OpenACC kernel with `default(present)`

Dear all,

Please see this Fortran code and output.

 program scalar_array_multiplication
   implicit none
   integer, parameter :: n = 10
   integer :: i
   real :: array(n)
   do i = 1, n
     array(i) = i
   end do
   print *, 'Original array:'
   print *, array
   call multiply_array(array, 2.0)
   print *, 'Updated array after multiplication:'
   print *, array
   contains
     subroutine multiply_array(arr, scalar)
       implicit none
       real, intent(inout) :: arr(:)
       real, intent(in), optional :: scalar
       integer :: i
       !$acc enter data copyin(arr)
       !$acc parallel loop default(present)
       do i=1,size(arr)
          arr(i) = arr(i)*scalar
       end do
       !$acc exit data copyout(arr)
     end subroutine multiply_array
     !!
     !! safe way of doing this
     !!
     !subroutine multiply_array(arr, scalar)
     !  implicit none
     !  real, intent(inout) :: arr(:)
     !  real, intent(in), optional :: scalar
     !  real :: factor
     !  integer :: i
     !  if (present(scalar)) then
     !    factor = scalar
     !  else
     !    factor = 1.0
     !  end if
     !  !$acc enter data copyin(arr)
     !  !$acc parallel loop default(present)
     !  do i = 1, size(arr)
     !     arr(i) = arr(i) * factor
     !  end do
     !  !$acc exit data copyout(arr)
     !end subroutine multiply_array
 end program scalar_array_multiplication
$ nvfortran --version && nvfortran -acc test.f90 -o test && ./test

nvfortran 24.3-0 64-bit target on x86-64 Linux -tp cascadelake
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
 Original array:
    1.000000        2.000000        3.000000        4.000000
    5.000000        6.000000        7.000000        8.000000
    9.000000        10.00000
hostptr=0x4030d8,eltsize=4,name=scalar,flags=0x20000200=present+implicit,async=-1,threadid=1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.6, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
host:0x407550 device:0x781b192fa000 size:40 presentcount:0+1 line:20 name:arr(:)
allocated block device:0x781b192fa000 size:512 thread:1
FATAL ERROR: data in PRESENT clause was not found on device 1: name=scalar host:0x4030d8
 file:/home/pedro/Desktop/test.f90 multiply_array line:21

(The workaround, and generally the safe way of working with optional arguments, is shown in the commented-out subroutine.)

I’m not sure whether this is expected behavior for default(present) when a scalar that would otherwise be firstprivate is passed as an optional argument to a Fortran subroutine and present(scalar) evaluates to .true. in the callee. From the OpenACC Specification, I think there may be undefined behavior if present(scalar) evaluates to .false. instead. See section 2.17 here: https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf#page=86.62

Hi Pedro,

Optional arguments are a special case. Note that section 2.17 is referring to the Fortran “present” intrinsic, not to be confused with OpenACC’s “present”. At line 3088, the standard notes that using optionals as firstprivate, which would be the default for scalars, results in undefined behavior. Hence optional scalars are treated as pointers and therefore need to be copied in. Since “scalar” is never copied to the device, the runtime reports the “not present” error.

Here’s how I’d recommend writing this:

% cat test.F90
 program scalar_array_multiplication
   implicit none
   integer, parameter :: n = 10
   integer :: i
   real :: array(n)
   do i = 1, n
     array(i) = i
   end do
   print *, 'Original array:'
   print *, array
! call once with scalar present
   call multiply_array(array, 2.0)
! and again without it being present
   call multiply_array(array)
   print *, 'Updated array after multiplication:'
   print *, array
   contains
     subroutine multiply_array(arr, scalar)
       implicit none
       real, intent(inout) :: arr(:)
       real, intent(in), optional :: scalar
       integer :: i
       !$acc enter data copyin(arr,scalar)
       !$acc parallel loop default(present)
       do i=1,size(arr)
          if (present(scalar)) then
             arr(i) = arr(i)*scalar
          else
             arr(i) = arr(i)*1.5
          endif
       end do
       !$acc exit data copyout(arr) delete(scalar)
     end subroutine multiply_array
 end program scalar_array_multiplication
% nvfortran -acc test.F90 -Minfo=accel; a.out
multiply_array:
     23, Generating enter data copyin(scalar,arr(:))
     24, Generating NVIDIA GPU code
         25, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     24, Generating default present(arr(:),scalar)
     32, Generating exit data delete(scalar)
         Generating exit data copyout(arr(:))
 Original array:
    1.000000        2.000000        3.000000        4.000000
    5.000000        6.000000        7.000000        8.000000
    9.000000        10.00000
 Updated array after multiplication:
    3.000000        6.000000        9.000000        12.00000
    15.00000        18.00000        21.00000        24.00000
    27.00000        30.00000

Note that if “scalar” is not present, it basically becomes a NULL pointer and the copyin clause handles it appropriately.
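
A more compact variant of the same idea, as an untested sketch: because an explicit data clause takes precedence over default(present), “scalar” can also be named in a copyin clause directly on the compute construct, which avoids the separate delete. The program and subroutine names below (optional_copyin_sketch, scale_array) are made up for illustration, and the assumption is that the compiler skips the copyin when the optional argument is absent, just as it does for enter data.

```fortran
program optional_copyin_sketch
  implicit none
  real :: a(4)
  a = [1.0, 2.0, 3.0, 4.0]
  call scale_array(a, 2.0)   ! optional argument supplied
  call scale_array(a)        ! optional argument omitted
  print *, a
contains
  subroutine scale_array(arr, scalar)
    real, intent(inout) :: arr(:)
    real, intent(in), optional :: scalar
    integer :: i
    !$acc enter data copyin(arr)
    ! Assumption: the explicit copyin(scalar) overrides default(present)
    ! for scalar and is ignored when the argument is absent.
    !$acc parallel loop default(present) copyin(scalar)
    do i = 1, size(arr)
      if (present(scalar)) then
        arr(i) = arr(i) * scalar
      else
        arr(i) = arr(i) * 1.5   ! same fallback factor as in the example above
      end if
    end do
    !$acc exit data copyout(arr)
  end subroutine scale_array
end program optional_copyin_sketch
```

With the two calls shown, each element should end up at three times its original value (2.0 from the first call, 1.5 from the second), matching the earlier run.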

Hope this helps,
Mat

Hi Mat,

Thanks! I agree that your approach is the preferred one; my example is not good practice.

My point was precisely related to what is stated in 2.17 in the standard:

The appearance of a Fortran optional argument arg in the following situations may result in undefined behavior if PRESENT(arg) is .false. when the associated construct is executed: as a var in private, firstprivate, and reduction clauses.

In my example, present(scalar) is .true., because the optional argument is passed in the caller program. Hence, this should not result in unspecified behavior, and I’d expect scalar to be treated as a scalar with the default firstprivate attribute.

I see your point, but implementation-wise it seems problematic. Granted, I’m not a compiler engineer myself, but to achieve this I believe the compiler would need to add runtime checks, which means extra overhead, since it can’t be known until runtime whether scalar is present. Better to presume that present(scalar) could be .false. and generate the appropriate code at compile time.
