Hi,
I’ve run into a very specific problem when using do concurrent with (sort of) nested type-bound procedures. The error in the MRE below arises when:
- NVHPC 25.1 or 25.3 is used
- -O1 or higher is enabled
- both -mp=gpu and -stdpar=gpu are added to the compile flags
module testm
  implicit none
  type :: base
  contains
    procedure :: an_elemental_function
    procedure :: a_2d_subroutine
  end type base
contains
  real elemental function an_elemental_function(this, input)
    class(base), intent(in) :: this
    real, intent(in) :: input
    an_elemental_function = 2.*input
  end function an_elemental_function
  subroutine a_2d_subroutine(this, input)
    class(base), intent(in) :: this
    real, intent(inout) :: input(:, :)
    integer :: i, j, s(2)
    s = shape(input)
    do concurrent(i=1:s(1), j=1:s(2))
      input(i,j) = an_elemental_function(this, input(i, j))
    enddo
  end subroutine a_2d_subroutine
end module testm
program test
  use testm
  implicit none
  type(base) :: t
  real :: a(4, 4)
  a(:, :) = 2.
  call t%a_2d_subroutine(a)
  write(*, *) sum(a) == 64.0
end program test
Note that the do concurrent loop in a_2d_subroutine calls an_elemental_function, and both are type-bound procedures of base. I get the following error:
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.9, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
host:0x4091e0 device:0x75feefafa200 size:80 presentcount:1+0 line:24 name:descriptor
host:0x4095c0 device:0x75feefafa000 size:64 presentcount:1+0 line:24 name:input(:,:)
host:0x409600 device:(nil) size:0 presentcount:1+0 line:24 name:this
allocated block device:0x75feefafa000 size:512 thread:1
allocated block device:0x75feefafa200 size:512 thread:1
Present table errors:
.O0001(:) lives at 0x4091e0 size 1180 partially present in
host:0x4091e0 device:0x75feefafa200 size:80 presentcount:1+0 line:24 name:descriptor file:/home/edwardy/test-simple.f90
host:0x4095c0 device:0x75feefafa000 size:64 presentcount:1+0 line:24 name:input(:,:) file:/home/edwardy/test-simple.f90
host:0x409600 device:(nil) size:0 presentcount:1+0 line:24 name:this file:/home/edwardy/test-simple.f90
FATAL ERROR: variable in data clause is partially present on the device: name=.O0001(:)
file:/home/edwardy/test-simple.f90 a_2d_subroutine line:24
Note that if I replace input(i,j) = an_elemental_function(this, input(i, j)) with the type-bound call input(i,j) = this%an_elemental_function(input(i, j)), then it works fine.
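For reference, here is the working variant written out in full. It is the same MRE with only the call inside the do concurrent loop changed to the type-bound form. The module and program names (testm_workaround, test_workaround) are renamed purely for illustration so that both versions can sit side by side; everything else is unchanged.

```fortran
module testm_workaround   ! hypothetical name, otherwise identical to testm
  implicit none
  type :: base
  contains
    procedure :: an_elemental_function
    procedure :: a_2d_subroutine
  end type base
contains
  real elemental function an_elemental_function(this, input)
    class(base), intent(in) :: this
    real, intent(in) :: input
    an_elemental_function = 2.*input
  end function an_elemental_function

  subroutine a_2d_subroutine(this, input)
    class(base), intent(in) :: this
    real, intent(inout) :: input(:, :)
    integer :: i, j, s(2)
    s = shape(input)
    do concurrent(i=1:s(1), j=1:s(2))
      ! Only change from the failing MRE: call through the type-bound
      ! binding instead of an_elemental_function(this, input(i, j)).
      input(i, j) = this%an_elemental_function(input(i, j))
    enddo
  end subroutine a_2d_subroutine
end module testm_workaround

program test_workaround   ! hypothetical name
  use testm_workaround
  implicit none
  type(base) :: t
  real :: a(4, 4)
  a(:, :) = 2.
  call t%a_2d_subroutine(a)
  ! Each of the 16 elements is doubled from 2.0 to 4.0, so the sum is 64.0.
  write(*, *) sum(a) == 64.0
end program test_workaround
```

Built with the same flags as the failing case, this version runs to completion for me, so the difference between the two programs is confined to how the elemental function is invoked inside the loop.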
Full compile command: nvfortran test.f90 -mp=gpu -stdpar=gpu -gpu=mem:separate -O1
The example works as expected with NVHPC 24.9, regardless of the flag combination.