Bug: NVHPC 25.X present table errors with fortran do concurrent and kind-of nested type-bound procedures

edoy · May 14, 2025, 3:27am

Hi,

I’ve run into a very specific problem when using do concurrent with sort-of nested type-bound procedures. The error in the MRE below arises when all of the following hold:

NVHPC 25.1 or 25.3 are used
-O1 or higher is enabled
both -mp=gpu and -stdpar=gpu are added to the compile flags

module testm

    implicit none

    type:: base
    contains
        procedure:: an_elemental_function
        procedure:: a_2d_subroutine
    end type base

contains

    real elemental function an_elemental_function(this, input)
        class(base), intent(in):: this
        real, intent(in):: input
        an_elemental_function = 2.*input
    end function an_elemental_function

    subroutine a_2d_subroutine(this, input)
        class(base), intent(in):: this
        real, intent(inout):: input(:, :)
        integer:: i, j, s(2)
        s = shape(input)
        do concurrent(i=1:s(1), j=1:s(2))
            input(i,j) = an_elemental_function(this, input(i, j))
        enddo
    end subroutine a_2d_subroutine

end module testm

program test

    use testm

    implicit none
    type(base):: t
    real:: a(4, 4)

    a(:, :) = 2.

    call t%a_2d_subroutine(a)

    write(*, *) sum(a) == 64.0

end program test

Note the do concurrent in a_2d_subroutine, which calls an_elemental_function - both of which are methods of base. I get the error:

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.9, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
host:0x4091e0 device:0x75feefafa200 size:80 presentcount:1+0 line:24 name:descriptor
host:0x4095c0 device:0x75feefafa000 size:64 presentcount:1+0 line:24 name:input(:,:)
host:0x409600 device:(nil) size:0 presentcount:1+0 line:24 name:this
allocated block device:0x75feefafa000 size:512 thread:1
allocated block device:0x75feefafa200 size:512 thread:1

Present table errors:
.O0001(:) lives at 0x4091e0 size 1180 partially present in
host:0x4091e0 device:0x75feefafa200 size:80 presentcount:1+0 line:24 name:descriptor file:/home/edwardy/test-simple.f90
host:0x4095c0 device:0x75feefafa000 size:64 presentcount:1+0 line:24 name:input(:,:) file:/home/edwardy/test-simple.f90
host:0x409600 device:(nil) size:0 presentcount:1+0 line:24 name:this file:/home/edwardy/test-simple.f90
FATAL ERROR: variable in data clause is partially present on the device: name=.O0001(:)
 file:/home/edwardy/test-simple.f90 a_2d_subroutine line:24

Note that if I replace input(i,j) = an_elemental_function(this, input(i, j)) with a class method call input(i,j) = this%an_elemental_function(input(i, j)), then it works fine.
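For reference, here is a sketch of the working variant described above. It is the same MRE with only the call inside the do concurrent loop changed to the type-bound form; nothing else is assumed to differ.

```fortran
module testm

    implicit none

    type:: base
    contains
        procedure:: an_elemental_function
        procedure:: a_2d_subroutine
    end type base

contains

    real elemental function an_elemental_function(this, input)
        class(base), intent(in):: this
        real, intent(in):: input
        an_elemental_function = 2.*input
    end function an_elemental_function

    subroutine a_2d_subroutine(this, input)
        class(base), intent(in):: this
        real, intent(inout):: input(:, :)
        integer:: i, j, s(2)
        s = shape(input)
        do concurrent(i=1:s(1), j=1:s(2))
            ! Type-bound call instead of passing "this" as an explicit argument
            input(i,j) = this%an_elemental_function(input(i, j))
        enddo
    end subroutine a_2d_subroutine

end module testm

program test

    use testm

    implicit none
    type(base):: t
    real:: a(4, 4)

    a(:, :) = 2.

    call t%a_2d_subroutine(a)

    ! Each of the 16 elements is doubled from 2. to 4.
    write(*, *) sum(a) == 64.0

end program test
```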

Full compile command: nvfortran test.f90 -mp=gpu -stdpar=gpu -gpu=mem:separate -O1.

The example works as expected with NVHPC 24.9, regardless of flag combination.

MatColgrove · May 14, 2025, 4:13pm

Since the code is accessing host stack variables on the device, it requires full Unified Memory, i.e. “-gpu=mem:unified”.

Does your system and device support HMM, which is needed for full Unified Memory?

For example on a Grace-Hopper system:

% nvfortran -stdpar=gpu -gpu=mem:separate test.F90 ; a.out
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 9.0, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
host:0x417220 device:0x400419efa200 size:80 presentcount:1+0 line:24 name:descriptor
host:0x4175c0 device:0x400419efa000 size:64 presentcount:1+0 line:24 name:input(:,:)
host:0x417600 device:(nil) size:0 presentcount:1+0 line:24 name:this
allocated block device:0x400419efa000 size:512 thread:1
allocated block device:0x400419efa200 size:512 thread:1

Present table errors:
.O0001(:) lives at 0x417220 size 1180 partially present in
host:0x417220 device:0x400419efa200 size:80 presentcount:1+0 line:24 name:descriptor file:/home/mcolgrove/tmp/test.F90
host:0x4175c0 device:0x400419efa000 size:64 presentcount:1+0 line:24 name:input(:,:) file:/home/mcolgrove/tmp/test.F90
host:0x417600 device:(nil) size:0 presentcount:1+0 line:24 name:this file:/home/mcolgrove/tmp/test.F90
FATAL ERROR: variable in data clause is partially present on the device: name=.O0001(:)
 file:/home/mcolgrove/tmp/test.F90 a_2d_subroutine line:24

% nvfortran -stdpar=gpu -gpu=mem:unified test.F90 ; a.out
  T

edoy · May 15, 2025, 1:14am

Hi Matt,

No, we don’t have HMM-enabled systems yet.

MatColgrove · May 15, 2025, 3:21pm

Ok, though if you’re able to enable HMM you’ll be able to use a wider variety of code on the GPU. It does require a newer version of Linux and CUDA drivers as well as newer GPU architectures. Full details are in the article I linked above.

Performance-wise, full UM over PCIe isn’t the best but should be functional. It’s much better using NVLink on the Grace-Hopper systems if you have access.

edoy · May 15, 2025, 11:50pm

Thanks Mat,

I should clarify that we’re not intending to use unified memory for now, since we don’t have any HMM systems (and probably won’t for a while). We’re happy to manually manage memory with OpenMP directives for the time being.

I’m more concerned because the error mentioned seems to be new. We hope that we don’t get blocked from using newer nvfortran versions.

MatColgrove · May 16, 2025, 4:01pm

Sincere apologies! There was a flood of UF posts the last few days, so I was moving too fast and missed that this is a regression.

I filed a problem report, TPR #37406, and will have engineering investigate.

Note that I’m now thinking it might be a device inlining issue, since adding “-Minline”, where the inlining is done by the front-end compiler, works around the problem.

Note that for performance, as the code is written now, the compiler needs to implicitly copy the data each time it encounters the do concurrent (DC) loop. UM should help here, or you might consider adding OpenACC or OpenMP data regions to hoist the data movement earlier in the program.

-Mat
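A minimal sketch of the data-region suggestion, using OpenMP since the thread mentions managing memory with OpenMP directives. The repeat loop, the iter variable and the program name test_hoisted are hypothetical additions for illustration. The module uses the type-bound call that was reported to work. Whether the do concurrent loop actually reuses the mapped copy of a, rather than copying it again on each call, is an assumption based on the suggestion above and has not been verified here.

```fortran
module testm

    implicit none

    type:: base
    contains
        procedure:: an_elemental_function
        procedure:: a_2d_subroutine
    end type base

contains

    real elemental function an_elemental_function(this, input)
        class(base), intent(in):: this
        real, intent(in):: input
        an_elemental_function = 2.*input
    end function an_elemental_function

    subroutine a_2d_subroutine(this, input)
        class(base), intent(in):: this
        real, intent(inout):: input(:, :)
        integer:: i, j, s(2)
        s = shape(input)
        do concurrent(i=1:s(1), j=1:s(2))
            ! Type-bound call, the form reported to work in this thread
            input(i,j) = this%an_elemental_function(input(i, j))
        enddo
    end subroutine a_2d_subroutine

end module testm

program test_hoisted

    use testm

    implicit none
    type(base):: t
    real:: a(4, 4)
    integer:: iter    ! hypothetical repeat counter, not in the original MRE

    a(:, :) = 2.

    ! Map a once around all the calls, so that (assumption) the
    ! do concurrent loop finds it already present on the device.
    !$omp target data map(tofrom: a)
    do iter = 1, 3
        call t%a_2d_subroutine(a)
    enddo
    !$omp end target data

    write(*, *) sum(a)

end program test_hoisted
```

The idea is that the host-to-device and device-to-host copies happen once at the boundaries of the data region instead of at every loop invocation. This needs -mp=gpu alongside -stdpar=gpu so that the OpenMP directives are honored.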
