Compilation error for nested device subroutines with constant module data

Hello,

I get a compilation error for a program with nested device subroutines in which the inner subroutine uses a constant array from a module.
Specifically, the problem occurs when I use the -O0 and -Minline flags together.
I also hit this issue at -O3 in my project with nvhpc 24.1, but that appears to be fixed in nvhpc 24.7.

Reproducer

A minimal example that reproduces the problem:


module myMod

    integer, parameter, dimension(4), constant :: arr_const = [1,1,1,1]


    integer, parameter, constant :: scaler_const = 1

end module myMod

module gpuKernelsMod
    implicit none

contains

    attributes(device) subroutine outterRoutine()

        ! uncomment and it will compile
        ! use myMod

        call innerRoutine()

    end subroutine outterRoutine

    attributes(device) subroutine innerRoutine()
        use myMod
        implicit none

        integer :: del(3)

        del(1) = arr_const(1)

        del(1) = scaler_const

    end subroutine innerRoutine

end module gpuKernelsMod

! nvfortran -O0 -Minline -Minfo=all  -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90  -o kernel_mod_repro.o

The compile line I use and the resulting error are shown below

> nvfortran -O0 -Minline -Minfo=all  -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90  -o kernel_mod_repro.o
outterroutine:
     20, innerroutine inlined, size=3, file kernel_mod_repro.f90 (24)
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /var/tmp/pbs.181405.pbspl4.nas.nasa.gov/nvaccrXUxZ3cgMG4o.gpu (231, 52): parse '@_mymod_25' defined with type '%common._mymod_25 addrspace(1)*'
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (kernel_mod_repro.f90: 24)
NVFORTRAN/x86-64 Linux 24.7-0: compilation aborted

What does work

  • changing the optimization level to -O2 or -O3
  • adding use myMod to the outer routine (outterRoutine), which lets the inner routine compile
  • referencing only scaler_const (not arr_const) from myMod
  • commenting out the outer routine, which lets the inner routine compile
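For reference, below is a sketch of the second workaround applied to the reproducer: it is the same file with use myMod enabled in outterRoutine. It assumes the same -O0 -Minline -cuda -gpu=cc80,debug compile line. It only illustrates the workaround listed above and is not guaranteed to avoid the error in larger codes.

```fortran
! Workaround sketch: identical to the reproducer except that the outer
! routine also has "use myMod". Compile-only unit (nvfortran -cuda -c).
module myMod
    integer, parameter, dimension(4), constant :: arr_const = [1,1,1,1]
    integer, parameter, constant :: scaler_const = 1
end module myMod

module gpuKernelsMod
    implicit none
contains

    attributes(device) subroutine outterRoutine()
        ! Workaround: also use myMod in the caller that innerRoutine
        ! gets inlined into under -Minline.
        use myMod
        call innerRoutine()
    end subroutine outterRoutine

    attributes(device) subroutine innerRoutine()
        use myMod
        implicit none
        integer :: del(3)
        del(1) = arr_const(1)
        del(1) = scaler_const
    end subroutine innerRoutine

end module gpuKernelsMod
```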

What does not work

  • removing debug from the -gpu options, or replacing it with fastmath
  • lowering the optimization level to -O1: nvfortran -O1 -Minline -Minfo=all -mp -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90 -o kernel_mod_repro.o

Version and device information

> nvidia-smi
Fri Sep 13 17:31:07 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   33C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
> nvfortran -V
nvfortran 24.7-0 64-bit target on x86-64 Linux -tp znver3 
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Thank you for your support!

Thanks for the report, joanib14. I’ve filed a problem report, TPR #36497, and sent it to engineering for investigation.

I’m not sure this is something that can be fixed. It may require the constant propagation that occurs at -O2, so that the reference to the module isn’t needed, but I’ll let engineering evaluate.

1 Like