Compilation error for nested device subroutines with constant module data

joanib14 · September 14, 2024, 1:18am

Hello,

I am getting an error when I try to compile a program that has nested device subroutines where the inner subroutine uses a constant array from a module.
Specifically, I have this problem when I use the -O0 and -Minline flags together.
I was also seeing this issue with -O3 in my project with NVHPC 24.1, but that appears to be fixed in NVHPC 24.7.

Reproducer

A minimal example that reproduces the problem is below:


module myMod
       
    integer, parameter, dimension(4), constant :: arr_const = [1,1,1,1]

    
    integer, parameter, constant :: scaler_const = 1

end module myMod

module gpuKernelsMod
      implicit none
       
    contains
       
       attributes(device) subroutine outterRoutine()
       
        ! uncomment and it will compile
        ! use myMod
        
        call innerRoutine()
          
       end subroutine outterRoutine
    
       attributes(device) subroutine  innerRoutine()
          use myMod
          implicit none
    
          integer :: del(3)
    
          del(1) = arr_const(1)
          
          del(1) = scaler_const
    
       end subroutine innerRoutine
       
end module gpuKernelsMod

! nvfortran -O0 -Minline -Minfo=all  -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90  -o kernel_mod_repro.o

The compile line I use and the resulting error are below:

> nvfortran -O0 -Minline -Minfo=all  -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90  -o kernel_mod_repro.o
outterroutine:
     20, innerroutine inlined, size=3, file kernel_mod_repro.f90 (24)
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /var/tmp/pbs.181405.pbspl4.nas.nasa.gov/nvaccrXUxZ3cgMG4o.gpu (231, 52): parse '@_mymod_25' defined with type '%common._mymod_25 addrspace(1)*'
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (kernel_mod_repro.f90: 24)
NVFORTRAN/x86-64 Linux 24.7-0: compilation aborted

What does work

Changing the optimization level to -O2 or -O3
Adding use myMod to the outer routine (the inner routine then compiles fine)
Using only scaler_const from myMod (i.e., removing the reference to arr_const)
Commenting out the outer routine (the inner routine then compiles fine)

What does not work

Removing debug from the -gpu options, or replacing it with fastmath
Compiling at -O1 instead of -O0, e.g.:
nvfortran -O1 -Minline -Minfo=all -mp -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90 -o kernel_mod_repro.o

Version and device information

> nvidia-smi
Fri Sep 13 17:31:07 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   33C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

> nvfortran -V
nvfortran 24.7-0 64-bit target on x86-64 Linux -tp znver3 
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Thank you for your support!

MatColgrove · September 16, 2024, 5:37pm

Thanks for the report joanib14. I’ve filed a problem report, TPR #36497, and sent it to engineering for investigation.

I’m not sure whether this can be fixed. It may rely on the constant propagation that occurs at -O2, which removes the need for the reference to the module, but I’ll let engineering evaluate.
