Hello,
I am getting an error when I try to compile a program that has nested device subroutines where the inner subroutine uses a constant array from a module.
Specifically, I have this problem when I use the -O0 and -Minline flags together.
I was also having this issue with -O3 in my project with NVHPC 24.1, but that appears to be fixed in NVHPC 24.7.
Reproducer
A minimal example that reproduces the problem is shown below:
module myMod
  integer, parameter, dimension(4), constant :: arr_const = [1,1,1,1]
  integer, parameter, constant :: scaler_const = 1
end module myMod

module gpuKernelsMod
  implicit none
contains
  attributes(device) subroutine outterRoutine()
    ! uncomment and it will compile
    ! use myMod
    call innerRoutine()
  end subroutine outterRoutine

  attributes(device) subroutine innerRoutine()
    use myMod
    implicit none
    integer :: del(3)
    del(1) = arr_const(1)
    del(1) = scaler_const
  end subroutine innerRoutine
end module gpuKernelsMod
! nvfortran -O0 -Minline -Minfo=all -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90 -o kernel_mod_repro.o
The compile line I use and the resulting error are shown below:
> nvfortran -O0 -Minline -Minfo=all -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90 -o kernel_mod_repro.o
outterroutine:
20, innerroutine inlined, size=3, file kernel_mod_repro.f90 (24)
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /var/tmp/pbs.181405.pbspl4.nas.nasa.gov/nvaccrXUxZ3cgMG4o.gpu (231, 52): parse '@_mymod_25' defined with type '%common._mymod_25 addrspace(1)*'
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (kernel_mod_repro.f90: 24)
NVFORTRAN/x86-64 Linux 24.7-0: compilation aborted
What does work
- changing the optimization level to -O2 or -O3
- adding use myMod to the outer routine; the inner routine then compiles fine
- using only scaler_const from myMod in the inner routine (i.e., not referencing arr_const)
- commenting out the outer routine; the inner routine then compiles fine
What does not work
- removing debug from the -gpu options, or changing it to fastmath
- compiling at -O1 instead of -O0: nvfortran -O1 -Minline -Minfo=all -mp -cuda -gpu=cc80,debug -g -c kernel_mod_repro.f90 -o kernel_mod_repro.o
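To make the flag combinations above quicker to re-check (for example on a newer NVHPC release), here is a small POSIX shell sketch. It assumes nvfortran is on PATH and that the reproducer is saved as kernel_mod_repro.f90 in the current directory; the try helper is a hypothetical name, and the comments only restate the results listed above rather than new measurements.

```shell
#!/bin/sh
# Hypothetical helper: compile the reproducer under each flag combination
# listed above and print whether nvfortran succeeded.
# Assumes kernel_mod_repro.f90 is in the current directory.

src=kernel_mod_repro.f90

# try FLAGS...: compile with the given optimization/-gpu flags plus the
# flags shared by every command in this report.
try() {
  if ! command -v nvfortran >/dev/null 2>&1; then
    echo "SKIPPED (nvfortran not found) $*"
    return 0
  fi
  if nvfortran "$@" -Minline -Minfo=all -cuda -g -c "$src" \
       -o kernel_mod_repro.o >/dev/null 2>&1; then
    echo "OK     $*"
  else
    echo "FAILED $*"
  fi
}

# Reported above as failing
try -O0 -gpu=cc80,debug
# Reported above as compiling
try -O2 -gpu=cc80,debug
try -O3 -gpu=cc80,debug
# Reported above as still failing
try -O0 -gpu=cc80
try -O0 -gpu=cc80,fastmath
try -O1 -mp -gpu=cc80,debug
```

Each line prints OK or FAILED followed by the flags that were tried; dropping the redirect to /dev/null shows the full NVVM error for a failing case.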
Version and device information
> nvidia-smi
Fri Sep 13 17:31:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   33C    P0              61W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
> nvfortran -V
nvfortran 24.7-0 64-bit target on x86-64 Linux -tp znver3
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Thank you for your support!