Hi everyone,
We have a device-code linker issue that shows up in a rather large test application: https://github.com/ecmwf-ifs/dwarf-p-cloudsc, a standalone mini-app of the ECMWF cloud microphysics parameterization (it is related to microphysics in ECMWF climate models).
We could not find a test case smaller than this, but we hope the code is easy enough to build and analyze (the set of scripts to build and run it is attached).
cloudsc-build.public.tar.gz (28.6 KB)
The problem: when we build the application with the internal library code linked dynamically, we get this error:
[100%] Linking Fortran executable ../../../bin/dwarf-cloudsc-gpu-scc
nvlink error : Undefined reference to '_yomcst_21' in 'CMakeFiles/dwarf-cloudsc-gpu-scc.dir/cloudsc_gpu_scc_mod.F90.o'
nvlink error : Undefined reference to '_yoethf_21' in 'CMakeFiles/dwarf-cloudsc-gpu-scc.dir/cloudsc_gpu_scc_mod.F90.o'
pgacclnk: child process exit status 2: /gpfs/apps/MN5/ACC/NVIDIA-HPC-SDK/25.7/Linux_x86_64/25.7/compilers/bin/tools/nvdd
The error disappears if we link the code statically. There is no such error for the OpenMP version of the same code, nor for its CUDA version. I wonder if this can be reproduced and addressed in some way.
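For reference, the pattern that appears to be involved is OpenACC device-resident module data living in the dynamically linked library, while the device kernels referencing it sit in the executable. A rough, untested sketch of that shape (with hypothetical module and variable names; the real modules in CloudSC are YOMCST and YOETHF):

```fortran
! constants_mod.F90 -- compiled with -acc -fPIC into the shared library.
! Module variables are mirrored on the device via 'declare create',
! which is what produces device symbols like '_yomcst_21' at link time.
module constants_mod
  implicit none
  real(8) :: rg = 9.80665d0
  !$acc declare create(rg)
end module constants_mod

! kernel_mod.F90 -- compiled into the executable; the device code here
! references the device copy of 'rg' that lives in the shared library.
module kernel_mod
contains
  subroutine scale(x, n)
    use constants_mod, only: rg
    integer, intent(in) :: n
    real(8), intent(inout) :: x(n)
    integer :: i
    !$acc parallel loop copy(x)
    do i = 1, n
      x(i) = x(i) * rg
    end do
  end subroutine scale
end module kernel_mod
```

If this sketch of the pattern is right, nvlink would need to resolve the device-side module symbols across the shared-library boundary, which seems to be exactly what fails in the dynamic-link build.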
To build, one should create a dnb-xxx.yaml similar to the dnb-mn5-acc.yaml that is provided as an example. At runtime, a wrapper script similar to scripts/generic/mn5-acc/runner-script.sh is assumed; it handles various affinity aspects. For execution, it is handy to use the psubmit.sh script system, which provides a generic interface to SLURM and MPI and does convenient pre- and post-processing of the data required and produced by the executable.
The linker error appears if one changes DNB_CLOUDSC_WITH_DYNAMIC_LINK=FALSE to DNB_CLOUDSC_WITH_DYNAMIC_LINK=TRUE in overrides.yaml.
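That is, assuming the same key/value line style as the other options in the attached overrides.yaml (check the example file for the exact syntax), the only change needed to reproduce the error is:

```yaml
# overrides.yaml -- switching the internal library to dynamic linking
# triggers the nvlink 'Undefined reference' errors shown above
DNB_CLOUDSC_WITH_DYNAMIC_LINK=TRUE
```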
The issue seems to be a long-standing one, since we found a record of it in an old forum topic: https://www.pgroup.com/userforum/viewtopic.php?t=7296
This case also makes us suspect that device-code shared-library linking has further issues, particularly in the OpenACC case. In our bigger code, which cannot be shared due to licensing limitations, we see memory corruption whenever the application links a shared library containing any OpenACC code: even the CPU version of the offloading triggers it, and even a library with no actual OpenACC kernels that is merely built with OpenACC options. The same code is fine when built with OpenMP target-offloading options, or when linked statically.