nvc++ -cuda fails to link code that uses device-side cuRAND functions

Hi,

We are interested in building our code with single-pass CUDA compilation using nvc++ -cuda. We have replaced all uses of __CUDA_ARCH__ with the portable NV_IF_TARGET macros.

Using NVC++ 23.9, the code compiles successfully with nvc++ -cuda, but the device link step fails with multiple-definition errors for device-side cuRAND symbols:

[ 66%] Linking CUDA executable 3d/Test_Amr_Advection_AmrCore_3d
nvlink error   : Multiple definition of 'precalc_xorwow_matrix' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'precalc_xorwow_offset_matrix' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM1' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM2' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM1SubSeq' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM2SubSeq' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM1Seq' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of 'mrg32k3aM2Seq' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink error   : Multiple definition of '__cr_lgamma_table' in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAtLevel.cpp.o', first defined in 'CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/Source/AdvancePhiAllLevels.cpp.o'
nvlink fatal   : merge_elf failed
pgacclnk: child process exit status 2: /mnt/ufs18/home-208/wibkingb/spack/opt/spack/linux-centos7-zen/gcc-6.4.0/nvhpc-23.9-xysfmihvcig2jao55qhodg36ybrsxkde/Linux_x86_64/23.9/compilers/bin/tools/nvdd
make[2]: *** [Tests/Amr/Advection_AmrCore/3d/Test_Amr_Advection_AmrCore_3d] Error 2
make[1]: *** [Tests/Amr/Advection_AmrCore/CMakeFiles/Test_Amr_Advection_AmrCore_3d.dir/all] Error 2

This can be reproduced by following the build recipe in AMReX-Codes/amrex pull request #3591 on GitHub ("replace AMREX_DEVICE_COMPILE with AMREX_IF_ON_DEVICE and AMREX_IF_ON_HOST" by BenWibking).

Is this a bug in NVC++ or cuRAND, or do we need to change how we link our code when using nvc++ -cuda?

Thanks,
Ben

Hi Ben,

I just tried a simple CUDA code with cuRAND device calls, built with nvc++, and it compiled, linked, and ran without issue. So while I can't say for sure that it's not a compiler issue, it seems more likely to be related to how cuRAND is used in AMReX.

I've only looked at AMReX a few times, and that was years ago. Could you do me a favor and document the steps needed to reproduce the error so I can dig into it? Please include everything from the initial git clone and git checkout, through the environment settings, to the cmake command you use, plus anything else that's relevant.

Of course, if you can provide a small reproducer, that would be even better. From what I remember, AMReX can be difficult to debug given all the templating.

-Mat

Hi Mat,

Here is a reproducer. The issue seems to be that curand_kernel.h does not work with C++17.


The reason C++17 does not work seems to be that C++17 inline variables do not actually work for device variables. For example, /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/include/curand_mrg32k3a.h contains:

#if defined(__CUDACC_RDC__) && (__cplusplus >= 201703L) && defined(__cpp_inline_variables)
#define CURAND_MRG32K3A_MATRICES_DEVICE_QUALIFIERS inline __device__
#else
#define CURAND_MRG32K3A_MATRICES_DEVICE_QUALIFIERS static __device__
#endif

Thanks!

I see the issue. With nvcc, these global arrays are decorated with “.weak”, but with nvc++ they are not, which makes them strong definitions in every object and thus causes the multiple-definition errors.

I’ve added a problem report, TPR #34526, and sent it to engineering for investigation.

-Mat


I’ve tried this again with NVHPC 24.1, and it still produces the same error. Is a fix planned for this issue?

-Ben

I just checked TPR #34526, which Mat filed. It looks like engineering has identified the issue and implemented a fix, but it came after the cutoff for integration into NVHPC 24.1. I expect the issue you're seeing to be resolved with the release of NVHPC 24.3. Thank you for your patience!
