NVSHMEM installation: undefined reference to `__sync_synchronize'

Hello Community,

I am currently trying to build NVSHMEM from source and have run into an error I am struggling to debug.
The system I am using has A100s. This is how I set up the build, followed by the output I get:

tar -xvf nvshmem_src_2.11.0-5
cd nvshmem_src_2.11.0-5
export CUDA_HOME=/usr/local/software/cuda/11.4
export GDRCOPY_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/gdrcopy-2.2-e4igtfpykmoeel576nnlyrwwl6udeu4r
export NVSHMEM_USE_GDRCOPY=1
export NVSHMEM_MPI_SUPPORT=1
export NVSHMEM_SHMEM_SUPPORT=0
export MPI_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/openmpi-4.1.1-epagguvqfrwuokz2ftiyzbxibae5afnn
export NVSHMEM_UCX_SUPPORT=1
export UCX_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/ucx-1.11.1-lktqyl4gjbz36wqifl2e2wonn65xtrsr
export NVSHMEM_USE_NCCL=1
export NCCL_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/nvhpc-22.3-ywtqynx7blit6h47igf4xbb5uy6q4hn6/Linux_x86_64/22.3/comm_libs/nccl
module load pmix/3.2.1/gcc-9.4.0-xmxop6c 
export PMIX_HOME=/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/pmix-3.2.1-xmxop6ciq3xppxho4ek4ud7hkpgqjfgz/
export NVSHMEM_PMIX_SUPPORT=1
export NVSHMEM_PREFIX=/home/co-morg1/rds/hpc-work/nvshmem
sh: ./scripts/test_HAVE_IBV_ACCESS_RELAXED_ORDERING.sh: No such file or directory
/usr/local/software/cuda/11.4/bin/nvcc -t 4  -O3 -Xcompiler -fPIC -ccbin g++ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/openmpi-4.1.1-epagguvqfrwuokz2ftiyzbxibae5afnn/include -I/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/ucx-1.11.1-lktqyl4gjbz36wqifl2e2wonn65xtrsr/include -I/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/gdrcopy-2.2-e4igtfpykmoeel576nnlyrwwl6udeu4r/include -I/usr/local/software/spack/spack-rhel8-20210927/opt/spack/linux-centos8-zen2/gcc-9.4.0/nvhpc-22.3-ywtqynx7blit6h47igf4xbb5uy6q4hn6/Linux_x86_64/22.3/comm_libs/nccl/include -Isrc/include -I/rds/user/co-morg1/hpc-work/nvshmem_src_2.11.0-5/build/include src/bin/nvshmem-info.cpp -o /rds/user/co-morg1/hpc-work/nvshmem_src_2.11.0-5/build/bin/nvshmem-info -L/usr/local/software/cuda/11.4/lib64 -lcudart_static -L/usr/local/software/cuda/11.4/lib64/stubs -Xlinker --enable-new-dtags -Xlinker -rpath='$ORIGIN' -L/rds/user/co-morg1/hpc-work/nvshmem_src_2.11.0-5/build/lib -lnvshmem
/rds/user/co-morg1/hpc-work/nvshmem_src_2.11.0-5/build/lib/libnvshmem.a(env_vars.o): In function `nvtxInitOnce_v3':
/usr/local/software/cuda/11.4/include/nvtx3/nvtxDetail/nvtxInit.h:312: undefined reference to `__sync_synchronize'
/usr/local/software/cuda/11.4/include/nvtx3/nvtxDetail/nvtxInit.h:312: undefined reference to `__sync_val_compare_and_swap_4'
/usr/local/software/cuda/11.4/include/nvtx3/nvtxDetail/nvtxInit.h:317: undefined reference to `__sync_synchronize'
/usr/local/software/cuda/11.4/include/nvtx3/nvtxDetail/nvtxInit.h:330: undefined reference to `__sync_synchronize'
/usr/local/software/cuda/11.4/include/nvtx3/nvtxDetail/nvtxInit.h:330: undefined reference to `__sync_lock_test_and_set_4'

It is strange that the script test_HAVE_IBV_ACCESS_RELAXED_ORDERING.sh doesn’t exist. My first thought was that the link errors come from a CUDA version mismatch, but I have tried several versions (10.0, 11.4 and 12.1) with the same result. How should I go about debugging this? Any help is appreciated!

Hi Andrew,

Thanks for the question. I can comment that the test_HAVE_IBV_ACCESS_RELAXED_ORDERING.sh issue is not related to the __sync_synchronize failure.
It looks like you are using the deprecated Make build system. I would recommend compiling with the newer CMake build system instead. I am not confident this will solve your problem, but it’s worth a shot (the Make build has been deprecated and will be removed in our 3.0 release).
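
In case it helps, here is a minimal sketch of what an out-of-source CMake build could look like. It uses only generic CMake command-line options and assumes a reasonably recent CMake. It also assumes the NVSHMEM_* and *_HOME variables you exported are still set and are picked up at configure time, so please check that against the installation guide. The names SRC_DIR, BUILD_DIR, DRY_RUN, run and build_nvshmem are placeholders I made up for this example, and the parallel job count of 4 is arbitrary.

```shell
#!/bin/sh
# Sketch of an out-of-source CMake build (generic CMake CLI only).
# Assumption: the NVSHMEM_* and *_HOME variables exported earlier in this
# thread are still set and are read by the configure step.

SRC_DIR="${SRC_DIR:-$PWD}"                 # hypothetical: top of the extracted source tree
BUILD_DIR="${BUILD_DIR:-$SRC_DIR/build}"   # hypothetical: separate build directory
PREFIX="${NVSHMEM_PREFIX:-$HOME/nvshmem}"  # install location, taken from NVSHMEM_PREFIX if set

# run: print a command, then execute it unless DRY_RUN=1
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# build_nvshmem: configure, build and install, stopping at the first failure
build_nvshmem() {
    run cmake -S "$SRC_DIR" -B "$BUILD_DIR" &&
    run cmake --build "$BUILD_DIR" --parallel 4 &&
    run cmake --install "$BUILD_DIR" --prefix "$PREFIX"
}
```

After sourcing this from the top of the source tree, setting DRY_RUN=1 and calling build_nvshmem only prints the three cmake commands, which is a cheap way to check the paths before starting a real build.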

Alternatively, you might try the existing binaries. 2.11.0 has both Arm and x86 support.

There seems to be a known bug with __sync_synchronize on GCC for Arm, but it appears to have been fixed in GCC 4.4.3. Note that __sync_synchronize is a GCC atomic builtin provided by the host compiler, not something that comes from CUDA. Can you confirm that the g++ picked up by -ccbin on your compilation line is the GCC version that appears in your Spack paths (gcc-9.4.0)? If it’s older, you may try upgrading, although it looks like you may be tied to one version based on the Spack references.
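
One way to check this is the rough sketch below. It reports which compiler the name g++ resolves to and tries to build and run a tiny program that uses the same family of __sync builtins named in your link errors. The function name check_sync_builtins and the CXX override are my own placeholders, and this only tests the host compiler in isolation, so a pass here does not rule out a problem in how nvcc invokes it.

```shell
#!/bin/sh
# Sketch: show which host compiler "g++" resolves to and check that it can
# build and run a program using the legacy __sync builtins from the error.
# CXX defaults to g++ because the nvcc line in this thread uses "-ccbin g++".

CXX="${CXX:-g++}"

check_sync_builtins() {
    if ! command -v "$CXX" >/dev/null 2>&1; then
        echo "compiler not found: $CXX"
        return 2
    fi
    echo "compiler: $(command -v "$CXX")"
    echo "version:  $("$CXX" -dumpversion)"

    tmpdir=$(mktemp -d) || return 2
    # Tiny test program exercising the three builtins from the link errors.
    cat > "$tmpdir/sync_test.cpp" <<'EOF'
int main() {
    int v = 0;
    __sync_synchronize();                    // full memory barrier
    __sync_val_compare_and_swap(&v, 0, 1);   // v: 0 -> 1
    __sync_lock_test_and_set(&v, 2);         // v: 1 -> 2
    return v == 2 ? 0 : 1;
}
EOF
    if "$CXX" "$tmpdir/sync_test.cpp" -o "$tmpdir/sync_test" && "$tmpdir/sync_test"; then
        echo "__sync builtins: OK"
        status=0
    else
        echo "__sync builtins: FAILED"
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

Calling check_sync_builtins in the same shell you build from (with the same modules loaded) should show whether g++ is the Spack gcc-9.4.0 or an older system compiler.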

Hi Seth,

Thanks for the help! Good to know that the two issues are unrelated. I will try the CMake build instead; I don’t know why I glossed over that section of the documentation. As for the binaries, what support are they built with? For example, is “NVSHMEM_UCX_SUPPORT” set to 1 or 0? My current understanding is that these options cannot be changed once the library is built. Is that correct?

Again thanks for the help!