The code that reproduces this is too complicated to show, and I can't extract a minimal example. Basically, there is some reduction code that gives this runtime error only when compiled with the "-G" flag and run on an RTX 2080. The error is not reproducible on a GTX 1080 or when the "-G" flag is removed. The error message is:
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaDeviceSynchronize() error( cudaErrorIllegalInstruction): an illegal instruction was encountered /home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:119
Traceback functionality not available
Aborted (core dumped)
I used cuda-gdb to trace the error back to:
(cuda-gdb) where
#0 0x00000000051128c0 in __cuda_sm70_warpsync ()
#1 0x00000000050c9180 in _INTERNAL_45_tmpxft_0000739c_00000000_6_testDomain_cpp1_ii_e6dfdd82::__syncwarp (mask=65280)
at /usr/local/cuda-10/Linux/RHEL6/x86_64/include/sm_30_intrinsics.hpp:110
#2 0x0000000002f7d810 in Kokkos::Impl::CudaReductionsFunctor<Kokkos::Max<int, Kokkos::HostSpace>, void, false, false>::scalar_intra_warp_reduction (
functor=(cached) 0x2aaab1fffb10, value=(cached) 0x2aaaaf000034, skip_vector=(cached) false, width=(cached) 8)
at /home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp:694
...
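For what it's worth, the mask in frame #1 decodes cleanly: mask=65280 is 0xFF00, i.e. lanes 8 to 15 of the warp, which is consistent with width=8 in frame #2. As far as I understand the CUDA programming guide, each thread that calls __syncwarp(mask) must have its own bit set in the mask, so one thing worth checking is whether the calling lane really is inside the mask it passes. Below is a small host-side C++ sketch of that check. The helper names group_mask and lane_in_mask are my own (hypothetical, not part of Kokkos or CUDA), and I am assuming the mask is meant to cover the aligned group of width lanes containing the caller; I have not confirmed that this is how Kokkos builds it.

```cpp
#include <cassert>
#include <cstdint>

// Warp size on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Hypothetical helper: bit mask covering the aligned group of `width`
// lanes that contains `lane`. `width` must be a power of two <= 32.
inline std::uint32_t group_mask(unsigned lane, unsigned width) {
    assert(lane < kWarpSize);
    assert(width >= 1 && width <= kWarpSize && (width & (width - 1)) == 0);
    // Avoid shifting a 32-bit value by 32, which is undefined behaviour.
    const std::uint32_t ones =
        (width == kWarpSize) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    const unsigned first_lane = (lane / width) * width;
    return ones << first_lane;
}

// Hypothetical helper: true if `lane` has its bit set in `mask`.
inline bool lane_in_mask(unsigned lane, std::uint32_t mask) {
    assert(lane < kWarpSize);
    return ((mask >> lane) & 1u) != 0;
}
```

With these helpers, group_mask(8, 8) gives 65280, the value in the backtrace, and lane_in_mask(7, 65280) is false, so a thread in lane 7 passing that mask would be calling __syncwarp with a mask that excludes itself.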
A complete reproducible example would require the Kokkos library – see https://github.com/kokkos/kokkos/issues/1958 for details.
Can anyone offer some insight into what might be going wrong here? Is this a CUDA bug or a misuse of __syncwarp?