illegal warp sync when calling __syncwarp(mask) on Turing sm_75 with "-G" flag

The reproducing code is too complicated to show, and I can't extract a minimal example. Basically, there is some reduction code that gives this runtime error only when compiled with the "-G" flag and run on an RTX 2080. The error is not reproducible on a GTX 1080 or when the "-G" flag is removed. The error message is:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalInstruction): an illegal instruction was encountered /home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:119
Traceback functionality not available

Aborted (core dumped)

I used cuda-gdb to trace the error back to:

(cuda-gdb) where
#0  0x00000000051128c0 in __cuda_sm70_warpsync ()
#1  0x00000000050c9180 in _INTERNAL_45_tmpxft_0000739c_00000000_6_testDomain_cpp1_ii_e6dfdd82::__syncwarp (mask=65280)
    at /usr/local/cuda-10/Linux/RHEL6/x86_64/include/sm_30_intrinsics.hpp:110
#2  0x0000000002f7d810 in Kokkos::Impl::CudaReductionsFunctor<Kokkos::Max<int, Kokkos::HostSpace>, void, false, false>::scalar_intra_warp_reduction (
    functor=(cached) 0x2aaab1fffb10, value=(cached) 0x2aaaaf000034, skip_vector=(cached) false, width=(cached) 8)
    at /home/aznb/mycodes/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp:694
...

A complete reproducible example would require the Kokkos library – see https://github.com/kokkos/kokkos/issues/1958 for details.

Can anyone give some insight into what could be going wrong here? Is this a CUDA bug or a misuse of __syncwarp?