Compiler crash when using cuda::std::plus in CUB

Hello. I am using cub::DeviceReduce::ReduceByKey from CUB, which expects a binary operator that performs the reduction. In my case the reduction is a summation; see below.

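cub::Sum sum_op;          // works when passed as the reduction operator
cuda::std::plus plus_op;  // passing this one crashes the compiler
cub::DeviceReduce::ReduceByKey(
    nullptr,       // d_temp_storage (null: only query the required size)
    buffer_bytes,  // temp_storage_bytes
    // keys
    (uint32_t*)nullptr, (uint32_t*)nullptr,
    // values
    (float*)nullptr, (float*)nullptr,
    // remaining params
    (uint32_t*)nullptr,
    plus_op, N);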

If I use cub::Sum, it runs without problems, but if I use cuda::std::plus, I get the following compiler crash:

CUDACOMPILE : nvcc error : 'cudafe++' died with status 0xC0000005 (ACCESS_VIOLATION)

I am not advocating for or against supporting cuda::std::plus in this function. I would just like a clearer, more interpretable error message when I use this "wrong" operator.

This compiler crash initially led me to believe that the problem was an MSVC/CUDA version mismatch or a conflict between multiple installed CUDA Toolkit versions, which made it really difficult to find the real cause. The bug is likely Windows-specific: I tried on Ubuntu 24.04 and saw no issues.
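For anyone hitting the same crash, one possible stand-in for cuda::std::plus is a small hand-written functor. The sketch below is only an illustration: PlusOp, HOST_DEVICE, sum_on_host and self_check are hypothetical names that are not part of CUB or libcu++, and whether this avoids the cudafe++ crash on the affected Windows setups is an assumption that has not been verified here.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Expand to CUDA execution-space qualifiers only when compiled by nvcc,
// so the same functor also builds as plain host C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical stand-in for cuda::std::plus (name chosen for illustration).
template <typename T>
struct PlusOp {
    HOST_DEVICE constexpr T operator()(const T& a, const T& b) const {
        return a + b;
    }
};

// Host-side helper: folds the values with PlusOp, starting from zero.
inline float sum_on_host(const std::vector<float>& values) {
    return std::accumulate(values.begin(), values.end(), 0.0f, PlusOp<float>{});
}

// Host-side sanity check that the functor behaves like ordinary addition.
inline void self_check() {
    assert(sum_on_host({1.0f, 2.0f}) == 3.0f);
}
```

In the call above, PlusOp<float>{} would be passed in place of plus_op as the reduction operator.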

This issue was found with the following setups:

CUDA Toolkit 12.8 or 13.0 with a compatible driver (572 for 12.8, 580 for 13.0)

MSVC - Visual Studio 2022 version 17.8 or 17.14

Windows 10.0.19045

Laptop RTX 3060

If you have further questions, please feel free to ask.

Thank you kindly for your support.