I am trying to compare reduce_by_key from the thrust, CUB, and ModernGPU (MGPU) libraries in a single program, along the lines of a previous post that covered thrust and CUB only. The intention is not to benchmark, but just to make sure that I can use them correctly.
The thrust::reduce_by_key and cub::DeviceReduce::ReduceByKey calls play nicely together, and so do the thrust::reduce_by_key and ModernGPU ReduceByKey calls. However, when I call CUB after calling MGPU, the CUB call stops working. cuda-memcheck reports an error in my MGPU code, but it is hard to track down: I only have one MGPU function call, the error is non-fatal, and the MGPU call still goes on to produce the correct result!
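For context, by "correct result" I mean agreement with a plain host-side reference. The sketch below is a minimal version of such a check, not the code from my actual program. The name reduce_by_key_reference is hypothetical, and it assumes that all three libraries collapse each run of consecutive equal keys into one output element, which is the behaviour thrust::reduce_by_key documents.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference (not part of thrust, CUB, or MGPU).
// Collapses each run of consecutive equal keys into one (key, sum) pair.
// Keys that reappear after a different key start a new segment.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_key_reference(const std::vector<K>& keys, const std::vector<V>& values)
{
    assert(keys.size() == values.size());
    std::vector<K> out_keys;
    std::vector<V> out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            // First element of a new run: open a new output segment.
            out_keys.push_back(keys[i]);
            out_vals.push_back(values[i]);
        } else {
            // Same key as the previous element: accumulate into the open segment.
            out_vals.back() += values[i];
        }
    }
    return {out_keys, out_vals};
}
```

Copying each library's output keys and values back to the host and comparing them against this reference gives one common yardstick for all three calls.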
This post is just an abstract. Seeking a bit more exposure, I have posted the full problem over here:
Any help would be appreciated, thanks!