ModernGPU ReduceByKey error with weird CUB interaction

I am trying to compare reduce_by_key from the thrust, CUB, and ModernGPU (MGPU) libraries in a single program, along the lines of a previous post that covered thrust and CUB only. The intention is not to benchmark, just to make sure that I can use all three correctly.
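
One way to check that all three calls agree is to compare each result against a simple host-side reference. The following is only a minimal sketch, assuming a sum reduction over runs of consecutive equal keys (the behaviour thrust::reduce_by_key documents). The name reduce_by_key_reference is a hypothetical helper for illustration, not part of thrust, CUB, or MGPU.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from thrust, CUB, or MGPU): sequential
// reduce-by-key on the host. Each run of consecutive equal keys is
// collapsed to one output key and the sum of that run's values.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_key_reference(const std::vector<K>& keys,
                        const std::vector<V>& values)
{
    // Keys and values must be paired one-to-one.
    assert(keys.size() == values.size());

    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            // A new run starts here: emit the key and seed its sum.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        } else {
            // Same key as the previous element: accumulate into the run.
            out_values.back() += values[i];
        }
    }
    return std::make_pair(std::move(out_keys), std::move(out_values));
}
```

Copying each library's output back to the host and comparing it with this reference makes it easier to tell which call, if any, produced a wrong result. Note that equal keys that are not adjacent form separate runs, so unsorted input yields repeated keys in the output.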

The thrust::reduce_by_key and cub::DeviceReduce::ReduceByKey calls play nicely together, and so do the thrust::reduce_by_key and ModernGPU ReduceByKey calls. But when I call CUB after calling MGPU, the CUB call stops working. cuda-memcheck reports an error in my MGPU code, but it is hard to track down: I only have one MGPU function call, the error is non-fatal, and the MGPU call still goes on to produce the correct result!

This post is just an abstract. Seeking a bit more exposure, I posted the full problem over here:

Any help would be appreciated, thanks!

Never mind, this question was answered on the Stack Overflow site.