Is there a way to measure reduction "collisions" on a P100 using nvprof?

I have a kernel that sometimes loses performance, and I suspect this is caused by reductions from different warps targeting the same address and colliding. Is there a way to measure such “collisions” on a P100 GPU using nvprof? On older GPUs I think the replay metrics would have worked well for this, but they are not available on the P100.