Is there a metric for measuring how many times an atomic operation was blocked? I created a kernel that purposely generates heavy contention on atomicAdd to global memory. Can nv-nsight-cu count the number of atomic blocks/hits when executing this kernel?
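To make the kind of contention concrete, here is a minimal self-contained sketch. It is not the exact kernel from this post: the names `contended_add` and `run_contended` are made up for illustration, and it assumes every thread adds to the same global address (`C[0]`), so all atomics target one location.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (assumption: all threads hit one address).
// Every in-range thread adds 1.0f to C[0], so the atomics on that
// address cannot proceed independently of each other.
__global__ void contended_add(float *C, size_t n)
{
    size_t index = (size_t)blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= n) return;
    atomicAdd(&C[0], 1.0f);
}

// Hypothetical host helper: launches the kernel over n threads and
// returns the final value of C[0], or -1.0f if a CUDA call fails.
float run_contended(size_t n)
{
    float *d_C = nullptr;
    float h_C = 0.0f;

    if (cudaMalloc(&d_C, sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_C, 0, sizeof(float));  // all-zero bytes == 0.0f

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    if (blocks > 0) {
        contended_add<<<blocks, threads>>>(d_C, n);
    }

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&h_C, d_C, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_C);
    return (err == cudaSuccess) ? h_C : -1.0f;
}
```

Calling `run_contended(n)` from a `main` and profiling that binary should give a kernel where essentially every atomic collides with the others, which is the situation I would like to see reflected in a counter.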
__global__ void run_atomic(float *C, size_t s)
int index = (blockDim.x * blockIdx.x + threadIdx.x);
if (index>=s-1) return;