I am using Nsight Compute from CUDA 11.2 to analyze a kernel.
I get a warning about uncoalesced access. The discrepancy is tiny (131,072 sectors instead of the expected 131,070), but I wonder where it comes from.
All three lines flagged by the warning access global memory with a well-defined pattern,
blockIdx.x * blockDim.x + threadIdx.x, so this is strange.
Why could the sector count be off by just 2?
If I had to guess, it is the expected value that is off, because it is not a round number:
>>> hex(131070)
'0x1fffe'
>>> hex(131072)
'0x20000'
I launch 16384 blocks of 128 threads each, and I am on Ubuntu with an RTX 3070. The offending code reads __half values, by the way.
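
For reference, here is a rough back-of-the-envelope check of the sector count I would expect from one of the flagged loads. It is only a sketch and rests on assumptions the profiler output does not confirm: every thread reads exactly one 2-byte __half, the base pointer is sector-aligned, and a sector is 32 bytes. The constant names are mine.

```python
# Ideal sector count for one fully coalesced load instruction.
# Assumptions (mine, not confirmed by the profiler output): each thread reads
# exactly one __half, the base pointer is sector-aligned, sectors are 32 bytes.
BLOCKS = 16384
THREADS_PER_BLOCK = 128
BYTES_PER_HALF = 2      # sizeof(__half)
SECTOR_BYTES = 32       # assumed memory sector size
WARP_SIZE = 32

threads = BLOCKS * THREADS_PER_BLOCK
warps = threads // WARP_SIZE

# A warp reading 32 consecutive __half values touches 64 contiguous bytes.
sectors_per_warp = WARP_SIZE * BYTES_PER_HALF // SECTOR_BYTES

ideal_sectors = warps * sectors_per_warp
print(ideal_sectors, hex(ideal_sectors))  # 131072 0x20000
```

If those assumptions hold, the measured 131,072 is already the ideal count, which would suggest the expected value of 131,070 is the number that is off.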