My kernel writes 8 bytes to global memory per thread, no matter what. However, the profiler reports a different number than I expected. The kernel has one transaction per request, yet the reported global store size is 4 to 10 times larger than expected, and the factor varies between runs of the kernel. (The kernel does use random numbers and has divergence, but it should always write exactly 8 bytes per thread.)
I want to understand what the profiler data really means. Even if it means something different from what I expected, it should at least report the same number across runs, since the kernel always requests the same amount of data to store.
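The expectation in the question can be made concrete. Because every thread stores exactly 8 bytes, the expected store volume depends only on the launch configuration, not on which branch each thread takes or on the random numbers. The following is a minimal Python sketch under stated assumptions: the grid of 1024 blocks of 256 threads is hypothetical (the real launch configuration is not given), and the helper names are illustrative only.

```python
# Hypothetical helpers: the question states that each thread writes exactly
# 8 bytes, so the expected store volume depends only on the launch configuration.
BYTES_PER_THREAD = 8  # from the question: one 8-byte store per thread


def expected_store_bytes(num_blocks: int, threads_per_block: int) -> int:
    """Bytes the kernel is expected to write to global memory."""
    return num_blocks * threads_per_block * BYTES_PER_THREAD


def overcount_factor(reported_bytes: int, num_blocks: int, threads_per_block: int) -> float:
    """How many times larger the profiler's figure is than the expected volume."""
    return reported_bytes / expected_store_bytes(num_blocks, threads_per_block)


if __name__ == "__main__":
    # Hypothetical launch configuration; replace with the kernel's real one.
    blocks, threads = 1024, 256
    print(expected_store_bytes(blocks, threads))
```

With a fixed launch configuration, `expected_store_bytes` returns the same value on every run, which is why a profiler figure that varies from run to run (between 4 and 10 times this value) is surprising.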