Does anyone know exactly what gld_128b means in the CUDA profiler?
Does it count the reads in one block, in one kernel, or in "all simultaneous blocks of one multiprocessor"?
The description just says “128-byte global memory load transactions.”
For all my kernels, the counts match the “all simultaneous blocks in a multiprocessor” case,
except for one kernel where each block uses all of the shared memory.
I am working on a GTX280, with profiler version 1.4.