For what I am fairly sure are perfectly coalesced accesses to an array of 4096 doubles (8 bytes each), nvprof reports the following metrics on an NVIDIA Tesla V100:
global_load_requests: 128
gld_transactions: 1024
gld_transactions_per_request: 8.000000
I cannot find a precise definition of what a global memory request and a global memory transaction are, so I am having trouble interpreting these metrics. My questions are:
- How is a memory request defined exactly? Is it something like a warp-level load instruction for 32 threads at once?
- How is a memory transaction defined? Is it something like a fixed-size load of 32 bytes?
- Does gld_transactions_per_request = 8.000000 indicate perfectly coalesced accesses to doubles?
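
To make my guess concrete, here is a minimal sketch of the arithmetic I would expect if a request really is one warp-level load instruction (32 threads) and a transaction really is a 32-byte chunk. Both of those are my assumptions, not definitions I found in the documentation, and the function name `expected_load_metrics` is just something I made up for illustration. It also assumes each thread loads exactly one element, the array is aligned, and the element count is a multiple of the warp size.

```python
# Assumptions (unconfirmed, see the questions above):
#   - a "request" is one warp-level load instruction covering 32 threads
#   - a "transaction" is one 32-byte chunk of global memory
WARP_SIZE = 32
TRANSACTION_BYTES = 32  # assumed transaction size


def expected_load_metrics(num_elements, element_bytes):
    """Estimate (requests, transactions, transactions per request) for a
    fully coalesced, aligned read where each thread loads one element.

    Assumes num_elements is a multiple of WARP_SIZE.
    """
    # One request per warp of 32 consecutive elements.
    requests = num_elements // WARP_SIZE
    # A warp reads 32 contiguous elements, split into 32-byte chunks.
    bytes_per_request = WARP_SIZE * element_bytes
    transactions_per_request = bytes_per_request // TRANSACTION_BYTES
    transactions = requests * transactions_per_request
    return requests, transactions, transactions / requests


requests, transactions, per_request = expected_load_metrics(4096, 8)
print(requests, transactions, per_request)  # prints: 128 1024 8.0
```

Under these assumptions the numbers match what nvprof reports (128 requests, 1024 transactions, 8 per request), and the same reasoning would predict 4 transactions per request for 4-byte floats. I would still like to know whether this interpretation is actually correct.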