Warp or thread level stats for memory metrics

I would like to know if the memory stats obtained by nvprof are at warp level or thread level. For example, the description of dram_read_transactions is “Device memory read transactions”. Also, for gld_transactions, I see “Number of global memory load transactions”.

The memory metrics are reported for the kernel as a whole; they are neither per-warp nor per-thread values.

dram_read_transactions counts 32-byte sectors read from device memory.
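Since each transaction is one 32-byte sector, the counter converts directly into bytes read. Here is a minimal sketch of that arithmetic; the function name and the sample count are made up for illustration and are not part of nvprof:

```python
SECTOR_BYTES = 32  # sector size stated in the answer above


def dram_read_bytes(dram_read_transactions: int) -> int:
    """Convert a dram_read_transactions count into bytes read from device memory."""
    if dram_read_transactions < 0:
        raise ValueError("transaction count cannot be negative")
    return dram_read_transactions * SECTOR_BYTES


if __name__ == "__main__":
    # Hypothetical counter value, not taken from a real profile.
    sample_count = 1_000_000
    print(dram_read_bytes(sample_count), "bytes read from device memory")
```

Dividing the result by the kernel duration would give an average DRAM read bandwidth, assuming you take the duration from the same profiling run.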

gld_transactions varies with architecture:

  • On Maxwell and Pascal architectures (CC 5.x and 6.x), the counter counts request packets sent from the SM to L1TEX. Per warp-level load instruction:
    • a load of 32 bits or less is 4 transactions of 8 threads each
    • a 64-bit load is 8 transactions of 4 threads each
    • a 128-bit load is 16 transactions of 2 threads each
    • predicated-off or inactive threads do not generate transactions
  • On Volta and Turing architectures (CC 7.x), the counter counts 32-byte sectors from L1TEX (hits and misses).
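To make the Maxwell/Pascal rules concrete, here is a rough model of how many transactions one warp-level load might generate. It rests on two assumptions that the answer above does not state: threads are grouped by consecutive lane index, and a group produces one transaction if at least one of its threads is active. The function names are hypothetical, and real hardware may behave differently for partially active warps.

```python
WARP_SIZE = 32


def threads_per_transaction(access_bits: int) -> int:
    """Threads served by one SM-to-L1TEX request packet on CC 5.x/6.x."""
    if access_bits not in (8, 16, 32, 64, 128):
        raise ValueError("unsupported access width: %d bits" % access_bits)
    if access_bits <= 32:
        return 8  # loads of 32 bits or less: 4 transactions of 8 threads
    if access_bits == 64:
        return 4  # 64-bit loads: 8 transactions of 4 threads
    return 2      # 128-bit loads: 16 transactions of 2 threads


def gld_transactions_per_warp(access_bits: int,
                              active_mask: int = 0xFFFFFFFF) -> int:
    """Estimate transactions for one warp-level load instruction.

    ASSUMPTION: lanes are grouped consecutively, and a group with no active
    (or only predicated-off) threads generates no transaction.
    """
    group = threads_per_transaction(access_bits)
    count = 0
    for first_lane in range(0, WARP_SIZE, group):
        group_mask = ((1 << group) - 1) << first_lane
        if active_mask & group_mask:
            count += 1
    return count


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, "bit load, full warp:", gld_transactions_per_warp(bits))
    # Only lanes 0-7 active (hypothetical divergence pattern).
    print("32 bit load, lanes 0-7:", gld_transactions_per_warp(32, 0x000000FF))
```

With a fully active warp the model reproduces the 4, 8, and 16 transaction counts listed above. On Volta and Turing the counter is sector-based instead, so this model does not apply there.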
