I’m running a few different kernels through the Visual Profiler, looking specifically at memory bandwidth. For most of the kernels I’ve checked, there’s a massive difference between the numbers in the “Memory Bandwidth and Utilization” table and those in the “Memory Statistics” diagram. For example, my L2 read throughput shows 191 GB/s in the table but 340 MB/s in the diagram. Device memory write throughput is 25.5 GB/s in the table but 45.7 MB/s in the diagram.
Can anyone explain why I’m getting different numbers?
I’m running on a V100 DGX machine with CUDA 9.2 (it’s not my machine, so I can’t change the version).
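In case it helps narrow things down, here is a quick sanity check on the two pairs of numbers above. It is only a sketch: the values are copied from the post, the function name is made up for illustration, and I’m assuming decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes), which may not match what the profiler uses.

```python
# Values copied from the two profiler views described in this post.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes).
GB = 1e9
MB = 1e6

# metric name -> (value in the table, value in the diagram), in bytes/s
readings = {
    "L2 read": (191 * GB, 340 * MB),
    "Device memory write": (25.5 * GB, 45.7 * MB),
}


def table_to_diagram_ratio(table_bps, diagram_bps):
    """Return how many times larger the table value is than the diagram value."""
    return table_bps / diagram_bps


for name, (table_bps, diagram_bps) in readings.items():
    ratio = table_to_diagram_ratio(table_bps, diagram_bps)
    print(f"{name}: table / diagram = {ratio:.0f}x")
```

Both metrics come out at roughly 560x, so the two views seem to differ by a near-constant factor rather than randomly. I don’t know what that factor corresponds to.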