I am going through the Memory transactions doc by NVIDIA found here: Memory Transactions
There are two metrics I am interested in:
L1 Above-Ideal Transactions and L1 Transfer Overhead
The metrics I would need according to the formulas are:
L1 Global Transactions Executed → l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum
L1 Local Transactions Executed → l1tex__t_sectors_pipe_lsu_mem_local_op_ld.sum
L1 Global Transactions Ideal → ?
L1 Local Transactions Ideal → ?
Bytes Requested → 32*l1tex__t_requests_pipe_lsu_mem_global_op_ld.sum
I managed to map a few of them but not the rest. Is my mapping correct, and what are the equivalent ncu metrics for the unmapped ones?
That is a fairly outdated document, and on newer architectures its metrics don't have a 1:1 mapping in Nsight Compute. For L1, there is a single request per executed instruction, independent of the memory access pattern. If you're trying to understand inefficient or excessive memory accesses, the metrics we would point you to are L2 Theoretical Sectors Global Ideal and L2 Theoretical Sectors Global Excessive. There is also L2 Theoretical Sectors Global, where ( L2 Theoretical Sectors Global - L2 Theoretical Sectors Global Ideal ) = L2 Theoretical Sectors Global Excessive. These metrics are available at the source line level as well as aggregated to the kernel level.
Even though the metric names contain "L2", the excessive count is a good indicator of access inefficiency at both the L1 and L2 caches.
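To make the idea of excessive sectors concrete, here is a minimal Python sketch of the arithmetic for a single warp-wide load. It is only an illustrative model, not how Nsight Compute computes its metrics: it assumes 32-byte sectors, 32 threads per warp, an aligned base address, a fixed element stride of at least 1, and the function names are hypothetical.

```python
SECTOR_BYTES = 32  # assumed sector size
WARP_SIZE = 32     # assumed threads per warp


def sectors_touched(stride_elems, elem_bytes=4, base=0):
    """Count distinct sectors hit when lane i loads element i * stride_elems."""
    sectors = set()
    for lane in range(WARP_SIZE):
        start = base + lane * stride_elems * elem_bytes
        end = start + elem_bytes - 1
        # An element may straddle a sector boundary, so record every sector it covers.
        for sector in range(start // SECTOR_BYTES, end // SECTOR_BYTES + 1):
            sectors.add(sector)
    return len(sectors)


def ideal_sectors(elem_bytes=4):
    """Fewest sectors that can hold WARP_SIZE distinct elements (ceiling division)."""
    return -(-WARP_SIZE * elem_bytes // SECTOR_BYTES)


def excessive_sectors(stride_elems, elem_bytes=4):
    """Sectors touched minus ideal sectors; the model assumes stride_elems >= 1."""
    if stride_elems < 1:
        raise ValueError("this simplified model assumes stride_elems >= 1")
    return sectors_touched(stride_elems, elem_bytes) - ideal_sectors(elem_bytes)


if __name__ == "__main__":
    for stride in (1, 2, 8):
        print(stride, sectors_touched(stride), ideal_sectors(), excessive_sectors(stride))
```

In this model a unit-stride load of 4-byte elements has no excessive sectors, while larger strides touch more sectors than the data strictly needs, which is the kind of pattern a nonzero excessive count points to.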
Let me know if you have any other questions.