Hello, I profiled a linear op with input (32768x2048) and weight (2048x32768), and the NCU report shows:
dram__bytes_read.sum = 2.43 GB
dram__bytes_write.sum = 2.14 GB
I don’t understand how these values are obtained. Could you please share the formula?
The metrics are computed as the number of sectors transferred to/from this memory unit, multiplied by the size of a sector in bytes (a constant 32). They are not derived from any algorithmic property of your kernel, but measured in HW. This means that any data transfers you may consider “overhead” are also included, such as write-backs from cache to DRAM, re-reads from DRAM, extra sectors transferred due to uncoalesced accesses, and so on.
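To make the formula concrete, here is a minimal Python sketch of the bytes-from-sectors relation, together with a rough algorithmic lower bound for comparison. The 2-byte element size (fp16/bf16) is an assumption, since the data type was not stated in the question, and treating the reported "GB" as 10^9 bytes is also an assumption. The helper names are hypothetical and not part of any NCU API.

```python
SECTOR_BYTES = 32  # sector size in bytes, as stated above


def dram_bytes(sectors: int) -> int:
    """Bytes reported by the metric for a given measured sector count."""
    return sectors * SECTOR_BYTES


def sectors_from_bytes(num_bytes: float) -> int:
    """Invert the formula: approximate sector count behind a byte value."""
    return round(num_bytes / SECTOR_BYTES)


# Shapes from the question: input (M x K), weight (K x N), output (M x N).
M, K, N = 32768, 2048, 32768
ELEM_BYTES = 2  # ASSUMPTION: fp16/bf16 elements; use 4 for fp32

# Lower bound if every element crossed the DRAM interface exactly once.
min_read_bytes = (M * K + K * N) * ELEM_BYTES
min_write_bytes = M * N * ELEM_BYTES

# Reported values, ASSUMING "GB" means 1e9 bytes.
reported_read = 2.43e9
reported_write = 2.14e9

print("sectors read    ~", sectors_from_bytes(reported_read))
print("sectors written ~", sectors_from_bytes(reported_write))
print("min read bytes  =", min_read_bytes)
print("min write bytes =", min_write_bytes)
```

Any gap between the measured values and such a lower bound is traffic the hardware actually performed (for example re-reads after cache evictions), which is why the metric cannot be reproduced from the tensor shapes alone.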
I recommend checking the Memory Workload Analysis section in the UI, including the Memory Workload Analysis Chart and Tables, to understand these metrics in more detail. You can also refer to the documentation on Sectors/Req(uest), for example.
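As a rough illustration of why sectors per request matters, the sketch below counts how many distinct 32-byte sectors a single warp-wide load would touch under two access patterns. It is a simplified model, not how NCU computes its metrics: it assumes 32 threads per warp and 4-byte accesses, and it ignores caching entirely. The function name is hypothetical.

```python
SECTOR_BYTES = 32  # sector size in bytes
WARP_SIZE = 32     # ASSUMPTION: 32 threads issue one request together


def sectors_touched(addresses, access_bytes=4):
    """Count distinct 32-byte sectors covered by one warp-wide request."""
    sectors = set()
    for addr in addresses:
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        sectors.update(range(first, last + 1))
    return len(sectors)


# Coalesced: consecutive threads read consecutive 4-byte elements.
coalesced = [t * 4 for t in range(WARP_SIZE)]

# Strided: each thread reads a 4-byte element 128 bytes from its neighbor.
strided = [t * 128 for t in range(WARP_SIZE)]

print("coalesced sectors/request:", sectors_touched(coalesced))
print("strided sectors/request:  ", sectors_touched(strided))
```

In this model both requests use the same 128 bytes of data, but the strided pattern touches eight times as many sectors, and each of those is counted at 32 bytes wherever it is actually transferred.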