I am profiling an application on a GeForce GTX 1080. The application uses Unified Memory.
I was experimenting with the memory hints, and when profiling one of the kernels I get a large count (126203) for the event l2_subp0_write_sysmem_sector_queries. (I assume this is due to the “AccessedBy” hint.) The description of this event says: “Number of system memory write requests to slice 0 of L2 cache. This increments by 1 for each 32-byte access.”
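To put that count in perspective, here is a rough sketch that converts it into bytes. It assumes each count corresponds to exactly one 32-byte sector, as the event description states. The helper name `sectors_to_bytes` is my own and not part of any profiler API, and the figure covers slice 0 only.

```python
# Sector size taken from the event description:
# "This increments by 1 for each 32-byte access."
SECTOR_BYTES = 32


def sectors_to_bytes(sector_count: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a sector-query event count into a byte count (hypothetical helper)."""
    return sector_count * sector_bytes


# Value reported for l2_subp0_write_sysmem_sector_queries (slice 0 only).
count = 126203
total = sectors_to_bytes(count)
print(f"{count} sectors -> {total} bytes ({total / 2**20:.2f} MiB)")
# prints: 126203 sectors -> 4038496 bytes (3.85 MiB)
```

So, if the description is taken at face value, this one kernel accounts for roughly 3.85 MiB of system memory writes through slice 0 alone.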
I find this description slightly confusing. Since the L2 cache is on a lower level than system memory, I would expect the write request to be sent “from” the L2 cache “to” system memory, not the other way around.
Similarly, in the description of “l2_subp1_total_read_sector_queries”, system memory is listed as if it were on the same level as the L1 and texture caches. The definition says: “Total read requests to slice 1 of L2 cache. This includes requests from L1, Texture cache, system memory.”
Can someone please clarify how this works? Or is there some issue in the way I am interpreting this definition?