Why is the value in the Last column of L1/TEX Total 3982, while the three rows above it are all zeros? Doesn’t Total usually represent the sum of the values in the rows above it?
The total can contain values that are not separately exposed and tracked in the table. It should never be smaller than the sum of the preceding values, but it can be larger if some data seen in the HW is not available in the tool.
Please clarify which version of Nsight Compute you are using and on which GPU you profile.
I am currently using Nsight Compute version 2024.3.0 on an A10, and I am prefetching data into the L2 cache using the “prefetch” PTX instruction via inline assembly. Do the memory requests/sectors caused by “prefetch” appear in the L2 Cache Memory Table under the L1/TEX Total column? Additionally, in the above table, does the ‘Last’ column entry indicate a hit because the content loaded by my “prefetch” instruction is already in the cache, or is it due to an LDG (Load Global) instruction hitting in the L2 cache?
Yes, the total does include sectors for prefetches, but unfortunately there is no individual metric that shows them separately.
The L2 ...last_lookup_hit “total” metric is susceptible to other, unrelated activity, and it would be collected in a separate pass from the individual ones. In fact, each of the totals for hit, miss and sum would be collected in a separate replay pass, so it is hard to determine from the total alone whether that was user activity or not. You could collect the same metric, but with srcnode_gpc added, i.e. lts__t_sectors_srcnode_gpc_evict_last_lookup_hit.sum, to confirm whether it was your kernel that caused it.
If it is not already in the report (check the Raw page), you can collect individual metrics using --metrics ... on the command line, or using the respective form in the Metric Selection tool/activity window.
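As a rough sketch of the command-line route, the snippet below collects only the srcnode_gpc metric mentioned above. The application path ./my_app is a hypothetical placeholder, not something from this thread, and the script only prints the command if ncu or the binary cannot be found.

```shell
# Metric named in the reply above: the evict_last lookup-hit sectors with the srcnode_gpc qualifier.
METRIC="lts__t_sectors_srcnode_gpc_evict_last_lookup_hit.sum"

# Hypothetical application binary (assumption); replace with your own executable.
APP="./my_app"

# Full command line, kept in a variable so it can be printed when ncu is unavailable.
NCU_CMD="ncu --metrics $METRIC $APP"

if command -v ncu >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Profile the application and collect only the requested metric.
    ncu --metrics "$METRIC" "$APP"
else
    # Nsight Compute or the binary is missing: just show what would be run.
    echo "Would run: $NCU_CMD"
fi
```

If the value reported for this metric is close to the L1/TEX Total entry in the ‘Last’ column, that would suggest the hits came from your own kernel rather than from unrelated activity.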
You can find more info on some of these terms in the docs here.