Hello, I am trying to profile an NLP model on a GPU (RTX 3090).
Through profiling, I was able to identify the CPU-GPU streams and confirm the kernel operations on the GPU.
I set the sampling frequency to 10 kHz and collected the GPU metrics of the RTX 3090, but there are too many of them. The metrics I am curious about are as follows.
- L1 LSU Data-Stage Throughput
- L1 LSU Writeback-Stage Throughput
L1 LSU Throughputs
- L1 Local-Global Data-Stage Throughput
- L1 Shared-Attribute Data-Stage Throughput
I think these metrics are related to the L1 cache’s data transfer, but I don’t know exactly what they mean.
For Nsight Compute, the metrics are well described in the documentation, but I could not find descriptions of the Nsight Systems metrics.
I’ll attach a picture.
@Andrey_Trachenko are these counters available in Nsys or only in Ncu?
I know these metrics are used in Nsight Graphics, so I tried to find the Nsight Graphics documentation for them, but I couldn’t find any.
ga10x-gfxt is the shorthand name for “Graphics Throughput Metrics for NVIDIA GA10x (frequency >= 10kHz)”
Indeed, we don’t have public documentation for some of the metrics that are included. Similar metrics should also be available in Nsight Graphics (GPU Trace). Since these are graphics-related metrics, I don’t think they are available in Nsight Compute.
Meanwhile, please see if the GPU Trace documentation would help you understand the metrics better: Advanced Learning :: Nsight Graphics Documentation
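For readers who want to reproduce the collection discussed in this thread, here is a minimal sketch of how the `nsys profile` command line might be assembled for the `ga10x-gfxt` metric set at 10 kHz. The script name `run_model.py`, the output name, and the helper `build_nsys_command` are hypothetical. The flag spellings are assumptions based on Nsight Systems CLI releases that use `--gpu-metrics-device` (some newer releases spell it `--gpu-metrics-devices`), so please check `nsys profile --help` for your version.

```python
import shlex


def build_nsys_command(target_cmd, metric_set="ga10x-gfxt",
                       frequency_hz=10_000, device="0",
                       output="nlp_profile"):
    """Assemble an `nsys profile` argument list that samples GPU metrics.

    Flag names are assumptions; they can differ between nsys versions.
    """
    return [
        "nsys", "profile",
        f"--gpu-metrics-device={device}",           # GPU index to sample
        f"--gpu-metrics-set={metric_set}",          # metric set alias
        f"--gpu-metrics-frequency={frequency_hz}",  # sampling rate in Hz
        "--trace=cuda,nvtx",                        # also trace CUDA and NVTX
        f"--output={output}",                       # report file name
        *target_cmd,
    ]


# Hypothetical target: a Python script that runs the model.
cmd = build_nsys_command(["python", "run_model.py"])
print(shlex.join(cmd))
```

The sketch only prints the command. To actually launch it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where `nsys` is installed.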
Thank you for taking care of it.
I profiled a PyTorch model with the NCU tool, using a system trace with the Graphics Throughput Metrics for NVIDIA GA10x.
I know I can check the cache hits and misses of each kernel in the NCU tool, but I wanted to see what the metrics look like overall.
I’ll look at the documentation and organize the metrics.
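As a side note on the "per kernel versus overall" point, one way to get an overall number from per-kernel counts is to sum the hits and misses before dividing, rather than averaging the per-kernel rates. The sketch below assumes you have already exported per-kernel hit and miss counts from the profiler yourself. The function name and the numbers are made-up placeholders, not measurements.

```python
def overall_hit_rate(per_kernel):
    """Return the aggregate hit rate from (hits, misses) pairs, one per kernel."""
    pairs = list(per_kernel)  # allow any iterable, since we scan it twice
    hits = sum(h for h, _ in pairs)
    misses = sum(m for _, m in pairs)
    total = hits + misses
    return hits / total if total else 0.0


# Placeholder counts for two kernels: (hits, misses).
example_counts = [(900, 100), (50, 50)]
print(overall_hit_rate(example_counts))
```

Summing first weights each kernel by how many accesses it made, so a small kernel with a poor hit rate does not dominate the result.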