What does "Memory Throughput" in Nsight Compute mean?

I ran a program using an RTX 4090 (with a device bandwidth of 1008 GB/s) and obtained the following image.

In the image, the total bandwidth between device memory and the L2 cache is 327.33 GB/s + 2.17 GB/s = 329.5 GB/s, and the Memory Throughput metric is 67.56%. That would imply a peak bandwidth of 329.5 GB/s / 67.56% ≈ 487.7 GB/s.
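The back-of-envelope arithmetic above can be written out explicitly (the numbers are the ones quoted from the profile; the inference itself is what the reply corrects):

```python
# Values quoted from the Nsight Compute report in the question.
achieved_gbps = 327.33 + 2.17        # L2 <-> device memory reads + writes
memory_throughput_pct = 67.56        # reported "Memory Throughput" metric

# If Memory Throughput were achieved DRAM bandwidth / peak, this would
# recover the peak bandwidth of the device:
implied_peak_gbps = achieved_gbps / (memory_throughput_pct / 100)
print(f"{implied_peak_gbps:.1f} GB/s")   # ~487.7, far below the 1008 GB/s spec
```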

There is a significant discrepancy between 487.7 GB/s and the 1008 GB/s spec. Have I misunderstood the meaning of Memory Throughput? If so, how should I determine the bandwidth utilization of a program?

These metrics are not related as in your calculation. The chart shows the achieved throughput between L2 and device memory; the color indicates that it is ~35% for DRAM reads in your case. The overall Memory Throughput metric is computed as the maximum over a wide range of constituents, covering the various memory subsystems of the hardware. If you select the GPU Throughput Breakdown section body in the GPU Speed Of Light Throughput section on the Details page, you can see the individual sub-metrics of this throughput and which of them is at ~67% utilization. Note that not all of these sub-metrics are visually represented on the memory chart, as that is not technically feasible.
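A minimal sketch of the distinction described above: Memory Throughput is a max over sub-metric utilizations, while DRAM bandwidth utilization is achieved traffic over peak. The constituent names and the 35%/50% values below are illustrative placeholders, not actual Nsight Compute metric names:

```python
# Hypothetical per-subsystem utilizations (percent of peak). Only the
# 67.56 figure and the GB/s numbers come from the question; the rest
# are made-up placeholders for illustration.
constituents_pct = {
    "dram_read": 35.0,       # roughly what the chart color indicates
    "l2_throughput": 67.56,  # the constituent that dominates in this run
    "l1tex_throughput": 50.0,
}

# The SOL "Memory Throughput" metric takes the max across constituents.
memory_throughput_pct = max(constituents_pct.values())
print(memory_throughput_pct)          # 67.56

# Actual DRAM bandwidth utilization is computed separately:
achieved_gbps = 327.33 + 2.17         # measured DRAM traffic
peak_gbps = 1008.0                    # RTX 4090 spec from the question
print(achieved_gbps / peak_gbps)      # ~0.327, i.e. ~33% of peak DRAM bandwidth
```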
