The bandwidth test result in the CUDA samples is odd

Why is the GPU memory bandwidth so high? CUDA: 12.1
bandwidthTest

If you run the app repeatedly, does it always show this implausibly high value?

I have not looked at the code for this app in a long time, and I do not know the RTX A6000 specs off the top of my head. To my knowledge, there is no GPU in existence that can transfer data in global memory at 4 TB/sec. Three hypotheses that spring to mind, in order of decreasing plausibility:

(1) The RTX A6000 has a large enough cache that, at this particular transfer size, the app is actually measuring cache throughput rather than global memory throughput.

(2) The RTX A6000, with the given transfer size, transfers the data so fast that the resolution of the timer used for timing the transfers is exceeded.

(3) The RTX A6000, with the given transfer size, transfers the data so fast that an integer arithmetic overflow occurs in an intermediate computation when computing the bandwidth.
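
To make hypotheses (2) and (3) concrete, here is a minimal C++ sketch of how each effect could distort a reported figure. It is not taken from the bandwidthTest source, which I have not re-read; all helper names, the 32,000,000-byte copy size, the 200-copy count, and the 0.25 ms timer tick are assumptions chosen purely for illustration.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (NOT from the bandwidthTest source):
// bandwidth in GB/s for `bytes` moved in `elapsed_ms` milliseconds.
double bandwidth_gb_per_s(double bytes, double elapsed_ms) {
    return (bytes / 1.0e9) / (elapsed_ms / 1.0e3);
}

// Hypothesis (2): model a coarse timer by rounding the true elapsed
// time down to a whole number of timer ticks. A transfer shorter than
// one tick would read as 0 ms and make the computed bandwidth infinite.
double coarse_timer_ms(double true_ms, double tick_ms) {
    return std::floor(true_ms / tick_ms) * tick_ms;
}

// Hypothesis (3): total bytes accumulated in a 32-bit unsigned integer.
// Unsigned arithmetic wraps modulo 2^32, so the product silently loses
// its high bits once it exceeds 4,294,967,295.
std::uint32_t total_bytes_32(std::uint32_t bytes_per_copy, std::uint32_t copies) {
    return bytes_per_copy * copies;
}

// Same computation widened to 64 bits before multiplying, which avoids
// the wraparound for these operand sizes.
std::uint64_t total_bytes_64(std::uint32_t bytes_per_copy, std::uint32_t copies) {
    return static_cast<std::uint64_t>(bytes_per_copy) * copies;
}
```

With these assumed numbers, a copy that really takes 0.45 ms is read as 0.25 ms by the coarse timer, so 32,000,000 bytes appear to move at about 128 GB/s instead of about 71 GB/s. Likewise, `total_bytes_32(32000000, 200)` wraps to 2,105,032,704 instead of 6,400,000,000. In this particular sketch the wrapped value is too small; whether a real overflow inflates or deflates the result depends on which intermediate quantity wraps.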

If this still happens with CUDA 12.1 Update 1, NVIDIA should probably review this app and make sure it delivers plausible data for the latest high-end GPUs. You may want to file a bug to that effect.
