Hi, everyone. In the NVIDIA Turing architecture whitepaper, the theoretical memory bandwidth is calculated as Memory Interface * Memory Clock rate / 8 (GB/s). But in an older document, reduction.ppt in the CUDA samples, it is Memory Interface * Memory Clock rate * 2 / 8 (GB/s). Which one should be used?
It’s going to depend on how the memory clock rate is indicated/specified. Specifically, it’s going to depend on how the memory clock rate translates to transfers per second. There is not one single answer.
Do you mean that the given memory clock rate is a theoretical peak value and the actual memory clock rate varies during data transfer? Since Memory Interface * Memory Clock rate is doubled in reduction.ppt, do NVIDIA GPUs now support simultaneous bidirectional data transfer?
But according to the NVIDIA Turing architecture whitepaper, Memory Bandwidth (GB/sec) is Memory Interface * Memory Clock (Data Rate) / 8 bits. Given that the Memory Interface is 352 bits and the Memory Clock (Data Rate) is 11 Gbps for the GTX 1080Ti, can a Memory Bandwidth of 484 GB/sec be treated as its theoretical memory bandwidth?
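As a quick check of the arithmetic in the question, here is a minimal Python sketch of the data-rate form of the formula. The function name `bandwidth_from_data_rate` is made up for illustration, and it assumes the data rate is given per pin in Gbit/s, as in the whitepaper figure quoted above.

```python
def bandwidth_from_data_rate(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s from a per-pin data rate in Gbit/s."""
    # Every pin of the interface moves data_rate_gbps gigabits per second;
    # dividing by 8 converts bits to bytes.
    return bus_width_bits * data_rate_gbps / 8


# GTX 1080Ti figures from the question: 352-bit interface, 11 Gbps data rate.
gtx_1080ti_gbs = bandwidth_from_data_rate(352, 11)
print(gtx_1080ti_gbs)  # 484.0
```

Note that no factor of 2 appears here, because a "data rate" already counts transfers rather than clock cycles.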
What do you mean by the source for the memory clock rate?
And if I responded to that, suggesting that is correct, you would immediately point out the example from the reduction paper, which you already pointed out.
I’m not going to go back and forth on this. Both are correct. There isn’t just one formula that applies to every case, for every situation. If you want to come up with one formula, I personally would do exactly what I already suggested. Create a formula that is based on transfers per second. Then for each situation you come across, see what relationship memory clock rate must have to transfers per second, in order to make that single equation correct.
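One possible reading of the suggestion above, sketched in Python: start from the published bandwidth, work out the transfers per second it implies, and then see what multiplier a given clock figure needs. The function names are hypothetical, and the 5.5 GHz value is only an assumed example of a clock figure a tool or spec sheet might list, not a number taken from this thread.

```python
def implied_gigatransfers_per_s(published_gb_per_s: float, bus_width_bits: int) -> float:
    """Transfers per second (in GT/s) implied by a published peak bandwidth."""
    bytes_per_transfer = bus_width_bits / 8  # one transfer moves the full bus width
    return published_gb_per_s / bytes_per_transfer


def implied_transfers_per_clock(published_gb_per_s: float,
                                bus_width_bits: int,
                                clock_ghz: float) -> float:
    """Multiplier a quoted clock needs so that the single formula
    bandwidth = (bus_width_bits / 8) * clock * multiplier matches the published number."""
    return implied_gigatransfers_per_s(published_gb_per_s, bus_width_bits) / clock_ghz


# Published GTX 1080Ti figure discussed in the thread: 484 GB/s on a 352-bit bus.
print(implied_gigatransfers_per_s(484, 352))        # 11.0
# If the quoted figure is already a data rate (11 Gbps), no extra factor is needed.
print(implied_transfers_per_clock(484, 352, 11.0))  # 1.0
# If the quoted figure were a 5.5 GHz clock (assumed example), a *2 factor would be needed.
print(implied_transfers_per_clock(484, 352, 5.5))   # 2.0
```

Seen this way, the two formulas from the original question are the same equation with different multipliers.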
The peak theoretical memory bandwidth of GTX 1080Ti is published:
You can decide for yourself what formula and memory clock rate you wish to use, to acknowledge the correctness of that published number.
I probably won’t be able to respond to further requests for clarification on this topic. The topic of what is the GPU memory bandwidth formula has been discussed in many places. If you want to read some of those, you may reach some conclusions that are useful for you.
The *2 factor could be due to half-duplex versus full-duplex bandwidth. Can you confirm whether the whitepapers mention anything about the bidirectional transfer rate?
Normally DRAM is not full-duplex. The difference more likely comes from DDR (double data rate) memory, which transfers data on both the rising and falling edges of the clock signal.
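To illustrate the DDR explanation, here is a small Python sketch. The G80 figures used (384-bit interface, 900 MHz DDR memory clock) are what I recall the reduction slides using; they are not stated in this thread, so treat them as an assumption. The function name is made up for the example.

```python
def clocked_bandwidth_gbs(bus_width_bits: int,
                          clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Peak theoretical bandwidth in GB/s from a physical clock in MHz.

    transfers_per_clock=2 models DDR: one transfer on the rising edge
    and one on the falling edge of each clock cycle.
    """
    mega_transfers_per_s = clock_mhz * transfers_per_clock
    # bits per second (in units of 1e6) -> bytes (/8) -> giga (/1000)
    return bus_width_bits * mega_transfers_per_s / 8 / 1000


# Assumed G80 figures: 384-bit bus, 900 MHz DDR clock.
g80_gbs = clocked_bandwidth_gbs(384, 900)
print(round(g80_gbs, 1))  # 86.4
```

With `transfers_per_clock=1` the same call gives half that figure, which is exactly the *2 discrepancy raised in the original question.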
One open question is whether (with newer GPUs) the memory clock is an actual physical memory clock or just the transfer rate, see for example
It is a type of GDDR SDRAM (graphics DDR SDRAM), and is the successor to GDDR5. Just like GDDR5X it uses QDR (quad data rate) in reference to the write command clock (WCK) and ODR (Octal Data Rate) in reference to the command clock (CK).
So several different clock speeds exist at the same time. For easier product comparison or tuning (overclocking, etc.), NVIDIA probably quotes only a single frequency.
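The quoted relationship can be turned around to show why a single "memory clock" figure is ambiguous. This is a rough Python sketch that assumes only the ratios from the quote (QDR means 4 transfers per WCK cycle, ODR means 8 transfers per CK cycle). The 14 Gbps data rate is just an example input, not a value from this thread, and the function name is hypothetical.

```python
def clocks_for_data_rate(data_rate_gbps: float) -> dict:
    """Clock frequencies (GHz) consistent with one per-pin data rate (Gbit/s),
    using the QDR-vs-WCK and ODR-vs-CK ratios from the quote above."""
    return {
        "data_rate_gbps": data_rate_gbps,
        "wck_ghz": data_rate_gbps / 4,  # quad data rate relative to WCK
        "ck_ghz": data_rate_gbps / 8,   # octal data rate relative to CK
    }


example = clocks_for_data_rate(14.0)  # 14 Gbps is an assumed example value
for name, value in example.items():
    print(f"{name}: {value}")
```

All three numbers describe the same memory configuration, so the multiplier in the bandwidth formula would be 1, 4, or 8 depending on which of them is quoted as the "memory clock".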