THEORETICAL BANDWIDTH vs EFFECTIVE BANDWIDTH

Hey guys,

I have a problem with my GPU bandwidth calculations.
I follow and use the code from here: https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/

And I get some strange results.
I have an NVIDIA GeForce GT 750M with a bus width of 128 bits, and the theoretical bandwidth, both as I calculated it and as reported by nvccv, is 64 GB.
But when I run the code from the link, the result is about 30 GB.
Has anyone met this before?
Why is the difference so big?
Thank you :)

Perhaps your calculation is incorrect. I’m not familiar with nvccv, what is that?

I’m not sure the code presented in that link is intended to be a good measure of “best achievable bandwidth”. What does bandwidthTest sample code report on your GT 750M for the device to device bandwidth number?

Memory throughput would generally be measured in GB/sec, not GB. In general, the best achievable bandwidth of GPU memory is typically in the range of 75% to 85% of the theoretical value, dependent on a variety of factors (e.g. ECC on/off, the type of DRAM used, GPU core frequency).

Yes, I understand, but going from 64 theoretical to 30 effective seems a little too much to me.
I have tried bandwidthTest and the results were almost the same.
I have also tried some online benchmarks and the results were the same.
So could the problem be ECC?
How can I turn it on? :)

As far as I know, the GT 750M is produced in multiple OEM versions that differ in memory type (DDR3, GDDR5) and memory clock rates. Wikipedia lists theoretical bandwidth of 32 GB/sec to 80 GB/sec for the GT 750M (https://en.wikipedia.org/wiki/GeForce_700_series). That’s a pretty wide range.

NVIDIA’s specification page (http://www.geforce.com/hardware/notebook-gpus/geforce-gt-750m/specifications) states:

So the question is, what are the OEM’s specifications for your specific GT 750M?

I am pretty sure no consumer cards have support for ECC. Where supported, turning on ECC will reduce usable memory bandwidth, not increase it.

If bandwidthTest is reporting ~30 GB/s, then that is what your GPU is capable of. It is not necessarily the case that your theoretical number is 64 GB/s; in that case, I would suspect that your calculations are wrong.

Yes, I made a mistake.
Here I have seen “frame buffer bandwidth”, but I think that is not the same thing.

The computation is not necessarily wrong, it all depends on how the memory clock is reported. Assuming the 2005 MHz shown are the effective clock rate, that is, already taking into account that data is transferred on both rising and falling clock edges, theoretical bandwidth would be 32.08 GB/sec (= (128 / 8) * 2005e6/sec).

Whatever program you are using to display the frame buffer bandwidth (this is the number you are interested in, by the way) seems to assume that 2005 MHz is the raw clock rate, thus arriving at twice the bandwidth.

Comparison with other GPUs shows that 2005 MHz must be the effective clock rate here. If you look at the table on Wikipedia, the GTX Titan Black, for example, is listed with a memory clock of 7000 MHz, and the theoretical bandwidth is (384 / 8) * 7000e6/sec = 336 GB/sec, which is definitely the correct value.

Sorry, but I am confused.
The data in the picture above are from Visual Studio. I have added CUDA to VS, and you can see some specifications from VS.

So the bandwidthTest sample says 30 GB, and Wikipedia for the 750M says 30 GB.

And i have:
Memory clock: 2005 MHz
Bus width: 128 bits

BW Th = 2005 * 10^6 * (128/8) * 2 /10^9 = 64.16 GB/s

In this calculation, we convert the memory clock rate to Hz, multiply it by the interface width (divided by 8, to convert bits to bytes) and multiply by 2 due to the double data rate. Finally, we divide by 10^9 to convert the result to GB/s.

The multiplication by 2 above is because the example on the NVIDIA website uses a DDR memory (double data rate).
My memory is GDDR5, and it is said to do 5× data transfer. (http://www.tomshardware.co.uk/forum/331796-33-ddr3-gddr3-gddr5)

Various sites tell me that GDDR5 actually uses quad-pumped interfaces, while DDR3 uses double-pumped interfaces. To avoid having to deal with actual frequencies and type-specific multipliers, people often operate with “effective” clock rates, which is the physical clock rate multiplied by the memory-type specific multiplier.

This seems to be the case here: the 2005 MHz stated represents an “effective” clock rate, therefore no additional factor of 2 should be multiplied into the equation. Effective clock rates for GDDR5 can go all the way up to 7000 MHz.

If you want another take on this, open the NVIDIA Control Panel, go to “Help” then “System Information” and look for “Memory bandwidth” in the output. In my case it says I have GDDR5 on a 128-bit interface running at a memory clock of 5010 MHz, for bandwidth of 80.16 GB/s.

Hey, NVIDIA also says 64 GB.

They say memory data rate: 4010 MHz…

And also for bandwidthTest, I only have options for pinned and pageable…

[s]Weird. Not sure what to make of this. Sadly, the output displayed does not show the memory clock, which would be crucial to try to figure out what may be going on. Given the type of memory (GDDR5), the interface width, and the memory bandwidth, I would expect that to be shown as 4000 MHz.

I have a Quadro K2200 in my system, which is basically a GTX 750 Ti with a larger memory. The output from the control panel says:

Driver version:              376.84
Direct3D API version:        11
Direct3D feature level:      11_0
CUDA cores:                  640
Graphics clock:              1045 MHz
Memory data rate:            5010 MHz
Memory interface:            128-bit
Memory bandwidth:            80.16 GB/s
Total available graphics ... 7913 MB
Dedicated video memory:      4096 MB GDDR5

Note the use of the term “memory data rate”, which is a more accurate way of saying “effective memory clock” (which isn’t actually a clock rate in the physical sense).

I’d say we have taken this remote diagnosis as far as it can go. If you believe that your laptop’s characteristics are not consistent with what the vendor stated in their specifications, you might want to consider taking up the issue with them.

For reference (may be useful for future readers): What is the brand and type of laptop this GT 750 is in?[/s]

You’re getting 47GB/s for D2D on bandwidthTest. You previously said this:

“I have try bandwidthTest and the results was almost the same.”

I would not agree with that statement, and in fact would say it is misleading, given that bandwidthTest is reporting 47 GB/s, not a number approximately equal to the 30 GB/s you originally reported in this statement:

“But when i run the code from the link the result is about 30GB.”

So I go back to my original statement:

“I’m not sure the code presented in that link is intended to be a good measure of “best achievable bandwidth”.”

Your GPU peak theoretical bw is 64GB/s, and the achievable bandwidth (as reported by bandwidthTest) is 47GB/s.

These are expected results, and the fact that you don’t get 47 GB/s from the code in the first link you gave does not mean that there is something wrong with your system or that your calculations are incorrect; it means that that code will not give a reliable estimate of the available bandwidth.

The text in that link states:

“We can modify our SAXPY example to calculate the effective bandwidth.”

This is not the peak theoretical bandwidth, nor is it the maximum achievable bandwidth. It is the bandwidth achieved by that particular SAXPY example. The fact that it is at 30 GB/s vs. the 47 GB/s achievable means that that particular code (or perhaps the way you compiled it) will not achieve the maximum available bandwidth.

I wish you had included the output of the bandwidth test application in #11 before I posted my reply in #12; this information seems to have been added only later.

As txbob says, a measured throughput of 47.41 GB/sec out of a theoretical 64.16 GB/sec (so 73.9% of theoretical) seems perfectly fine, and is roughly consistent with the efficiency range I stated in #3.

Case closed.