THEORETICAL BANDWIDTH vs EFFECTIVE BANDWIDTH

Hey guys,

I have a problem with my GPU bandwidth calculations.
I follow and use the code from here: https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/

And I get some strange results.
I have an NVIDIA GeForce GT 750M with a bus width of 128 bits, and the theoretical bandwidth, both as I calculated it and as reported by nvccv, is 64 GB.
But when I run the code from the link, the result is about 30 GB.
Has anyone met this before?
Why is the difference so big?
Thank you :)

Perhaps your calculation is incorrect. I’m not familiar with nvccv, what is that?

I’m not sure the code presented in that link is intended to be a good measure of “best achievable bandwidth”. What does bandwidthTest sample code report on your GT 750M for the device to device bandwidth number?

Memory throughput would generally be measured in GB/sec, not GB. In general, the best achievable bandwidth of GPU memory is typically in the range of 75% to 85% of the theoretical value, dependent on a variety of factors (e.g. ECC on/off, the type of DRAM used, GPU core frequency).

Yes, I understand, but going from 64 theoretical to 30 effective seems a little too much to me.
I have tried bandwidthTest and the results were almost the same.
I have also tried some online benchmarks and the results were the same.
So could the problem be ECC?
How can I turn it on? :)

As far as I know, the GT 750M is produced in multiple OEM versions that differ in memory type (DDR3, GDDR5) and memory clock rates. Wikipedia lists theoretical bandwidth of 32 GB/sec to 80 GB/sec for the GT 750M (https://en.wikipedia.org/wiki/GeForce_700_series). That’s a pretty wide range.

NVIDIA’s specification page (http://www.geforce.com/hardware/notebook-gpus/geforce-gt-750m/specifications) states:

So the question is, what are the OEM’s specifications for your specific GT 750M?

I am pretty sure no consumer cards have support for ECC. Where supported, turning on ECC will reduce usable memory bandwidth, not increase it.

If bandwidthTest is reporting ~30 GB/s, then that is what your GPU is capable of. It is not necessarily the case that your theoretical number is 64 GB/s; in that case, I would suspect that your calculations are wrong.

Yes, I made a mistake.
Here I have seen “frame buffer bandwidth”, but I think that is not the same thing.

The computation is not necessarily wrong, it all depends on how the memory clock is reported. Assuming the 2005 MHz shown are the effective clock rate, that is, already taking into account that data is transferred on both rising and falling clock edges, theoretical bandwidth would be 32.08 GB/sec (= (128 / 8) * 2005e6/sec).

Whatever program you are using to display the frame buffer bandwidth (this is the number you are interested in, by the way) seems to assume that 2005 MHz is the raw clock rate, thus arriving at twice the bandwidth.

Comparison with other GPUs shows that 2005 MHz must be the effective clock rate here. If you look at the table on Wikipedia, the GTX Titan Black, for example, is listed with a memory clock of 7000 MHz, and the theoretical bandwidth is (384 / 8) * 7000e6/sec = 336 GB/sec, which is definitely the correct value.

Sorry, but I am confused.
The data in the picture above are from Visual Studio. I have added CUDA to VS, and you can see some specifications from VS.

So the bandwidthTest sample says 30 GB, and Wikipedia for the 750M says 30 GB.

And i have:
Memory clock: 2005 MHz
Bus width: 128 bits

BW Th = 2005 * 10^6 * (128/8) * 2 /10^9 = 64.16 GB/s

In this calculation, we convert the memory clock rate to Hz, multiply it by the interface width (divided by 8, to convert bits to bytes) and multiply by 2 due to the double data rate. Finally, we divide by 10^9 to convert the result to GB/s.

The multiplication by 2 above is because the example on the NVIDIA website uses a DDR memory (double data rate).
My memory is GDDR5, and it is said to do 5× data transfer. (http://www.tomshardware.co.uk/forum/331796-33-ddr3-gddr3-gddr5)

Various sites tell me that GDDR5 actually uses quad-pumped interfaces, while DDR3 uses double-pumped interfaces. To avoid having to deal with actual frequencies and type-specific multipliers, people often operate with “effective” clock rates, which is the physical clock rate multiplied by the memory-type specific multiplier.

This seems to be the case here: the 2005 MHz stated represents an “effective” clock rate, therefore no additional factor of 2 should be multiplied into the equation. Effective clock rates for GDDR5 can go all the way up to 7000 MHz.

If you want another take on this, open the NVIDIA Control Panel, go to “Help” then “System Information” and look for “Memory bandwidth” in the output. In my case it says I have GDDR5 on a 128-bit interface running at a memory clock of 5010 MHz, for bandwidth of 80.16 GB/s.

Hey, NVIDIA also says 64 GB.

They say memory data rate: 4010 MHz…

And also for bandwidthTest, I only have options for pinned and pageable…

[s]Weird. Not sure what to make of this. Sadly, the output displayed does not show the memory clock, which would be crucial to try to figure out what may be going on. Given the type of memory (GDDR5), the interface width, and the memory bandwidth, I would expect that to be shown as 4000 MHz.

I have a Quadro K2200 in my system, which is basically a GTX 750 Ti with a larger memory. The output from the control panel says:

Driver version:              376.84
Direct3D API version:        11
Direct3D feature level:      11_0
CUDA cores:                  640
Graphics clock:              1045 MHz
Memory data rate:            5010 MHz
Memory interface:            128-bit
Memory bandwidth:            80.16 GB/s
Total available graphics ... 7913 MB
Dedicated video memory:      4096 MB GDDR5

Note the use of the term “memory data rate”, which is a more accurate way of saying “effective memory clock” (which isn’t actually a clock rate in the physical sense).

I’d say we have taken this remote diagnosis as far as it can go. If you believe that your laptop’s characteristics are not consistent with what the vendor stated in their specifications, you might want to consider taking up the issue with them.

For reference (may be useful for future readers): What is the brand and type of laptop this GT 750 is in?[/s]

You’re getting 47GB/s for D2D on bandwidthTest. You previously said this:

“I have try bandwidthTest and the results was almost the same.”

I would not agree with that statement, and in fact would say it is misleading, given that bandwidthTest is reporting 47 GB/s, not a number approximately equal to the 30 GB/s you originally reported in this statement:

“But when i run the code from the link the result is about 30GB.”

So I go back to my original statement:

“I’m not sure the code presented in that link is intended to be a good measure of “best achievable bandwidth”.”

Your GPU peak theoretical bw is 64GB/s, and the achievable bandwidth (as reported by bandwidthTest) is 47GB/s.

These are expected results, and the fact that you don’t get 47 GB/s from the code in the first link you gave does not mean that there is something wrong with your system or that your calculations are incorrect; it means that that code will not give a reliable estimate of the available bandwidth.

The text in that link states:

“We can modify our SAXPY example to calculate the effective bandwidth.”

This is not the peak theoretical bandwidth, nor is it the maximum achievable bandwidth. It is the bandwidth achieved by that particular SAXPY example. The fact that it is at 30 GB/s vs. the 47 GB/s achievable means that that particular code (or perhaps the way you compiled it) will not achieve the maximum available bandwidth.

I wish you had included the output of the bandwidth test application in #11 before I posted my reply in #12; this information seems to have been added only later.

As txbob says, a measured throughput of 47.41 GB/sec out of a theoretical 64.16 GB/sec (so 73.9% of theoretical) seems perfectly fine, and is roughly consistent with the efficiency range I stated in #3.

Case closed.