GPU memory: how to find the GPU memory bandwidth

I need to calculate the memory bandwidth of my graphics card. I’m using DirectX 9.0c (June 2006) and Visual Studio 2005 (programming in VC++).

I know of one way to calculate it, using the GPU memory clock speed and the GPU memory bus width, but the problem is that I don’t know how to extract these parameters.

Can anyone please tell me how to get those two parameters, or suggest another method to calculate the memory bandwidth of the graphics card?
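The clock-and-bus-width calculation mentioned above is simple arithmetic: bytes per transfer times transfers per second. Below is a minimal sketch in Python. The function name and the sample figures (800 MHz, 320-bit, two transfers per clock) are illustrative assumptions, not values read from any particular card.

```python
def theoretical_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s, with 1 GB taken as 1e9 bytes.

    transfers_per_clock=2 assumes double-data-rate memory (DDR/GDDR3).
    That is an assumption: check the memory type of the actual card.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Illustrative figures only: 800 MHz memory clock, 320-bit bus, DDR.
print(theoretical_bandwidth_gb_s(800, 320))  # 64.0
```

This gives only the theoretical peak; it says nothing about what a real program achieves.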

Why don’t you just look it up (example) and make a small table in the app?


I’m doing that for now as a temporary fix, but the problem is that it only gives the theoretical value of the memory bandwidth, which is not what I want.
I need to find the practical value. Any ideas?


The “practical value” lies in the eye of the beholder, i.e. the actual program. It can vary tremendously depending on how well your code hides latency, etc. The only sensible test I can think of for this case is to actually do a test run of the program and time it. If you store the timings for known cards alongside the list of theoretical values, you can extrapolate the performance of other cards.


Can you elaborate on that? I didn’t quite get you.



Use GPUBench…

It will give you the performance for cached, streaming and random memory bandwidth, as well as a whole bunch of other interesting info.

But you should note that GPUBench does not use CUDA, so YMMV.


The idea is the following:

a) The application uses a very specific mix of ALU and memory operations, so generic isolated tests for bandwidth etc. (like those in GPUBench) won’t come up with a sensible answer, and might also depend on CPU performance.

b) On the other hand, the manufacturer publishes theoretical bandwidth and clock speeds.

I suggest making a list of the theoretical values. Then run the application on, say, 2 or 3 cards from the list and note the timings. If, for example, your application takes 1 second on the 500MHz, 10GB/sec card, you can then estimate how fast it will be on an 800MHz, 15GB/sec card. You need the initial timings on 2 or 3 cards to see how your application scales, i.e. what the factor in front of the linear term of the performance curve is.
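The extrapolation described above can be sketched as a straight-line fit. This is only one possible model and an assumption, not something the thread specifies: it treats runtime as a fixed part plus a part inversely proportional to memory bandwidth, and it ignores the core clock. The function names and the three timings are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_runtime(measured, target_bandwidth_gb_s):
    """Estimate runtime on a card from timings taken on other cards.

    measured: list of (theoretical_bandwidth_gb_s, runtime_seconds) pairs.
    Assumed model: runtime = fixed_cost + k / bandwidth.
    """
    xs = [1.0 / bandwidth for bandwidth, _ in measured]
    ys = [seconds for _, seconds in measured]
    slope, intercept = fit_line(xs, ys)
    return intercept + slope / target_bandwidth_gb_s


# Hypothetical timings on three cards: (GB/s from the spec table, seconds).
measured = [(10.0, 1.0), (20.0, 0.6), (40.0, 0.4)]
print(round(predict_runtime(measured, 15.0), 3))
```

With only two cards the line passes through both points exactly; a third card shows whether the assumed model fits at all.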


One EVGA 8800GTS on a TYAN S2895A2NRF board with dual Opteron 246s:

Anyone have any idea why the up and down rates are a factor of 3 different?


[edited to add]

ps – the difference goes away when the --memory=pinned option is used.

What performance do you get with pinned? I noticed your Dev-to-Dev speed was a little slow. I’m getting approximately 8.8GB/s instead of 3.3GB/s. Is your card in a PCIe x8 slot instead of an x16?
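The x8-versus-x16 question can be checked against the nominal link rates. A rough sketch, assuming a first-generation PCIe link (2.5 GT/s per lane with 8b/10b encoding, i.e. 250 MB/s per lane in each direction). The thread does not state the PCIe generation of the board, so that part is an assumption, and the function name is made up for illustration.

```python
def pcie_gen1_bandwidth_mb_s(lanes):
    """Nominal one-direction bandwidth of a PCIe 1.x link in MB/s (1 MB = 1e6 bytes).

    Assumes PCIe generation 1: 2.5 GT/s per lane, 8b/10b encoding
    (8 payload bits carried in every 10 bits on the wire).
    """
    raw_bits_per_second = 2.5e9
    payload_bits_per_second = raw_bits_per_second * 8 / 10
    return lanes * payload_bits_per_second / 8 / 1e6


for lanes in (8, 16):
    print(f"x{lanes}: {pcie_gen1_bandwidth_mb_s(lanes):.0f} MB/s")
```

Real transfers land below these figures because of protocol overhead, so a sustained host-to-device rate above roughly 2000 MB/s would rule out a generation-1 x8 link.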

With 0.9,

./bandwidthTest --memory=pinned

Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2699.9

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2693.0

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               53768.0

Much better than 0.8, and about what we’ve seen here from others.

(this is on Ubuntu Feisty 32-bit, on a dedicated 8800GTS NOT running X or any other graphics)
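To relate a measured device-to-device figure like the one above to a theoretical peak, divide one by the other. The sketch below assumes a spec-sheet peak of 64 GB/s for the card. It also shows both readings of "MB" (10^6 and 2^20 bytes), because the output does not say which one the tool uses. Both are assumptions to verify, and the function name is made up for illustration.

```python
def efficiency(measured_mb_s, peak_gb_s, bytes_per_mb):
    """Fraction of the theoretical peak actually reached.

    peak_gb_s is taken as 1e9 bytes per GB (spec-sheet convention).
    bytes_per_mb states how the measuring tool defines a megabyte.
    """
    measured_bytes_per_second = measured_mb_s * bytes_per_mb
    return measured_bytes_per_second / (peak_gb_s * 1e9)


measured = 53768.0  # device-to-device MB/s from the output above
peak = 64.0         # assumed spec-sheet peak in GB/s; check the datasheet

for label, bytes_per_mb in (("MB = 10^6 bytes", 10**6), ("MB = 2^20 bytes", 2**20)):
    print(f"{label}: {efficiency(measured, peak, bytes_per_mb):.1%}")
```

Under these assumptions the copy reaches somewhere between about 84% and 88% of the peak, which illustrates the gap between the theoretical and the practical value discussed earlier in the thread.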