I am learning about CUDA optimizations. I found a presentation on this link: Optimizing CUDA by Paulius Micikevicius.
In this presentation, under MAXIMIZE GLOBAL MEMORY BANDWIDTH, they say that global memory coalescing will improve the bandwidth.
My question: how do you calculate the global memory bandwidth? Can anyone explain it to me with a simple program example?
You can’t calculate the peak global memory bandwidth from your program; it is a hardware figure that you can find on the spec sheet for your device (check the Nvidia website). In actual programs you will be able to achieve at most about 70% of this theoretical maximum.
You can also run the bandwidthTest from the SDK to measure bandwidth on your device.
It is 50 GB/s to 100 GB/s, depending on your graphics card.
In some programs I get 80% to 90% of that.
If the peak is 50 GB/s and you read 400 MB and write 400 MB,
your time must be 0.4/50*2 = 0.016 s = 16 ms at 100%,
or 16/0.7 = 23 ms at 70%.
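To tie this to an actual program, here is a minimal sketch that times a coalesced copy kernel for the same case (400 MB read plus 400 MB written) and reports the effective bandwidth as (bytes read + bytes written) / time. It also estimates the theoretical peak from the memory clock and bus width that the runtime reports. That estimate assumes double-data-rate memory (the factor 2), so treat it as approximate and prefer the spec sheet if the two disagree. The names check, copyKernel, theoreticalPeakGBs and effectiveGBs are made up for this example, and the program assumes device 0 has at least 800 MB free.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Coalesced copy: consecutive threads read and write consecutive floats.
__global__ void copyKernel(const float* in, float* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Theoretical peak in GB/s from the memory clock (kHz) and bus width (bits).
// ASSUMPTION: double data rate, hence the factor 2.
double theoreticalPeakGBs(int memClockKHz, int busWidthBits)
{
    return 2.0 * memClockKHz * 1e3 * (busWidthBits / 8.0) / 1e9;
}

// Effective bandwidth in GB/s: total bytes moved divided by elapsed time.
double effectiveGBs(double bytesRead, double bytesWritten, double ms)
{
    return (bytesRead + bytesWritten) / (ms * 1e-3) / 1e9;
}

int main()
{
    const size_t n = 100u * 1000u * 1000u;   // 100M floats = 400 MB
    const size_t bytes = n * sizeof(float);

    int memClockKHz = 0, busWidthBits = 0;
    check(cudaDeviceGetAttribute(&memClockKHz, cudaDevAttrMemoryClockRate, 0),
          "query memory clock");
    check(cudaDeviceGetAttribute(&busWidthBits, cudaDevAttrGlobalMemoryBusWidth, 0),
          "query bus width");
    const double peak = theoreticalPeakGBs(memClockKHz, busWidthBits);

    float *dIn = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dIn, bytes), "cudaMalloc in");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc out");
    check(cudaMemset(dIn, 0, bytes), "cudaMemset");

    // One thread per element. This grid is too large for very old devices
    // whose x-dimension limit is 65535 blocks; the launch check catches that.
    const int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);

    // Warm-up launch so one-time startup costs are not included in the timing.
    copyKernel<<<blocks, threads>>>(dIn, dOut, n);
    check(cudaGetLastError(), "warm-up launch");
    check(cudaDeviceSynchronize(), "warm-up sync");

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "create start event");
    check(cudaEventCreate(&stop), "create stop event");

    check(cudaEventRecord(start), "record start");
    copyKernel<<<blocks, threads>>>(dIn, dOut, n);
    check(cudaGetLastError(), "timed launch");
    check(cudaEventRecord(stop), "record stop");
    check(cudaEventSynchronize(stop), "wait for stop");

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "elapsed time");

    const double eff = effectiveGBs((double)bytes, (double)bytes, ms);
    std::printf("theoretical peak (estimate): %.1f GB/s\n", peak);
    std::printf("ideal time at peak:          %.2f ms\n", 2.0 * bytes / (peak * 1e9) * 1e3);
    std::printf("measured time:               %.2f ms\n", ms);
    std::printf("effective bandwidth:         %.1f GB/s (%.0f%% of peak)\n",
                eff, 100.0 * eff / peak);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Build it with nvcc (for example nvcc -O2 bw.cu -o bw, where the file name is arbitrary) and run it. A single timed launch is noisy, so averaging several launches between the two events gives a steadier number. Changing the index so that neighbouring threads access addresses far apart should show how much bandwidth is lost without coalescing.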