Simple question: Is the official bandwidth 1-way or 2-way?

I mean the 86.4 GB/s figure for the 8800 GTX and the 141 GB/s figure for the GTX 280. Do I have to both read and write global memory in the same kernel to reach the peak bandwidth?
Thanks!

The DRAM bus is half-duplex, so full bandwidth can be achieved with traffic in only one direction.

Correct. The numbers above are theoretical peaks for one direction. If your kernel both reads and writes (say, a memcpy-style kernel), then when measuring the bandwidth you're achieving, count each byte twice: once for the read and once for the write. Then compare that figure against the theoretical peak.

In practice, you should be able to get to about 80% of the theoretical peak.

Paulius

93%, if you try really hard.