I have an 8800 GTX and I would like to know the maximum bandwidth I can achieve.
Specifically, I’d like to measure this bandwidth to see whether I can speed up my kernel.
So I have written a simple kernel in which each thread writes 3 times to global memory and reads 2 times from global memory.
I know that the term “GB” is ambiguous: is 1 GB = 1000^3 bytes or 1024^3 bytes?
The theoretical maximum bandwidth is 86.4 GB/s; so far I have achieved 75 GB/s, where 1 GB = 1000^3 bytes.
To measure performance, simply count the memory accesses (reads and writes alike), multiply by 4 (the size of a float in bytes), and divide by the time taken in seconds.
If your kernel is too fast to time reliably, run the same kernel repeatedly and multiply the access count by the number of iterations.
The compiler may be optimizing out the multiple reads/writes to tab[1]. You can check by compiling with the -ptx option and examining the generated PTX code.
To obtain the bandwidth of ~70 GiB/s in a benchmark, you need fully coalesced reads and writes of a 4-byte or 8-byte type and you need to copy a large enough chunk of data so that the overhead of launching the kernel doesn’t affect the results.