Weird, why so complex?
It gets even weirder in the version I found:
//calculate bandwidth in MB/s
bandwidthInMBs = (1e3f * memSize * (float)MEMCOPY_ITERATIONS) /
(elapsedTimeInMs * (float)(1 << 20));
I guess 1e3 means 1000 and the f suffix makes it a single-precision float literal (rather than a double)?
I usually calculate bandwidth as follows:
Bytes / (Elapsed time in Milliseconds / 1000.0);
^ First convert milliseconds to seconds, then divide bytes by seconds, which gives bytes per second.
That takes two divisions, whereas the SDK version folds the factor of 1000 into the numerator and needs only one.
Then again, that code is executed only once, so perhaps this is “over-optimization” LOL.
Also, I would like to see the kernel code/PTX instructions for this bandwidth kernel. If somebody could extract and post them, I would be grateful! ;)
I did some experiments with “persistent threads” to see whether that would increase the bandwidth of my own bandwidth test, and it did, but it is still nowhere near the numbers this bandwidth test reports.
Not sure whether this slight miscalculation is to blame for that! ;)