Is the CUDA bandwidthTest example wrong? (bandwidth calculation formula)

On line 721 of bandwidthTest.cu, NVIDIA uses “float(1<<10)” to convert the unit (ms to s), but I think it should be 1000.

// Original (bandwidthTest.cu):
bandwidthInMBs = ((float)(1<<10) * memSize * (float)MEMCOPY_ITERATIONS) /
                 (elapsedTimeInMs * (float)(1 << 20));

// What I think it should be:
bandwidthInMBs = ((float)1000 * memSize * (float)MEMCOPY_ITERATIONS) /
                 (elapsedTimeInMs * (float)(1 << 20));

I agree with you. 1<<10 (1024) should be 1000, since elapsedTime is in milliseconds.
To compute bandwidth in MB/s, the number of bytes transferred is divided by 1<<20 to convert it to megabytes.
The elapsed time in milliseconds is divided by 1000 to convert it to seconds. As a result, 1<<10 should be 1000.
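
To put a number on the difference, here is a minimal host-side sketch of both variants. The function names are made up for illustration and are not part of the SDK sample; only the arithmetic mirrors the two formulas quoted above, and it is done in double rather than float.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from bandwidthTest.cu.
// Returns bandwidth in MB/s, where 1 MB = 2^20 bytes.
// msPerSecond is the factor used to turn "per millisecond" into "per second".
inline double bandwidthMBps(std::size_t memSize, int iterations,
                            double elapsedMs, double msPerSecond) {
    assert(elapsedMs > 0.0);
    const double totalBytes = static_cast<double>(memSize) * iterations;
    return (msPerSecond * totalBytes) /
           (elapsedMs * static_cast<double>(1 << 20));
}

// Variant from the question: uses 1<<10 (1024) as the ms -> s factor.
inline double bandwidthWith1024(std::size_t memSize, int iterations,
                                double elapsedMs) {
    return bandwidthMBps(memSize, iterations, elapsedMs, 1024.0);
}

// Proposed variant: 1 second = 1000 milliseconds.
inline double bandwidthWith1000(std::size_t memSize, int iterations,
                                double elapsedMs) {
    return bandwidthMBps(memSize, iterations, elapsedMs, 1000.0);
}
```

For example, 32 MB copied 10 times in 100 ms gives 3200 MB/s with the factor 1000 and 3276.8 MB/s with the factor 1024, so the 1<<10 version reports a value about 2.4% too high.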

Weird, why so complex?

It gets even weirder in the version I found:

SDK 4.2:

//calculate bandwidth in MB/s
bandwidthInMBs = (1e3f * memSize * (float)MEMCOPY_ITERATIONS) / 
                                    (elapsedTimeInMs * (float)(1 << 20));

I guess that 1e3 means 1000 and the f suffix means floating-point format?

I usually calculate bandwidth as follows:

Bytes / (Elapsed time in Milliseconds / 1000.0);

^ First convert milliseconds to seconds, then divide bytes by seconds.

However, my version takes two divisions.

Then again, this code is only executed once, so perhaps the SDK version is “over-optimization” LOL.
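
For comparison, a small sketch of the two-step approach described above. The function names are hypothetical, and the optional division by 1<<20 is only there to get the same MB/s unit as the SDK formula.

```cpp
#include <cassert>

// Hypothetical helper: step 1 converts milliseconds to seconds,
// step 2 divides bytes by seconds. Result is in bytes per second.
inline double bytesPerSecond(double bytes, double elapsedMs) {
    assert(elapsedMs > 0.0);
    const double seconds = elapsedMs / 1000.0;
    return bytes / seconds;
}

// Optional unit change to MB/s, assuming 1 MB = 2^20 bytes
// as in the SDK formula.
inline double toMBps(double bytesPerSec) {
    return bytesPerSec / static_cast<double>(1 << 20);
}
```

Algebraically, toMBps(bytesPerSecond(bytes, ms)) is the same quantity as (1e3 * bytes) / (ms * (1 << 20)); the SDK form just folds the unit conversions into a single division.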

Also, I would like to see the kernel code/PTX instructions and such for this bandwidth kernel. If somebody could extract it and post it, I would be grateful! ;)

I did some experiments with “persistent threads” to see if that would increase the bandwidth of my own bandwidth test, and indeed it did, but it is still nowhere near the result of this bandwidth test.

Not sure if this slight miscalculation is to blame for it! ;)