Bandwidth measurement in "bandWidthTest.cu"

The “bandWidthTest.cu” file in the CUDA SDK uses two different formulae for measuring bandwidth. Can someone explain why?

In function:

float testHostToDeviceTransfer(unsigned int memSize, memoryMode memMode)
{
    //calculate bandwidth in MB/s
    bandwidthInMBs = (1e3 * memSize * (float)MEMCOPY_ITERATIONS) /
                     (elapsedTimeInMs * (float)(1 << 20));
}

float testDeviceToDeviceTransfer(unsigned int memSize)
{
    //calculate bandwidth in MB/s
    bandwidthInMBs = 2.0f * (1e3 * memSize * (float)MEMCOPY_ITERATIONS) /
                     (elapsedTimeInMs * (float)(1 << 20));
}

Why is there an extra factor of 2.0f in the formula for computing the bandwidth of the DeviceToDevice transfer?

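Because a device-to-device transfer involves both a “read” and a “write” of device memory for each byte copied. To be correct, the total bandwidth must count every byte read and every byte written, hence the factor of 2. In a host-to-device transfer the device memory is only written, so each byte is counted once.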
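To make the arithmetic concrete, here is a minimal sketch of the two formulae folded into one function. The helper name bandwidthInMBs-as-a-function and its trafficFactor parameter are my own illustration, not something from the SDK sample; the only things taken from the sample are the 1 MB = 2^20 bytes convention and the milliseconds-to-seconds conversion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the SDK sample): converts a timed batch
// of copies into MB/s, using 1 MB = 2^20 bytes as the sample does.
//
// trafficFactor (an assumed parameter name) is the number of device-memory
// accesses per byte copied:
//   1.0 for host<->device copies (device memory is only written or only read)
//   2.0 for device-to-device copies (each byte is read once and written once)
double bandwidthInMBs(std::size_t memSize, int iterations,
                      double elapsedTimeInMs, double trafficFactor)
{
    assert(elapsedTimeInMs > 0.0);

    // Total bytes that crossed the device memory bus during the timed loop.
    const double bytesMoved =
        trafficFactor * static_cast<double>(memSize) * iterations;

    // The timer reports milliseconds; bandwidth is expressed per second.
    const double seconds = elapsedTimeInMs / 1e3;

    return bytesMoved / (seconds * static_cast<double>(1 << 20));
}
```

For example, with a 32 MB buffer copied 10 times in 1000 ms, a factor of 1.0 gives 320 MB/s and a factor of 2.0 gives 640 MB/s. The elapsed time is the same; only the count of bytes read plus bytes written changes.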