Hi all,
I’m seeing some odd memcpy timing results on the 9800GX2 when running the bandwidthTest executable.
For my Quadro FX 1600M, the memory bandwidth results are:
Host to Device Bandwidth for Pageable/Pinned memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 1061.6 / 2253.5
Device to Host Bandwidth for Pageable/Pinned memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 1105.6 / 1626.5
Device to Device Bandwidth
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 16725.9 / 16735.6
For my 9800GX2, the results are:
Host to Device Bandwidth for Pageable/Pinned memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 1123.7 / 1166.1
Device to Host Bandwidth for Pageable/Pinned memory
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 2142.1 / 5212.8
Device to Device Bandwidth
Transfer Size (Bytes)    Bandwidth (MB/s)
33554432                 51207.7 / 51210.8
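Why is there virtually no improvement from pinned memory for host-to-device memcpy on the 9800GX2 (1123.7 vs. 1166.1 MB/s), when the same transfer on the Quadro more than doubles (1061.6 vs. 2253.5 MB/s)? Is it related to the fact that the 9800GX2 is in fact 2 GPUs?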
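To put numbers on the gap, here is a quick Python sketch that computes the pinned/pageable ratio for each card and direction from the figures above. The dictionary layout and the `pinned_speedup` helper are just my own naming for illustration, not anything from bandwidthTest.

```python
# Bandwidth in MB/s for the 33554432-byte transfer, copied from the
# bandwidthTest output above as (pageable, pinned) pairs.
# The dictionary layout and all names here are illustrative only.
results = {
    "Quadro FX 1600M": {
        "host to device": (1061.6, 2253.5),
        "device to host": (1105.6, 1626.5),
    },
    "9800GX2": {
        "host to device": (1123.7, 1166.1),
        "device to host": (2142.1, 5212.8),
    },
}


def pinned_speedup(pageable, pinned):
    """Return how many times faster the pinned transfer is than the pageable one."""
    return pinned / pageable


# Build the same card -> direction mapping, but holding ratios instead of pairs.
speedups = {
    card: {direction: pinned_speedup(*pair) for direction, pair in dirs.items()}
    for card, dirs in results.items()
}

for card, dirs in speedups.items():
    for direction, ratio in dirs.items():
        print(f"{card:16s} {direction}: pinned is {ratio:.2f}x pageable")
```

So pinned memory gives about 2.12x on the Quadro for host to device, but only about 1.04x on the 9800GX2, even though device to host on the 9800GX2 improves by about 2.43x.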
N.