Pinned memory performance on 9800GX2: no host-to-device memcpy performance improvement using pinned memory

Hi all,

I’m experiencing some weird timing results for memcpy operations on the 9800GX2 using the bandwidthTest executable:

For my Quadro FX 1600M, the bandwidthTest results are:

Transfer size 33554432 bytes (32 MiB), bandwidth in MB/s (pageable / pinned):

Host to Device: 1061.6 / 2253.5
Device to Host: 1105.6 / 1626.5
Device to Device: 16725.9 / 16735.6

For my 9800GX2, timing results are:

Transfer size 33554432 bytes (32 MiB), bandwidth in MB/s (pageable / pinned):

Host to Device: 1123.7 / 1166.1
Device to Host: 2142.1 / 5212.8
Device to Device: 51207.7 / 51210.8

Why is there virtually no performance improvement from pinned memory for host-to-device memcpy (1123.7 vs. 1166.1 MB/s), when device-to-host improves by more than a factor of two? Is it related to the fact that the 9800GX2 is actually two GPUs on one card?

N.

Not sure what the answer is here, but I know that on the GTX 295, if you run bandwidthTest on one of the two GPUs on the card, you see the full PCI-Express 2.0 bandwidth. I assumed the 9800 GX2 would be similar, but maybe not.

Thanks for the reply, seibert. It's good to know that the problem isn't present in the current line of GPUs, which is the target platform for the code I'm writing.

N.