Bandwidth Results for GeForce 8800 GTX on old Mac
Clarifying Device to Host BW Results

I recently installed the 8800 GTX card in my old Mac Pro (which has a PCIe Gen1 connector).

The “bandwidthTest” compiled from the toolkit gave the following result:

Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1489.4

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1222.8

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 44425.9

&&&& Test PASSED

Since the old Mac Pro's slot is x16 Gen1, I was expecting close to 4 GB/sec of bandwidth, but I only see around 1.2 GB/sec to 1.5 GB/sec.

I was wondering if you have observed the same results.

This is normal for pageable memory (at least on Linux systems). Try running the bandwidth test with --memory=pinned. You can allocate pinned memory in your application using cudaMallocHost().

More surprising to me is your Device-to-Device bandwidth. We see 70700 MB/sec on our 8800 GTX cards installed in a Linux server, and I would expect that number to be mostly independent of operating system.

Thanks. That explains it. I tried --memory=pinned as you suggested, and now I get 2.5 GB/sec to 3.1 GB/sec of bandwidth. Seems reasonable.

I am still getting 44253 MB/sec for Device to Device, though. I need to investigate this. It should be independent of whether the card has a PCIe Gen1 or Gen2 interface.

Are you using an 8800 GTX or a GT?
The GT has a 256-bit memory controller (compared to the 384-bit one of the GTX), so the 44 GB/s is expected. The device-to-device bandwidth is not related to the Gen1 or Gen2 interface.

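BTW, the practical peak data transfer rate on a Gen1 x16 link is about 3.2 GB/s: the 4 GB/s figure is what remains of the 5 GB/s raw signalling rate after PCIe 8b/10b encoding, and packet overhead reduces it further. So with pinned memory you are very close to the limit.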
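
As a sanity check on the link numbers, here is a minimal sketch of the arithmetic. It assumes the standard PCIe Gen1 signalling rate of 2.5 GT/s per lane and 8b/10b line coding; neither figure is stated in the thread, and the function name is made up for illustration. Under those assumptions the per-direction ceiling comes out at 4 GB/s before packet overhead, and the measured pinned rates are shown as a fraction of that.

```python
# Back-of-the-envelope PCIe Gen1 x16 bandwidth, per direction.
# Assumed (not stated in the thread): 2.5 GT/s per lane, 8b/10b line coding.

GT_PER_S_PER_LANE = 2.5e9      # raw signalling rate per lane, bits per second
LANES = 16                     # x16 slot
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 bits on the wire


def link_bandwidth_gb_s(rate=GT_PER_S_PER_LANE, lanes=LANES,
                        efficiency=ENCODING_EFFICIENCY):
    """Return (raw, encoded) bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    raw = rate * lanes / 8 / 1e9   # bits/s over all lanes -> bytes/s -> GB/s
    return raw, raw * efficiency


raw_gb_s, encoded_gb_s = link_bandwidth_gb_s()
print(f"raw: {raw_gb_s:.1f} GB/s, after 8b/10b: {encoded_gb_s:.1f} GB/s")

# Pinned-memory rates reported in the thread, relative to the encoded ceiling.
for measured in (2.5, 3.1):
    share = measured / encoded_gb_s
    print(f"{measured} GB/s is {share:.1%} of {encoded_gb_s:.1f} GB/s")
```

The gap between the encoded ceiling and what bandwidthTest reports is where packet headers and other protocol overhead would show up; the sketch does not try to model that.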

Like Hillary, I misspoke. The card in my old Mac Pro is an 8800 GT. Its global memory BW is 57.6 GB/sec, while the 8800 GTX card’s global memory BW is 86.4 GB/sec. So the number 44253 MB/sec seems reasonable.
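
The two peak figures follow from bus width times memory data rate. A rough sketch, assuming both cards run GDDR3 at a 900 MHz memory clock with double data rate (an assumption, not stated in the thread), a 384-bit bus on the GTX, and 1 MB = 1e6 bytes when converting the bandwidthTest numbers:

```python
# Theoretical global-memory bandwidth = bus width in bytes x effective data rate.
# Assumed (not stated in the thread): 900 MHz memory clock, double data rate,
# i.e. 1.8e9 transfers per second on each data pin.

EFFECTIVE_RATE = 900e6 * 2   # transfers per second (hypothetical constant name)


def memory_bandwidth_gb_s(bus_width_bits, transfers_per_s=EFFECTIVE_RATE):
    """Peak memory bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    return bus_width_bits / 8 * transfers_per_s / 1e9


bus_width_bits = {"8800 GT": 256, "8800 GTX": 384}
peak_gb_s = {card: memory_bandwidth_gb_s(bits)
             for card, bits in bus_width_bits.items()}

# Device to Device results quoted in the thread, in MB/sec.
measured_mb_s = {"8800 GT": 44253, "8800 GTX": 70700}

for card, peak in peak_gb_s.items():
    measured = measured_mb_s[card] / 1000   # MB/s -> GB/s, assuming 1 MB = 1e6 bytes
    print(f"{card}: peak {peak:.1f} GB/s, measured {measured:.1f} GB/s "
          f"({measured / peak:.0%} of peak)")
```

Both cards land at a similar fraction of their theoretical peak, which is consistent with the 44 GB/s result being normal for a GT rather than a sign of a problem.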