This is normal for pageable memory (at least on Linux systems). Try running the bandwidth test with --memory=pinned. You can allocate pinned memory in your application using cudaMallocHost().
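For reference, here is a minimal sketch of what using pinned memory could look like in an application. It is not taken from the bandwidth test source; the 32 MiB buffer size, the repetition count, and the `CHECK` macro are arbitrary choices for illustration. It allocates page-locked host memory with cudaMallocHost() and times host-to-device copies with CUDA events.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB, an arbitrary transfer size
    const int reps = 10;             // arbitrary number of timed copies

    void *h_pinned = nullptr;
    void *d_buf = nullptr;
    CHECK(cudaMallocHost(&h_pinned, bytes));  // page-locked (pinned) host buffer
    CHECK(cudaMalloc(&d_buf, bytes));         // device buffer

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time a batch of host-to-device copies.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Total bytes moved divided by elapsed seconds, in units of 10^6 bytes/s.
    const double mb_per_s = (double)bytes * reps / (ms * 1e-3) / 1e6;
    std::printf("Host-to-device, pinned: %.1f MB/s\n", mb_per_s);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));  // pinned memory must be released with cudaFreeHost
    return 0;
}
```

Swapping cudaMallocHost()/cudaFreeHost() for malloc()/free() in the same program should give the pageable number for comparison.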
More surprising to me is your Device-to-Device bandwidth. We see 70700 MB/sec on our 8800 GTX cards installed in a Linux server, and I would expect that number to be mostly independent of operating system.
Are you using an 8800 GTX or a GT?
The GT has a 256-bit memory controller (compared to the 384-bit one on the GTX), so the 44 GB/s is expected. The device-to-device bandwidth is not related to the PCIe Gen1 or Gen2 interface.
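BTW, the theoretical peak on a Gen1 x16 link is 4 GB/s per direction after the PCIe 8b/10b encoding (2.5 GT/s x 16 lanes x 8/10), and packet overhead brings the achievable peak down to about 3.2 GB/s, so with pinned memory you are very close to the limit.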
Like Hillary, I misspoke. The card in my old Mac Pro is an 8800 GT. Its global memory bandwidth is 57.6 GB/sec (256-bit bus x 900 MHz x 2 for DDR). The 8800 GTX card’s global memory bandwidth is 86.4 GB/sec (384-bit bus x 900 MHz x 2). So the number 44253 MB/sec seems reasonable.
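If you want to check the theoretical figure for whatever card is installed, something like the sketch below may help. It assumes a CUDA toolkit recent enough to expose the cudaDevAttrMemoryClockRate and cudaDevAttrGlobalMemoryBusWidth attributes (they did not exist in the earliest releases), and it assumes double-data-rate memory, i.e. two transfers per clock. Treat the result as a rough upper bound, not a measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int dev = 0;   // assumption: query the first device
    int clock_khz = 0;   // memory clock, reported in kHz
    int bus_bits = 0;    // memory bus width, reported in bits

    if (cudaDeviceGetAttribute(&clock_khz, cudaDevAttrMemoryClockRate, dev) != cudaSuccess ||
        cudaDeviceGetAttribute(&bus_bits, cudaDevAttrGlobalMemoryBusWidth, dev) != cudaSuccess) {
        std::fprintf(stderr, "Could not query device attributes\n");
        return 1;
    }

    // Peak = 2 transfers/clock (DDR assumption) x clock in Hz x bus width in bytes.
    // For example, a 900 MHz clock and a 256-bit bus give 57.6 GB/s.
    const double gb_per_s = 2.0 * (clock_khz * 1e3) * (bus_bits / 8.0) / 1e9;

    std::printf("Memory clock: %d kHz, bus width: %d bits\n", clock_khz, bus_bits);
    std::printf("Theoretical peak bandwidth: %.1f GB/s\n", gb_per_s);
    return 0;
}
```

A measured device-to-device number will land below this peak, which is consistent with 44 GB/s measured against 57.6 GB/s theoretical on the GT.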