Hey all. I’ve got a few questions about what factors can impact the speed of GPU data transfer on different machines. I’m working with a GTX 295, and it gets two different data transfer speeds on two different machines: 1000 MB/s on one and 1900 MB/s on the other. Both machines use a PCI-e x16 slot and have similar hardware, with the second machine having a better CPU and more RAM. I’m mainly interested in what factors could account for this bandwidth difference so I can research the problem a little more effectively. Thanks for any help.
Are these pinned (“page-locked”) memory transfers?
It’s the bandwidth test in the SDK that I’m using for testing, so it tests both pinned and standard memory with the same results.
I found that FSB frequency can be a limiting factor on Intel platforms (pre-i7).
Generally, if the PCIe slot is not “faked” or broken, the prime suspect for slow host-to-device (HtD) and device-to-host (DtH) transfers is the host’s effective memory bandwidth. It can be limited by the RAM itself (memory frequency, dual-channel vs. single-channel) or by the path between the CPU and memory (the FSB mentioned above).
1000 MB/sec and 1900 MB/sec seem pretty slow for pinned memory transfer rates. That’s not too crazy for pageable memory, though, as the CPU and memory speed make a much bigger difference there. To copy data from memory that is not pinned by the OS, the NVIDIA driver has to copy your data buffer in chunks to a private memory location that is pinned, and then instruct the GPU to DMA the data over. When that chunk is finished, the process repeats until all of your data is copied. In this case, you want a fast CPU and fast RAM (because data is being copied from one location to another before being sent to the GPU).
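To make the staging step concrete, here is a rough sketch that imitates it by hand. This is not the driver’s actual code, just an illustration of the idea: `staged_copy_to_device`, the `CHECK` macro, and the 1 MiB chunk size are all made up for the example, and only standard CUDA runtime calls are used.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical helper: copy a pageable host buffer to the device by staging
// it through a small pinned buffer, one chunk at a time.
void staged_copy_to_device(void *d_dst, const void *h_src, size_t bytes,
                           size_t chunk_bytes)
{
    void *staging = nullptr;
    CHECK(cudaMallocHost(&staging, chunk_bytes));  // page-locked staging area

    for (size_t off = 0; off < bytes; off += chunk_bytes) {
        size_t n = std::min(chunk_bytes, bytes - off);
        // Step 1: the CPU copies pageable -> pinned. This is the part that
        // depends on CPU and RAM speed.
        memcpy(staging, static_cast<const char *>(h_src) + off, n);
        // Step 2: pinned -> device over PCIe. cudaMemcpy blocks the host
        // here, so the staging buffer is safe to reuse on the next pass.
        CHECK(cudaMemcpy(static_cast<char *>(d_dst) + off, staging, n,
                         cudaMemcpyHostToDevice));
    }
    CHECK(cudaFreeHost(staging));
}

int main()
{
    const size_t bytes = 32u << 20;  // 32 MiB payload (arbitrary)
    const size_t chunk = 1u << 20;   // 1 MiB staging chunk (arbitrary)

    char *h_src  = static_cast<char *>(malloc(bytes));  // pageable memory
    char *h_back = static_cast<char *>(malloc(bytes));
    if (!h_src || !h_back) { fprintf(stderr, "malloc failed\n"); return 1; }
    for (size_t i = 0; i < bytes; ++i) h_src[i] = static_cast<char>(i * 31u);

    void *d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, bytes));

    staged_copy_to_device(d_buf, h_src, bytes, chunk);

    // Read the data back and compare to confirm nothing was lost.
    CHECK(cudaMemcpy(h_back, d_buf, bytes, cudaMemcpyDeviceToHost));
    printf("round trip %s\n",
           memcmp(h_src, h_back, bytes) == 0 ? "matches" : "differs");

    CHECK(cudaFree(d_buf));
    free(h_src);
    free(h_back);
    return 0;
}
```

Every byte crosses host memory twice in this scheme (once in the `memcpy`, once in the DMA read), which is why a slower CPU or memory subsystem shows up directly in the pageable numbers.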
When your application uses pinned memory, the performance is usually much better because the GPU can directly grab your data without the overhead of copying it to a private pinned region first. I think the only platform that can copy data to the GPU from pinned and pageable memory at the same rate (nearly 6 GB/sec) is the Core i7 with the X58 chipset.
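If you want to compare the two cases on each machine yourself, something along these lines should work. Treat it as a minimal sketch rather than a replacement for the SDK test: the helper names (`mb_per_s`, `time_h2d_ms`), the 32 MiB buffer, and the repetition count are my own choices, and it reports MB as 10^6 bytes, so the figures may not line up exactly with what other tools print.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Convert bytes moved and elapsed milliseconds to MB/s (1 MB = 10^6 bytes).
double mb_per_s(size_t bytes, float ms)
{
    return (bytes / 1.0e6) / (ms / 1.0e3);
}

// Time `reps` host-to-device copies of `bytes` bytes using CUDA events.
// Returns the total elapsed time in milliseconds.
float time_h2d_ms(void *d_dst, const void *h_src, size_t bytes, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main()
{
    const size_t bytes = 32u << 20;  // 32 MiB per copy (arbitrary)
    const int    reps  = 10;         // copies per measurement (arbitrary)

    void *h_pageable = malloc(bytes);             // ordinary pageable memory
    void *h_pinned   = nullptr;
    void *d_buf      = nullptr;
    if (!h_pageable) { fprintf(stderr, "malloc failed\n"); return 1; }
    CHECK(cudaMallocHost(&h_pinned, bytes));      // page-locked memory
    CHECK(cudaMalloc(&d_buf, bytes));
    memset(h_pageable, 1, bytes);                 // touch the pages up front
    memset(h_pinned, 1, bytes);

    // Warm-up copy so one-time setup cost is not included in the timing.
    CHECK(cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));

    float ms_pageable = time_h2d_ms(d_buf, h_pageable, bytes, reps);
    float ms_pinned   = time_h2d_ms(d_buf, h_pinned, bytes, reps);

    printf("pageable H2D: %.1f MB/s\n", mb_per_s(bytes * reps, ms_pageable));
    printf("pinned   H2D: %.1f MB/s\n", mb_per_s(bytes * reps, ms_pinned));

    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));
    free(h_pageable);
    return 0;
}
```

If the pinned figure is close on both machines but the pageable figure is not, that points at the CPU and memory side rather than the PCIe link.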
Thanks for all the input guys, I’m learning quite a bit today :)
I was incorrect: the NVIDIA SDK program only runs the test for pageable memory, not pinned. My mistake. So if I understand you correctly, the speed difference can be attributed to the 1900 MB/sec computer having a faster processor? I believe the RAM is the same type in each machine, but the better machine has 2 GB of RAM to the slower machine’s 1 GB. Would having more RAM also contribute to a situation like this, or just the speed of the RAM you do have?
I’m not sure if memory size is a factor, actually. I’ve never compared two systems differing only in memory size.
Run it from a command line and add the “-p” option to test with pinned memory.
You should see a dramatic improvement on a machine with PCI-E 2.0 hardware.