I’ve been measuring the Device to Host transfer speed of my PCI Express 1.0 x4 port, both in my application and with the bandwidthTest sample included in the CUDA SDK.
I was surprised to see it limited to 250 MB/s, which is the transfer speed of PCI Express 1.0 x1.
What does this mean? Does x4 stand for “4 × 250 MB/s transfers can be done simultaneously”?
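For reference, the per-lane figure follows from the PCIe 1.0 signalling rate of 2.5 GT/s and its 8b/10b line encoding. These are theoretical per-direction maxima before packet overhead, so measured numbers will be somewhat lower:

$$2.5\,\text{GT/s} \times \tfrac{8}{10} = 2\,\text{Gb/s} = 250\,\text{MB/s per lane, per direction}$$
$$\text{x4:}\quad 4 \times 250\,\text{MB/s} = 1000\,\text{MB/s per direction}$$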
If so, can I cheat the system by splitting my big data set into 4 simultaneous (asynchronous) transfers, each one carrying one block of the set?
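The idea in the question can be sketched with the CUDA runtime as one `cudaMemcpyAsync` per chunk, each on its own stream, into pinned host memory (allocated with `cudaMallocHost`, which asynchronous copies need). The helper name `copy_d2h_in_chunks`, the chunk count of 4 and the 64 MiB buffer size are assumptions for illustration. All chunks still travel in the same direction over the same link, so this is a way to test the idea, not a guaranteed speed-up; on GPUs of this generation same-direction copies are likely to be serialized anyway.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: copies `bytes` from device pointer d_src into host
// buffer h_dst as `nchunks` cudaMemcpyAsync calls, one per stream.
// h_dst should be pinned (cudaMallocHost) so the copies are truly async.
cudaError_t copy_d2h_in_chunks(void* h_dst, const void* d_src,
                               size_t bytes, int nchunks)
{
    if (nchunks < 1 || nchunks > 16) return cudaErrorInvalidValue;
    cudaStream_t streams[16];
    size_t chunk = (bytes + nchunks - 1) / nchunks;  // round up
    for (int i = 0; i < nchunks; ++i) cudaStreamCreate(&streams[i]);

    cudaError_t err = cudaSuccess;
    for (int i = 0; i < nchunks; ++i) {
        size_t off = (size_t)i * chunk;
        if (off >= bytes) break;                      // nothing left to copy
        size_t len = (off + chunk <= bytes) ? chunk : bytes - off;
        err = cudaMemcpyAsync((char*)h_dst + off, (const char*)d_src + off,
                              len, cudaMemcpyDeviceToHost, streams[i]);
        if (err != cudaSuccess) break;
    }
    // Wait for every chunk to finish, then release the streams.
    for (int i = 0; i < nchunks; ++i) {
        cudaError_t e = cudaStreamSynchronize(streams[i]);
        if (err == cudaSuccess) err = e;
        cudaStreamDestroy(streams[i]);
    }
    return err;
}

int main()
{
    const size_t bytes = 64u << 20;  // 64 MiB test buffer (arbitrary size)
    unsigned char *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost((void**)&h_buf, bytes);  // pinned host memory
    cudaMalloc((void**)&d_buf, bytes);
    cudaMemset(d_buf, 0xAB, bytes);
    memset(h_buf, 0, bytes);

    cudaError_t err = copy_d2h_in_chunks(h_buf, d_buf, bytes, 4);
    int ok = (err == cudaSuccess) && h_buf[0] == 0xAB && h_buf[bytes - 1] == 0xAB;
    printf("%s\n", ok ? "copy ok" : "copy failed");

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return ok ? 0 : 1;
}
```

Timing this against a single `cudaMemcpy` of the whole buffer would show whether splitting changes anything on a given machine.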
My BIOS and the NVIDIA driver both recognize it as an x4 link, and I get a far higher transfer speed from Host to Device:
according to bandwidthTest, up to 680 MB/s.
So I don’t understand why Device to Host transfers are limited to 250 MB/s…
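To compare the two directions outside bandwidthTest, a small timing harness with CUDA events can be used. One variable commonly checked is whether the host buffer is pageable (`malloc`) or pinned (`cudaMallocHost`), so this sketch measures both. The function name `measure_mb_per_s`, the 32 MiB size and the repetition count are assumptions, and MB here means 10^6 bytes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: average throughput in MB/s (10^6 bytes/s) of `reps`
// cudaMemcpy calls of `bytes` each. Returns a negative value on failure.
double measure_mb_per_s(void* dst, const void* src, size_t bytes,
                        cudaMemcpyKind kind, int reps)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaError_t err = cudaMemcpy(dst, src, bytes, kind);  // warm-up, not timed
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps && err == cudaSuccess; ++i)
        err = cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    if (err != cudaSuccess || ms <= 0.0f) return -1.0;
    return (double)bytes * reps / 1e6 / (ms / 1e3);
}

int main()
{
    const size_t bytes = 32u << 20;  // 32 MiB per copy (arbitrary choice)
    const int reps = 10;
    void *d_buf = nullptr, *h_pinned = nullptr;
    void *h_pageable = malloc(bytes);
    cudaMalloc(&d_buf, bytes);
    cudaMallocHost(&h_pinned, bytes);

    printf("pageable H2D: %.0f MB/s\n",
           measure_mb_per_s(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice, reps));
    printf("pageable D2H: %.0f MB/s\n",
           measure_mb_per_s(h_pageable, d_buf, bytes, cudaMemcpyDeviceToHost, reps));
    printf("pinned   H2D: %.0f MB/s\n",
           measure_mb_per_s(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, reps));
    printf("pinned   D2H: %.0f MB/s\n",
           measure_mb_per_s(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost, reps));

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    free(h_pageable);
    return 0;
}
```

If the Device to Host figure stays near 250 MB/s with pinned memory too, the host buffer type is probably not the limiting factor.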
Bus = PCIe 2.0 x16
Graphics card = NVIDIA 9800 GTX+
CPU = Intel Core i7 920 (quad-core, 2.8 GHz)
Memory = DDR3 1600 MHz, triple channel
Operating system = Windows XP 32-bit
Memcpy from PC to GPU (Host to Device) = 5200 MB/s
Memcpy from GPU to PC (Device to Host) = 4700 MB/s
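For comparison, PCIe 2.0 doubles the signalling rate to 5 GT/s per lane with the same 8b/10b encoding. On that basis the measured figures are roughly 65% and 59% of the theoretical x16 per-direction maximum (packet and protocol overhead is not accounted for here):

$$5\,\text{GT/s} \times \tfrac{8}{10} = 4\,\text{Gb/s} = 500\,\text{MB/s per lane}, \qquad 16 \times 500\,\text{MB/s} = 8000\,\text{MB/s}$$
$$\frac{5200}{8000} = 0.65, \qquad \frac{4700}{8000} \approx 0.59$$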
Is a 64-bit operating system (Vista 64-bit) better than Windows XP 32-bit?
Would I get more bandwidth with an NVIDIA GTX 285?