I have seen a few threads about this, but no resolution or explanation (at least none that made sense to me) for slow Linux Host <-> Device bandwidth compared to Windows.
Both examples use pinned memory and the bandwidth project in the SDK.
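For reference, the core of a pinned-memory host-to-device measurement looks roughly like this. This is a minimal sketch, not the SDK sample's exact code; the 32 MB transfer size and repetition count are assumptions for illustration, and error checking is omitted:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 << 20;   // 32 MB per transfer (illustrative size)
    const int reps = 10;             // average over several transfers

    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);   // pinned (page-locked) host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the Host -> Device copies with CUDA events
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gb_per_s = (double)bytes * reps / (ms / 1000.0) / (1 << 30);
    printf("Host -> Device (pinned): %.2f GB/s\n", gb_per_s);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```

Swapping the direction argument to cudaMemcpyDeviceToHost (and the copy's source/destination) gives the Device -> Host number. The key point is that cudaMallocHost is used instead of malloc, so the numbers below should reflect pinned-memory DMA transfer rates, not pageable-memory staging.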
On one system, running Windows XP with an 8800 GTS board, I am able to achieve transfer rates of about 2 GB/s (roughly half the theoretical bandwidth of a x16 PCIe bus).
On another system, running SUSE with a C870 Tesla board, I am seeing only about 750 MB/s Host -> Device and 333 MB/s Device -> Host.
Now I understand that the motherboard/chipset play a role here, but regardless, I would expect to achieve better than 1/12 of the theoretical limit!
This is potentially a deal breaker for using this technology in a released product, so I am very interested in finding the answer and fixing the issue. The board can be lightning fast, but if I can't move data to/from it at a reasonable rate (such as the Windows numbers above), I can't retire the risk of switching to a new platform.