Host <-> Device bandwidth slow

I have seen a few threads about this, but no resolution or explanation (at least none that made sense to me) for the slow Host <-> Device bandwidth on Linux versus Windows.

Both examples use pinned memory and the bandwidth project in the SDK.
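
For anyone who wants to reproduce this without installing the full SDK, the measurement boils down to something like the sketch below: time cudaMemcpy on a cudaMallocHost (pinned) buffer in each direction and divide bytes by elapsed time. This is my own stripped-down approximation, not the SDK source; the file name, repetition count, and build line are just assumptions.

// bw_pinned.cu -- minimal sketch of a pinned-memory bandwidth test
// (not the SDK's bandwidth project, just the same idea in a few lines).
// Assumed build line: nvcc -o bw_pinned bw_pinned.cu
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 32 * 1024 * 1024;  // 32 MB transfer
    const int    reps  = 10;                // average over several copies

    // Pinned (page-locked) host memory is what allows full-speed DMA transfers.
    void *h_buf = 0, *d_buf = 0;
    cudaMallocHost(&h_buf, bytes);
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy so one-time driver/setup overhead is not counted.
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    float ms = 0.0f;

    // Host -> Device
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host to Device (pinned): %.1f MB/s\n",
           (bytes / (1024.0 * 1024.0)) * reps / (ms / 1000.0));

    // Device -> Host
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("Device to Host (pinned): %.1f MB/s\n",
           (bytes / (1024.0 * 1024.0)) * reps / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}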

On one system, I am running Windows XP with an 8800 GTS board and I am able to achieve transfer rates of about 2 GB/s (roughly half of the ~4 GB/s theoretical bandwidth of an x16 PCIe bus).

On another system, I am running SUSE with a C870 Tesla board and I am achieving numbers on the order of 750MB/Sec for Host -> Device and 333MB/Sec for Device -> Host transfers.

Now I understand that the motherboard/chipset play a role here, but I would think that, regardless, one should be able to achieve more than 1/12 of the theoretical limit!

This is potentially a deal breaker for using this technology in a released product, so I am very interested in finding the answer and fixing the issue. The board can be lightning fast, but if I can't move data to/from it at a reasonable rate (such as what I see under Windows), I cannot retire the risk of switching to a new platform.

Thank you.

My machine (Dell Precision 4xx) dual boots. I get ~3 GiB/s in Windows and ~2.5 or ~2 GiB/s in Linux (both using pinned memory), so there does seem to be something different about the Linux drivers that slows them down a bit.

Judging by other posts on these forums, it seems that some chipsets do have very low transfer speeds, as you see on your SUSE machine. In general, I’m not sure if this is due to poor throughput in parts of the chipset, poor chipset design, or the driver failing to make maximal use of the chipset.

In at least one case, I remember the problem with slow transfers being related to the chipset switching the slot down to 4x (or was it 1x?). I don’t remember how that problem was solved.

“Most” systems seem to have decent speeds, so I don't think it is a deal breaker. But I don't have a solution for you either. If you are a registered developer, you can file an NVIDIA bug report with your full system specs and the slow numbers you are getting. If it turns out to be a driver issue, there is a chance it will be fixed in the next release. If you aren't a registered developer, just post that info here and someone from NVIDIA will probably notice.

@cudaprogrammer,
It's not clear from your post whether you're doing this testing on the same motherboard or on different motherboards.

Can you provide some details on the hardware that you’re using?

Also, please confirm that you’re using the latest motherboard BIOS in all cases.

thanks,
Lonni

MisterAnderson42 and netllama,

 Thank you very much for the reply.  For some further system background, I have the following (using pinned memory):

Using Windows XP in an XPS 410 (Dell) box with an Intel Core 2 1.86 GHz:
Transfer Size (Bytes) Bandwidth(MB/s)
Host to Device Bandwidth for Pinned memory
33554432 2614.8
Device to Host Bandwidth for Pinned memory
33554432 2030.9

Using a C870 (Tesla) and OpenSuse 10.2, x86, kernel 2.6.18.2-34-default:
Transfer Size (Bytes) Bandwidth(MB/s)
Host to Device Bandwidth for Pinned memory
33554432 762.5
Device to Host Bandwidth for Pinned memory
33554432 336.3

Using an 8800 GTX with CentOS (Linux):
Transfer Size (Bytes) Bandwidth(MB/s)
Host to Device Bandwidth for Pinned memory
33554432 764.7
Device to Host Bandwidth for Pinned memory
33554432 856.5

Some of the systems are remote servers, but I did ask about the SUSE system, and its drivers were updated to the current release with no change in performance.

I am hoping someone reading this has an OpenSuse 10.2 system and could verify or refute these numbers. The Windows numbers are good, so there is no problem there, and I am not as concerned about the CentOS system. I am most concerned with the SUSE system, since it has the worst transfer bandwidth and its drivers appear to be up to date. I understand it could still be a motherboard chipset performance issue, but if someone could run the bandwidth test from the 1.1 SDK, I would know which direction to pursue (Linux driver or hardware).

Thanks in advance for any help.

It is a chipset problem (or the card is in a slot that is not x16).

mfatica,

Thank you for taking the time to reply. I am using a C870 Tesla card, so it is an x16 PCIe implementation. I am wondering how you can say with certainty that the problem is definitely hardware (motherboard, chipset, etc.) without first having someone run the bandwidth test under OpenSuse 10.2. It would seem to me that the test needs to be executed on another machine (or machines) before the driver can be ruled out. Where is my logic incorrect?

There have been other users reporting bandwidth with OpenSuse (10.1 and 10.2), see for example http://forums.nvidia.com/lofiversion/index.php?t31108.html, and the results were in the right ballpark. Plus, if the Linux distribution is on the list of supported ones, it is fully tested internally.

I have tested a lot of machines, there are some differences between Windows and Linux but never of this magnitude.

The C870 is an x16 card, but the slot on your motherboard could be a mechanical x16 that is only an electrical x4 or x8.
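
One way to confirm what the link actually negotiated (as opposed to what the card and slot are mechanically capable of) is to look at the PCIe link status: on Linux, lspci -vv shows the negotiated width on the GPU's LnkSta line. For anyone reading this on a newer driver stack, NVML can report the same thing programmatically; the sketch below assumes the NVML header and library that ship with recent drivers/toolkits (not available with the driver versions discussed in this thread), and the file name and build line are assumptions.

// pcie_width.c -- sketch: query negotiated vs. maximum PCIe link width via NVML.
// Assumed build line: gcc pcie_width.c -lnvidia-ml -o pcie_width
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    unsigned int curr = 0, max = 0;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetCurrPcieLinkWidth(dev, &curr) == NVML_SUCCESS &&
        nvmlDeviceGetMaxPcieLinkWidth(dev, &max) == NVML_SUCCESS) {
        // A current width of x4 or x8 in a mechanical x16 slot means the slot
        // (or the trained link) is narrower than the card supports.
        printf("PCIe link width: x%u negotiated, x%u maximum\n", curr, max);
    } else {
        fprintf(stderr, "Link width query not supported on this setup\n");
    }

    nvmlShutdown();
    return 0;
}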