I’m continuing to do some work with CUDA and some math algorithms. I’ve noticed that when I run the bandwidth test, it uses a full CPU core on this machine for the entire duration of the test.
Machine is a Pentium D 940 (3.2 GHz), with DDR667 memory at 3-3-3-12 timings.
8800 Ultra in an x8 slot.
Is this expected? Does it also mean that if I had more CPU available I might get higher results?
(I average 1.4 GB/s max on htod and 1.5 GB/s max on dtoh.)
Your bandwidth numbers look fine. As far as anyone can tell, CUDA spin-waits while waiting for the GPU to finish a task, which explains the 100% CPU usage. If you want higher bandwidth numbers, run the test in pinned memory mode, which uses page-locked host buffers instead of ordinary pageable ones.
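To make the pageable vs. pinned difference concrete, here is a minimal sketch that times host-to-device copies from both kinds of host buffer. It is not the SDK bandwidth test's source; the 32 MiB buffer size, the repeat count, and the helper names `CHECK` and `htodBandwidth` are my own assumptions for illustration. The only CUDA calls used are standard runtime API functions (`cudaMalloc`, `cudaMallocHost`, `cudaMemcpy`, and the event timing calls). In the SDK sample itself, I believe the equivalent switch is `--memory=pinned`, but check the sample's help output for your version.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: time `reps` host-to-device copies of `bytes` bytes
// and return the average rate in GB/s (1 GB = 1e9 bytes).
static double htodBandwidth(void* dst, const void* src, size_t bytes, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // One untimed copy so first-use overhead is not measured.
    CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    return (double)bytes * reps / (ms * 1e-3) / 1e9;
}

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB per copy (arbitrary choice)
    const int reps = 10;             // arbitrary choice

    void* dev = NULL;
    CHECK(cudaMalloc(&dev, bytes));

    // Pageable host buffer: ordinary malloc'd memory.
    void* pageable = std::malloc(bytes);
    if (pageable == NULL) {
        std::fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    std::memset(pageable, 0, bytes);

    // Pinned (page-locked) host buffer: allocated through the CUDA runtime.
    void* pinned = NULL;
    CHECK(cudaMallocHost(&pinned, bytes));
    std::memset(pinned, 0, bytes);

    std::printf("pageable htod: %.2f GB/s\n", htodBandwidth(dev, pageable, bytes, reps));
    std::printf("pinned   htod: %.2f GB/s\n", htodBandwidth(dev, pinned, bytes, reps));

    CHECK(cudaFreeHost(pinned));
    std::free(pageable);
    CHECK(cudaFree(dev));
    return 0;
}
```

Build it with something like `nvcc -o htod_compare htod_compare.cu` (the file name is made up). Swapping the source and destination and using `cudaMemcpyDeviceToHost` gives the dtoh direction. Keep in mind that pinned memory is a limited resource, so locking very large buffers can hurt the rest of the system.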
Well, there are a lot of other factors that contribute to transfer performance. What data buffer size are you using? Is the mainboard connecting to the graphics card in full x16 PCI Express mode? Do you have a 2nd graphics card? Some mainboards with 2 PCI Express slots switch to 2 x8 links when you put 2 cards in.
For comparison purposes, here is the output of the bandwidth test in shmoo mode on my machine (Linux amd64). IIRC, the numbers for the big buffers get a little closer to 3 GB/s under Windows.
Yes, I would reckon 1.5 GB/s is about as good as you could hope for. x16 can manage up to a theoretical max of 4 GB/s, so an x8 link tops out at 2 GB/s in theory. To be honest, 1.5 GB/s (about 75% of that) sounds pretty good to me for a card in an x8 slot.