Bandwidth test uses a full CPU?

I’m continuing to do some work with CUDA and some math algorithms. I’ve noticed that when I run the bandwidth test, it uses a full CPU on this machine for the entire duration of the test.

Machine is a Pentium D 940 (3.2 GHz), with DDR667 at 3-3-3-12 timings.
8800 Ultra in an x8 slot

Is this expected? And does it mean that with more CPU available I might get higher results?

(I average 1.4GB/s max on htod and 1.5GB/s max on dtoh)



Your bandwidth numbers look fine. As far as anyone can tell, CUDA spin-waits while waiting for the GPU to finish a task, explaining the 100% CPU usage. If you want higher bandwidth numbers, activate the pinned memory mode.

Unfortunately, that was with pinned memory… I gave up on using straight malloc() after 5 minutes of messing with this stuff.

Well, there are a lot of other factors that contribute to transfer performance. What data buffer size are you using? Is the mainboard connecting to the graphics card in full x16 PCI Express mode? Do you have a second graphics card? Some mainboards with two PCI Express slots switch to two x8 links when you put two cards in.

For comparison purposes, here is the output of the bandwidth test in shmoo mode on my machine (Linux amd64). IIRC, the numbers for the big buffers get a little closer to 3GB/s under Windows.

As I stated, the card is in a slot using x8 signaling (there are also two 7900 GTS cards in the machine).

The numbers I quoted are from the largest buffers using shmoo mode.

My main question was: in an x8 slot, is 1.5GB/s the best I can hope for, and is it normal for a CPU to be at 100%?

Yes, I would reckon 1.5 GB/s is as good as you could hope for. x16 can manage up to a theoretical max of 4 GB/s, which puts x8 at 2 GB/s, so to be honest 1.5 GB/s sounds pretty good to me for a card in an x8 slot.
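
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes first-generation PCI Express signaling (2.5 GT/s per lane with 8b/10b encoding), which is what the 4 GB/s figure for x16 corresponds to, and it takes 1 GB as 10^9 bytes. The function names are made up for illustration, and the result is a per-direction peak that ignores packet and protocol overhead, so real transfers will always land below it.

```python
# Assumption: first-generation PCI Express (2.5 GT/s per lane, 8b/10b encoding).
RAW_BITS_PER_LANE = 2_500_000_000  # line rate per lane, per direction


def theoretical_bytes_per_sec(lanes: int) -> int:
    """Peak payload bandwidth in bytes/s for one direction of a link."""
    payload_bits = RAW_BITS_PER_LANE * 8 // 10  # 8b/10b: 8 data bits per 10 line bits
    return lanes * payload_bits // 8            # convert bits to bytes


def link_efficiency(measured_bytes_per_sec: float, lanes: int) -> float:
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_bytes_per_sec / theoretical_bytes_per_sec(lanes)


if __name__ == "__main__":
    for lanes in (8, 16):
        print(f"x{lanes}: {theoretical_bytes_per_sec(lanes) / 1e9:.1f} GB/s peak")
    # Figures quoted in this thread for the x8 slot (1 GB taken as 10**9 bytes).
    for label, gb_per_s in (("htod", 1.4), ("dtoh", 1.5)):
        print(f"{label}: {link_efficiency(gb_per_s * 1e9, 8):.0%} of x8 peak")
```

Under those assumptions the measured numbers are a healthy fraction of what an x8 link can carry, which is why there is little left to gain without moving the card to an x16 slot.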


Sorry, I missed the x8 slot mentioned in the original post. 1.5GB/s is as fast as you can expect in that configuration.