Bandwidth Device to Device - FAQ vs. reality: why is it slower?


Just read the new CUDA FAQ and was very surprised by the bandwidth test results.

Example measured numbers for a Core 2 Duo processor, ASUS P5N32-SLI motherboard with 1 GB memory, and a GeForce 8800 GTX are:

                    Pageable      Page-locked

Host - Device       1.7 GB/sec    3.1 GB/sec
Device - Host       1.7 GB/sec    3.1 GB/sec
Device - Device     70.7 GB/sec   70.7 GB/sec

On my office system with an 8800 GTX, the bandwidthTest --dtod result is 9.4 GB/sec.

(2x Xeon 3.6 GHz, Intel E7525 chipset, 8 GB, WinXP Pro).

That's a big difference from the FAQ results.

At home on an 8800 GTS/640, C2D E6400, ASUS P5LD2-C (i945P), 2 GB:

~3.5 GB/sec on Linux (an unsupported Ubuntu 6.10)

~7 GB/sec on WinXP Home

So maybe Mark Harris used a newer CUDA toolkit/SDK with big improvements?

How about the release candidate of CUDA 1.0?

Yes, these results were run on the CUDA 1.0 release (coming soon!).

Pardon my ignorance Simon, but aren't those device-to-device figures actually device-to-shared? The fundamental limit for a device-to-device copy has to be 1/2 the total memory bus bandwidth of 86 GB/sec on the 8800 GTX. Or are these quoted figures not the throughput of a copy from one device memory address range to another device memory address range?
Thanks, Eric
PS: Any up-to-date idea on how soon?
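The 86 GB/sec figure can be sanity-checked from the usual 8800 GTX memory specs; the specs below (384-bit bus, 900 MHz GDDR3 effective double data rate) are my own assumption, not numbers from this thread:

```python
# Back-of-envelope check of the ~86 GB/sec bus bandwidth quoted above.
# Assumed GeForce 8800 GTX memory specs (not stated in the thread):
bus_width_bits = 384       # memory bus width
mem_clock_hz = 900e6       # GDDR3 memory clock
transfers_per_clock = 2    # double data rate

peak_bytes_per_sec = (bus_width_bits / 8) * mem_clock_hz * transfers_per_clock
print(peak_bytes_per_sec / 1e9)  # ~86.4 GB/sec
```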

I think the maximum figure for D->D should actually be the memory bandwidth, as I assume the copy is done by the memory controller and does not need to pass data through the multiprocessors.


The only way the memory controller could do it faster is if memory were banked and it could write one bank while reading another, which would not be a general D->D copy. It is unlikely to work like this anyway, since the operation is not important enough to warrant dedicated hardware. It is not safe to assume anything on the G80!
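The two views above can be reconciled with a little arithmetic. A device-to-device copy reads each byte once and writes it once, so the bus carries twice the bytes of the copied region; if the reported number counts that read-plus-write traffic (which is my assumption about how the SDK bandwidthTest computes its --dtod figure, not something confirmed in this thread), it approaches the full bus bandwidth even though the copy itself moves data at half that rate:

```python
# Sketch of the D->D bandwidth accounting, assuming the reported figure
# counts read + write traffic (an assumption about bandwidthTest --dtod).
bus_bandwidth = 86.4e9     # theoretical peak bus bandwidth, bytes/sec
bytes_copied = 64 * 2**20  # example copy size: 64 MiB

# Best case: the bus must carry bytes_copied reads plus bytes_copied writes.
best_time = 2 * bytes_copied / bus_bandwidth

copy_throughput = bytes_copied / best_time   # rate at which data actually moves
reported = 2 * bytes_copied / best_time      # read+write accounting

print(copy_throughput / 1e9)  # 43.2 -> half the bus bandwidth, Eric's limit
print(reported / 1e9)         # 86.4 -> the quoted 70.7 GB/sec sits below this
```

On this reading, both posters are right: the copy can never move data faster than half the bus bandwidth, while a figure that counts bytes read plus bytes written can legitimately approach the full bus bandwidth.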