What will it be like for a GTX 295?

darot · July 23, 2009, 5:46am

I have a GTX 295, and I plan to use both of its GPU cores simultaneously. (In fact, I have three GTX 295 cards installed to run 6 tasks at once, with 8 CPU cores and 24 GB of host memory.)
What will the following cases be like?

When copying memory from host to device, will the bandwidth be shared, or will the copy time be the same as for a single-core card (GTX 285)?
How many SPs does each of the two cores have: 240 each or 120 each?
Is it possible to copy memory between the two cores, and if so, how?
How do I synchronize the two cores?
Are there any other drawbacks to using one GTX 295 compared with two GTX 285s?

And if only performance is concerned, is there any difference between Quadro and GTX series boards?

seibert · July 23, 2009, 2:46pm

The two CUDA devices share the PCI-Express bandwidth through a switch. If only one of the two devices on the card is transferring, that device can use the full bandwidth, like a single-device card. If both devices transfer simultaneously, then each device only gets half the bandwidth.
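
One way to check the sharing on your own machine is to time pinned host-to-device copies on one device alone and then on both at once. This is only a rough sketch: it assumes devices 0 and 1 are the two halves of the same GTX 295 (enumeration order does not guarantee that), and the helper name `timeCopy`, the buffer size, and the repeat count are made up for illustration.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper: times `reps` pinned host-to-device copies on device
// `dev` and writes the achieved rate in GB/s to *gbps.
void timeCopy(int dev, size_t bytes, int reps, double* gbps) {
    cudaSetDevice(dev);
    void* h = nullptr;
    void* d = nullptr;
    cudaHostAlloc(&h, bytes, cudaHostAllocDefault);  // pinned host buffer
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    *gbps = (double(bytes) * reps / 1e9) / (ms / 1e3);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MB per copy (arbitrary choice)
    const int reps = 20;

    // Device 0 transferring alone.
    double alone = 0.0;
    timeCopy(0, bytes, reps, &alone);
    std::printf("device 0 alone:      %.2f GB/s\n", alone);

    // Devices 0 and 1 transferring at the same time, one host thread each.
    // The overlap is approximate because setup time differs per thread.
    double a = 0.0, b = 0.0;
    std::thread t0(timeCopy, 0, bytes, reps, &a);
    std::thread t1(timeCopy, 1, bytes, reps, &b);
    t0.join();
    t1.join();
    std::printf("device 0 concurrent: %.2f GB/s\n", a);
    std::printf("device 1 concurrent: %.2f GB/s\n", b);
    return 0;
}
```

If the two devices really sit behind the same switch, the concurrent numbers should come out at roughly half of the standalone number.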

Each CUDA device in a GTX 295 has 240 stream processors.
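
You can confirm this from the device properties. The sketch below lists every CUDA device and derives the stream processor count from the multiprocessor count; the factor of 8 SPs per multiprocessor is an assumption that holds only for compute capability 1.x parts such as the GTX 200 series.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        std::printf("device %d: %s, compute %d.%d, %d multiprocessors",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount);

        // Assumption: 8 stream processors per multiprocessor, which is
        // true for compute capability 1.x only.
        if (prop.major == 1)
            std::printf(", %d stream processors", prop.multiProcessorCount * 8);
        std::printf("\n");
    }
    return 0;
}
```

Each half of a GTX 295 shows up as its own device with 30 multiprocessors, which gives 30 x 8 = 240.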

There is still no mechanism in CUDA to copy data from one GPU directly to another one. You have to copy the data from GPU #1 to the host with host thread #1, then copy it from the host to GPU #2 with host thread #2. To make this as fast as possible, you should declare pinned memory to be portable between threads. This requires calling cudaHostAlloc() with the cudaHostAllocPortable flag. (See CUDA 2.2 release notes.)
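
A minimal sketch of that staged copy, assuming the source and destination GPUs are devices 0 and 1. The buffer size and variable names are made up for illustration, and `std::thread` simply stands in for whatever host threading you already use.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = (1u << 20) * sizeof(float);

    // Pinned staging buffer, flagged portable so that every host thread
    // (and therefore every device) sees it as pinned memory.
    void* staging = nullptr;
    cudaSetDevice(0);
    if (cudaHostAlloc(&staging, bytes, cudaHostAllocPortable) != cudaSuccess) {
        std::printf("cudaHostAlloc failed\n");
        return 1;
    }

    // Host thread #1: GPU #1 (device 0) -> host.
    std::thread t1([&] {
        cudaSetDevice(0);
        void* d_src = nullptr;
        cudaMalloc(&d_src, bytes);
        cudaMemset(d_src, 0, bytes);  // stand-in for real results
        cudaMemcpy(staging, d_src, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_src);
    });
    t1.join();  // cudaMemcpy is blocking, so the staging buffer is complete here

    // Host thread #2: host -> GPU #2 (device 1).
    std::thread t2([&] {
        cudaSetDevice(1);
        void* d_dst = nullptr;
        cudaMalloc(&d_dst, bytes);
        cudaMemcpy(d_dst, staging, bytes, cudaMemcpyHostToDevice);
        // ... launch kernels that consume d_dst here ...
        cudaFree(d_dst);
    });
    t2.join();

    cudaFreeHost(staging);
    return 0;
}
```

Without the portable flag, the buffer is only treated as pinned by the thread that allocated it, so the second copy would fall back to the slower pageable path.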

There is no synchronization between the two devices. They are treated like two completely independent cards that share a PCI-Express slot.
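
In practice this means any synchronization has to happen on the host. A rough sketch, assuming one host thread per device: each thread waits for its own device, and the host threads then meet at a join. The kernel `scale` and the function `worker` are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real per-device work.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Runs the work on one device and blocks until that device is finished.
void worker(int dev, int n) {
    cudaSetDevice(dev);
    float* d = nullptr;
    cudaMalloc((void**)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

    // Waits for this device only; it says nothing about the other device.
    // (CUDA 2.x spelled this call cudaThreadSynchronize().)
    cudaDeviceSynchronize();

    cudaFree(d);
}

int main() {
    const int n = 1 << 20;
    std::thread a(worker, 0, n);
    std::thread b(worker, 1, n);

    // Host-side rendezvous: once both joins return, both devices are done.
    a.join();
    b.join();
    std::printf("both devices finished\n");
    return 0;
}
```

If the devices need to meet repeatedly inside a longer-running loop, a host barrier between the threads would take the place of the joins.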

None that I’m aware of.

No, as long as you match the compute capability (1.3 for GTX 200 cards), clock rate, number of stream processors, and memory bandwidth, there should be no performance difference.

darot · July 23, 2009, 11:53pm

Thank you so much for replying to my question.

It completely clears up my questions.

