I have a question. I have an SLI mobo and got an 8800 GTS 512 (G92), and now I was wondering: can I put an 8600GT beside it just for the CUDA calcs and bridge it to the 8800 GTS? Or do I just use my 8800 GTS and install the CUDA driver (Vista x64)?
The G92 8800 GTS will be much faster at CUDA than an 8600GT, so why bother? You can run CUDA and display on the same GPU without any problems.
If you add a 2nd card, you don’t need the SLI bridge to use CUDA on both. In fact, if you enable SLI in the drivers, CUDA can only use one of the cards.
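If you do go the two-card route, you can pick which GPU your kernels run on with cudaSetDevice(). Rough sketch (it assumes the display card ends up as device 0, which isn't guaranteed, so check the printout first):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // kernelExecTimeoutEnabled tells you whether the display watchdog applies to this GPU
        printf("device %d: %s (watchdog %s)\n",
               d, prop.name, prop.kernelExecTimeoutEnabled ? "on" : "off");
    }

    // Assumption: the display card is device 0, so take the last one for compute.
    cudaSetDevice(count - 1);
    return 0;
}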
Isn’t there a 5 second limitation for CUDA on a GPU that’s running a display under Windows?
What I mean is: could the 6800GT card be used only for the CUDA calcs (like a physics card)?
Thx
Only G80 and later GPUs do CUDA (the 6800GT predates G80, so it can't).
The 5 second watchdog rarely comes into play. Most kernels complete in milliseconds, although this is application dependent of course.
“rarely” seems a bit extreme. Some of my kernels may not complete for several minutes. The 5-second rule is a critical limitation to be aware of, IMHO.
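For anyone who does run long kernels on a display GPU: when the watchdog kills one, the next sync call returns an error you can test for. Rough sketch (SpinKernel is just a made-up busy-loop to trip the watchdog, and cudaThreadSynchronize is the runtime API call of this era):

#include <cuda_runtime.h>
#include <stdio.h>

// Deliberately long-running dummy kernel (spins), just to trip the watchdog.
__global__ void SpinKernel(void)
{
    for (volatile long long i = 0; i < (1LL << 40); i++) { }
}

int main(void)
{
    SpinKernel<<<1, 1>>>();
    cudaError_t err = cudaThreadSynchronize();   // wait for completion (or for the watchdog)
    if (err == cudaErrorLaunchTimeout)
        printf("kernel was killed by the display watchdog\n");
    else if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    else
        printf("kernel finished normally\n");
    return 0;
}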
An interesting question: what is the overhead of calling a kernel in a loop versus having the kernel loop on its own?
for (int i = 0; i < 1000; i++) {
    Kernel<<<256, 128>>>(values);   // 1000 separate kernel launches
}
vs.
Kernel<<<256, 128>>>(values);       // one launch, loop inside the kernel
__global__ void Kernel(int *values)
{
    for (int i = 0; i < 1000; i++) { /* ... do the work here ... */ }
}
It depends a bit on whether you are crossing a register-usage boundary that makes your occupancy go down. A kernel call has some overhead, so normally it is wise to loop inside your kernel.
As I said, it is application dependent :) My own application involves calling short millisecond kernels millions of times, so I am biased the other way. Still, any memory-bound kernel can read/write device memory hundreds of times in 5 s. My experience on these forums is that most kernels seem to be memory bound, hence my “rarely” comment.
If you can loop inside your kernel, it will give a speedup over making many kernel calls. The kernel call overhead is something like ~20 microseconds per call, maybe a little more. But more importantly, your kernel won’t need to dump to global memory after every iteration, saving you lots of global memory transfers, which should boost performance significantly.
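If anyone wants to measure the call overhead on their own setup, timing a pile of empty launches with CUDA events gives a decent number. Rough sketch (EmptyKernel is just a do-nothing placeholder, and back-to-back launches measure launch throughput rather than full latency):

#include <cuda_runtime.h>
#include <stdio.h>

// Do-nothing kernel, just to time the launch mechanism itself.
__global__ void EmptyKernel(void) { }

int main(void)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < 1000; i++)
        EmptyKernel<<<256, 128>>>();
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    // total time in ms * 1000 us/ms, divided by 1000 launches
    printf("avg cost per launch: %.2f us\n", total_ms * 1000.0f / 1000.0f);
    return 0;
}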
I agree, and just to chime in -
If my feel for things is right, there’s not only the latency to consider, but also the utilisation of the multiprocessors. Exaggerated example: if you run 15 blocks 100 times on a 16-multiprocessor card, I think you stand a chance of “wasting” one multiprocessor 100 times. It’s not cut-and-dried, but I think I saw this behavior in my app.
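A rough way to put a number on that kind of waste, if anyone wants to check their own card (sketch; the 15-block count is just the example above, and it ignores that more than one block can be resident per multiprocessor):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int sms    = prop.multiProcessorCount;   // 16 on an 8800 GTS 512 (G92)
    int blocks = 15;                         // blocks per launch, from the example above
    int waves  = (blocks + sms - 1) / sms;   // how many "waves" of blocks each launch needs
    float busy = (float)blocks / (waves * sms);

    // 15 blocks on 16 SMs -> 1 wave with one idle SM, ~94% busy;
    // launch that 100 times and the idle SM sits idle 100 times.
    printf("%d blocks on %d SMs: %d wave(s), %.0f%% of SM slots busy\n",
           blocks, sms, waves, 100.0f * busy);
    return 0;
}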