I'm considering buying a GTS 250, but before I do I want to be clear on a threads question. The CUDA manual says it has 16 multiprocessors (128 processors).
So does this mean that in a CUDA program there can be up to 512x128 threads running simultaneously? Or are threads simply a software concept, with only 128 threads running simultaneously? (SIMULTANEOUS being my key question.)
Actually, more than 128 threads will be run simultaneously, since the processors are heavily multithreaded to hide latency. When a thread is stalled on a memory access, for example, the processor simply executes a different thread. Each multiprocessor can switch to a different warp every 4 clock cycles (which works out to 1 instruction across a warp), choosing among a number of threads defined by the occupancy of the kernel. That is, the fewer registers and the less shared memory a kernel uses, the more threads the processor can switch between at a time, and thus the better it can hide latency. Consider 96 threads per multiprocessor a bare minimum for decent performance.
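To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Only the 16 multiprocessors / 128 processors come from this thread; the warp size of 32 and the limit of 768 resident threads per multiprocessor are my assumptions for a compute capability 1.1 card, so check them against the programming guide for your hardware.

```python
# Rough arithmetic only. Figures marked "assumption" are not from this thread.
MULTIPROCESSORS = 16                      # from the CUDA manual, as quoted above
TOTAL_CORES = 128                         # from the CUDA manual, as quoted above
CORES_PER_MP = TOTAL_CORES // MULTIPROCESSORS

WARP_SIZE = 32                            # assumption: threads per warp
MAX_RESIDENT_THREADS_PER_MP = 768         # assumption: compute capability 1.1 limit
MIN_THREADS_PER_MP = 96                   # "bare minimum" suggested above

# One instruction for a whole warp is spread over several clocks
# because there are fewer cores than threads in a warp.
clocks_per_warp_instruction = WARP_SIZE // CORES_PER_MP

# Threads actually executing an instruction in any one clock cycle.
executing_per_clock = MULTIPROCESSORS * CORES_PER_MP

# Threads that can be resident (in flight, ready to be switched to) at once.
resident_upper_bound = MULTIPROCESSORS * MAX_RESIDENT_THREADS_PER_MP

# Suggested floor for the whole chip, using 96 threads per multiprocessor.
suggested_minimum = MULTIPROCESSORS * MIN_THREADS_PER_MP

print("clocks per warp instruction:", clocks_per_warp_instruction)
print("executing per clock:", executing_per_clock)
print("resident upper bound:", resident_upper_bound)
print("suggested minimum:", suggested_minimum)
```

So under those assumptions the answer to "how many at once" is 128 in the strict per-clock sense, but thousands in the sense of threads the hardware is juggling at the same time.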
Yeah, it all depends on your definition of “simultaneous”. :)
The multiprocessors are designed to switch rapidly between threads without the usual context switch required on a CPU. (When you have 16,384 registers at your disposal, you can do stuff like that.) And you’ll have instructions from nearly all your threads at some stage in the pipeline of one of the stream processors.
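As a rough illustration of why the register file matters, this Python sketch estimates how many threads one multiprocessor could keep resident if registers were the only limit. The function name is made up for illustration, the 16,384 figure is the one quoted above (register file size differs between GPU generations, so check your own card), and real hardware also applies per-block allocation rules, shared memory limits and a hard cap on resident threads that this ignores.

```python
def threads_limited_by_registers(register_file_size, registers_per_thread, warp_size=32):
    """Threads the register file alone could hold, rounded down to whole warps.

    Hypothetical helper: ignores shared memory, block granularity and the
    hardware cap on resident threads per multiprocessor.
    """
    threads = register_file_size // registers_per_thread
    return (threads // warp_size) * warp_size


REGISTER_FILE = 16384  # figure quoted above; an assumption for any specific card

# Fewer registers per thread means more threads to switch between.
for regs in (10, 16, 32):
    print(regs, "registers/thread ->",
          threads_limited_by_registers(REGISTER_FILE, regs), "threads")
```

Because each resident thread keeps its own registers for its whole lifetime, there is nothing to save or restore when the multiprocessor switches to another thread.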