Hello All - I am preparing to begin a series of projects using CUDA. At work, I will be using the Tesla C870. I am also going to purchase an entry-level card so I can experiment at home.
After a quick survey of the market, I am considering the EVGA GeForce 8600 GTS with 512 MB of RAM, a 128-bit memory bus, 32 GB/s of memory bandwidth, a 675 MHz core clock, and 32 stream processors. Based on what I’ve read, it was very favorably reviewed.
I realize that this doesn’t have 8800 performance, but for $159 it seems like a hard deal to beat.
Does anyone have any comments / thoughts on using the 8600 series with CUDA for basic learning and experimentation? Are there any known compatibility issues with CUDA on this series of cards? Lastly, how difficult will it be to “scale” code from the 8600 to the Tesla (32 stream processors vs. 128 stream processors), and vice versa?
On another note, I enjoyed the CUDA Tutorial at SC07, and it was a very helpful start.
The 8600 ought to work for development just fine. Scaling up to more stream processors is not really an issue, as threadblocks get dynamically scheduled among the multiprocessors. That means the same program, unchanged, will run much faster on a C870 than on an 8600.
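To illustrate the scheduling point, here is a minimal sketch of a launch whose grid size depends only on the problem size. The names scaleKernel, blocksFor, and launchScale are made up for this example, and 256 threads per block is just an assumed, commonly used value.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiply each element by a constant factor.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // guard the partially filled last block
        data[i] *= factor;
    }
}

// Number of blocks needed to cover n elements, rounded up.
// It depends only on the problem size, never on the multiprocessor count.
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launch helper (illustrative): d_data must already point to device memory.
// On these cards a 1D grid is limited to 65535 blocks, so a very large n
// would need a 2D grid or a loop inside the kernel.
void launchScale(float* d_data, int n, float factor)
{
    const int threadsPerBlock = 256;   // assumed value, tune per kernel
    if (n <= 0) return;
    scaleKernel<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(d_data, n, factor);
}
```

Nothing in the launch mentions how many multiprocessors the card has, so the same source should run on both an 8600 and a C870. The hardware hands out the blocks to whatever multiprocessors are available.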
One thing to keep in mind is that the 8600 has compute capability 1.1, while the C870 has compute capability 1.0. So, for example, you’ll be able to use atomics on an 8600, but not on a C870.
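If you want to check this at run time, something along these lines should work. It is only a sketch: supportsGlobalAtomics and reportDevices are hypothetical helper names, while cudaGetDeviceCount and cudaGetDeviceProperties are the runtime API calls that report the compute capability.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Global-memory atomic functions require compute capability 1.1 or higher.
bool supportsGlobalAtomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 1);
}

// Prints the compute capability of each device; returns the device count
// (0 if the query itself fails).
int reportDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;            // skip devices that cannot be queried
        }
        printf("Device %d: %s, compute capability %d.%d, global atomics: %s\n",
               dev, prop.name, prop.major, prop.minor,
               supportsGlobalAtomics(prop.major, prop.minor) ? "yes" : "no");
    }
    return count;
}
```

Calling reportDevices() at startup lets a program pick a code path without atomics when it finds itself on a 1.0 device.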
Indeed, 8600 GTS is a very good start. We’ve used it for initial development of our CUDA-enabled software.
However, do not expect your kernels to scale up linearly with number of multiprocessors/frequency. Host-to-GPU memory bandwidth may easily become a bottleneck, so design your kernels to have as few CPU-GPU interactions as possible.
And as Paulius said, do not use atomics or other CC 1.1 features in kernels that also have to run on the Tesla, because the Tesla is based on G80, which is CC 1.0.
The readme file on the NVIDIA download page mentions a known issue when using a GPU both as a compute device and as the primary Windows display. The issue references a 5 second time limit.
So, if I use the 8600GT, should I refrain from using it for my primary video card? Has this issue been resolved, or is it being fixed in 1.1? I guess I am a bit perplexed, because I saw several people at SC07 demoing CUDA applications on their laptops - which would only have one video card…
Also, will emulation mode work on computers which do not have a compatible NVIDIA device? We have several people at work who are interested in learning the syntax, compiler, etc., but have yet to get hardware procured for real “online” development.
The 5 second time limit is a Windows XP “feature” (the OS thinks that something bad happened to the device if it does not respond for 5 seconds).
5 seconds is a long time for a kernel (as you noticed, we were running demos on laptops with a single graphics card at SC07), so I would not worry about it.
You do not need a CUDA capable card to compile or run in emulation mode.