Starting out with CUDA

Hi All,

Hope I’m okay starting off by asking some simple questions.

Firstly, my system currently uses an onboard GeForce GPU (7050PV), which I assume CUDA cannot use.

The next question is about the different capabilities of the current flock of Nvidia GPUs. I was looking at the GTX260 as it seems reasonable at about £125. What benefit is there (bar slightly more processing cores) in getting a GTX280 over a 260 for CUDA work?

Next, I’m confused by the Tesla series. They seem almost identical to the standard GTX280 cards, but with 4GB of VRAM as opposed to the typical ~1GB on the GTX280 cards. In what particular ways are Tesla cards “better” for CUDA work than GTX280 cards?

Next (sorry for so many questions), I’ve briefly looked at the difference between the 8800 series and the GT200 series cards from a CUDA perspective. I see that “double precision” is the key difference. Could someone flesh out an example of where this would be particularly useful?
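One hedged illustration of the kind of case being asked about is long accumulations, where single precision rounding error builds up. The sketch below is plain host-side C++ rather than CUDA, and the function names `repeated_sum` and `accumulation_error` are made up for this example. It only shows the numerical effect, not how any particular card behaves.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Add `value` to a running total n times, using T as the accumulator type.
// (Hypothetical helper, written only for this illustration.)
template <typename T>
T repeated_sum(T value, std::size_t n) {
    assert(n > 0);
    T total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += value;  // each addition rounds to the precision of T
    }
    return total;
}

// Absolute difference between the accumulated total and the expected result.
template <typename T>
double accumulation_error(T value, std::size_t n, double expected) {
    return std::fabs(static_cast<double>(repeated_sum(value, n)) - expected);
}
```

Calling `accumulation_error<float>(0.1f, 10000000, 1.0e6)` and then the `double` equivalent should show the single precision total drifting much further from one million. Long running sums of this kind are one place where double precision tends to pay off.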

Many thanks in advance,

Dstat :wacko:

Hi,

I’d recommend the GTX 260 or upwards, and the cheapest you can find. The differences between the GTX 260 and 280 when you’re starting development aren’t that big: only, as you said, slightly more processing cores. They behave identically.

The biggest difference by far with the Tesla stuff is the memory. You won’t need one - when / if you do need one, you’ll know. :)

The biggest difference between the 8800 and GT200 is IMO not the floating point support, but rather a much improved memory controller. The effect is that you don’t have to be NEARLY as careful about how you access memory with the newer cards, making them much easier to program for! (There are also a few other very nice differences. I wouldn’t buy anything less than a GTX260 unless I was very broke, in which case I’d go for an 8800 GT or its ancestors.)

Note that the GTS-250 does not have the same nice memory controller properties as the GTX-260 and above.

In summary: You are very much on the right track. At least in how it matches with my experience. Have fun!

That’s really useful, thank you. I think the GTX260 will be the way to go in that case. I’m interested to hear your comments on the memory controller of the GT200 series cards. I’ve only briefly looked over how CUDA copies memory from the host to the card’s memory, and it looked pretty straightforward, though I’m probably missing something / a lot :)

Thanks again,

Dstat

Hi! Got a card yet?

Yeah, it’s a lot of small details that add up. The memory controller thing I was talking about is the one in the CUDA card that loads data from the main video RAM banks to the threads in the shader processors. In the old cards, each thread has to read a specific address pattern to get full speed, but in the new cards, the threads only have to read adjacent addresses. It’s called coalescing; you may have encountered it. If not, you will soon :)
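To make the "adjacent addresses" idea a bit more concrete, here is a deliberately simplified host-side C++ model. It is not how any real card decides what to coalesce, since the exact rules depend on the hardware generation. The function name `segments_touched` and all of its parameters (group size, element size, segment size) are illustrative assumptions. It just counts how many aligned memory segments a group of threads would touch if thread `t` read element `t * stride`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model: thread t reads the element at index (t * stride_elems),
// much like data[threadIdx.x * stride] would in a kernel. Returns the number
// of distinct aligned segments of `segment_bytes` that the group touches.
// Fewer segments means fewer memory transactions in this toy model.
std::size_t segments_touched(std::size_t threads,
                             std::size_t stride_elems,
                             std::size_t elem_bytes,
                             std::size_t segment_bytes) {
    assert(segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < threads; ++t) {
        std::size_t byte_address = t * stride_elems * elem_bytes;
        segments.insert(byte_address / segment_bytes);
    }
    return segments.size();
}
```

With 32 threads reading 4-byte values and an assumed 128-byte segment, a stride of 1 lands everything in a single segment, while a stride of 32 puts every thread in its own segment. That gap is roughly why the access pattern matters so much.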

(Just to make sure, the new cards ALSO have a new zero-copy feature that does have to do with host-to-card memory transfers. Lotsa stuff!)

Yep! In the end I’ve got a 9800GT card (I wanted the GTX260 like I posted, but they had none in stock :( and the 9800GT was on offer). I’ve just downloaded the driver, SDK and samples, so hopefully I’ll be “CUDA’ing” by later tonight. I’m a little worried that Debian isn’t one of the supported Linux distros for CUDA. I’ve downloaded the files for Ubuntu in the hope that, as Ubuntu is Debian based, it’ll work on Debian too :)

Think I’ll start out slow by following the examples that are provided :)

EDIT: Okay, I’ve installed everything. Running ‘make’ in the SDK samples folder didn’t work, but I could compile individual samples, so I think it’s all up and running correctly. The bandwidth test gave a figure of circa 2.4GB/s for device ↔ host memory transfer. I’m loving the fact that one of the samples is a 1D Discrete Wavelet Decomposition with the Haar wavelet, but the size of the data file is tiny :( Does anyone happen to know whether the code will still work if I throw in a .dat file with a stupidly large number of values (say, something over 2 million entries, but still satisfying the 2^n rule)?
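For reference, here is a rough host-side C++ sketch of a full 1D Haar decomposition, including the 2^n length check. This is not the SDK sample's code and says nothing about what input sizes that sample accepts. The names `is_power_of_two` and `haar_decompose` are made up for this sketch, and the 1/sqrt(2) scaling is one common convention that the sample may or may not share.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True when n is 1, 2, 4, 8, ... (the "2^n rule" for the input length).
bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Full Haar decomposition with orthonormal scaling (assumed convention).
// Output layout: [overall approximation, coarsest details, ..., finest details].
std::vector<float> haar_decompose(std::vector<float> signal) {
    assert(is_power_of_two(signal.size()));
    const float inv_sqrt2 = 1.0f / std::sqrt(2.0f);
    std::vector<float> scratch(signal.size());
    for (std::size_t len = signal.size(); len > 1; len /= 2) {
        const std::size_t half = len / 2;
        for (std::size_t i = 0; i < half; ++i) {
            const float a = signal[2 * i];
            const float b = signal[2 * i + 1];
            scratch[i] = (a + b) * inv_sqrt2;         // approximation
            scratch[half + i] = (a - b) * inv_sqrt2;  // detail
        }
        // Copy this level's result back; the next pass works on the first half.
        for (std::size_t i = 0; i < len; ++i) {
            signal[i] = scratch[i];
        }
    }
    return signal;
}
```

A length such as 2^21 (2,097,152 entries) passes the check, whereas exactly 2,000,000 does not. Running a CPU version like this on the same data could also serve as a sanity check against the GPU output.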

Loving CUDA so far though! :)