I’m a student with an understanding of both parallel programming and advanced numerical topics, and I’d like to start programming with CUDA and Matlab under Linux. I’ll be working alongside one of my professors, since I think I might make CUDA (and more generally GPGPU) the subject of my thesis. I’ll therefore need a mobile CUDA environment to carry with me. From my limited understanding of the hardware side, I see three possible options:
A notebook with a mobile GeForce. Unfortunately, very few laptops have a GeForce card, and almost all of them are heavy with poor battery life.
A notebook without a mobile GeForce, but with a dock capable of hosting a PCI card, like Lenovo’s Advanced Dock. That would be really practical: I could write the code on the move, then compile and run it on a desktop-class GPU.
A desktop with a GeForce card, controlled remotely with something like VNC. Here I’m worried about latency and everything else… is this a practical way to work?
I would get a development machine with a G200 card and use NoMachine (NX) to remote-desktop onto it. Then I’d try to get a notebook with a CUDA-capable card as well, but I wouldn’t sweat it too much.
NoMachine is wonderful - you won’t go wrong with it. (Unless you need to do heavy-duty OpenGL display work - that’s trickier.)
My POV on CUDA laptops: the Apple MacBook Pro line is nice, and the small Zepto machines are interesting. Light, 14" screen, decent GPU.
I was thinking about buying an 8800 GT now for 93 bucks (it’s a deal at a local chain of PC shops) rather than being an early adopter… then, when the technology is more mature, I’d buy an SLI pair of the new GPUs…
Does the GT200 chipset have any advantage for CUDA apart from being faster (libraries, SDK…)?
Although I know C and MPI very well, I think it will still take me six months to learn CUDA, and at least another three to rewrite the mathematical libraries I’ve written for MPI… so in nine months I could be buying an SLI pair of GT200s for nothing…
CUDA does not support SLI; you program each GPU separately (see the sketch below).
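To make that concrete, here’s a minimal sketch of addressing each GPU explicitly (kernel name and sizes are mine, purely illustrative). One caveat: on the CUDA 2.x runtime of this era a host thread was bound to a single device, so in practice you’d spawn one host thread per GPU; newer runtimes let a single thread switch devices like this:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Stand-in kernel: whatever work you want to split across the GPUs.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    // No SLI involved: each device is selected and programmed explicitly.
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        const int n = 1 << 20;
        float *d_data;
        cudaMalloc((void **)&d_data, n * sizeof(float));
        work<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaDeviceSynchronize();
        cudaFree(d_data);
        printf("ran on device %d\n", dev);
    }
    return 0;
}
```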
The GT200 architecture adds:
double the number of registers - this often leads to more than double the performance, but more importantly, more complex algorithms become possible
double-precision support - some algorithms need double precision in certain steps, so they simply aren’t possible on older cards
shared-memory atomics
warp voting
(The last two are shown in the sketch below.)
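As a concrete illustration of those last two items, here’s a minimal sketch (the kernel and names are mine, not from this thread) of a per-block histogram, which needs shared-memory atomics, plus a warp vote. Both require compute capability 1.2+, so this won’t run on the older 8800-class cards:

```cuda
// Per-block 256-bin histogram using GT200-era features (sm_12+).
__global__ void histogram256(const unsigned char *in, unsigned int *out, int n)
{
    __shared__ unsigned int bins[256];

    // Zero the shared bins cooperatively.
    for (int b = threadIdx.x; b < 256; b += blockDim.x)
        bins[b] = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[in[i]], 1u);        // shared-memory atomic (sm_12+)

        // Warp vote (sm_12+): true only if every active thread in the
        // warp agrees. (On modern CUDA this is spelled __all_sync().)
        int warp_all_zero = __all(in[i] == 0);
        if (warp_all_zero) {
            // a real kernel could take a warp-wide fast path here
        }
    }
    __syncthreads();

    // Merge this block's histogram into the global one (global atomics).
    for (int b = threadIdx.x; b < 256; b += blockDim.x)
        atomicAdd(&out[b], bins[b]);
}
```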
For me personally, the most important change is the doubled register count, as some of my kernels need 70+ registers. I also had a kernel where I needed to take the sinh() of a large number, which overflowed in single precision but not in double. I managed to rewrite the algorithm to not need the sinh(); otherwise I would have needed double precision.
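For the curious, here is one generic way such a sinh() rewrite can go - this is the standard log-domain trick, not necessarily the exact transformation used above. Since sinh(x) = (e^x - e^-x)/2 and expf() already overflows in single precision near x ≈ 88, forming sinh(x) directly is hopeless for large x; but if what the algorithm ultimately needs is log(sinh(x)), the overflow disappears:

```cuda
// Hypothetical device helper: log(sinh(x)) for x > 0 without ever forming
// sinh(x). Uses log(sinh(x)) = x - ln(2) + log1p(-exp(-2x)); for large x
// the exp underflows harmlessly to 0 and the result is just x - ln(2).
__device__ float log_sinhf(float x)
{
    return x - 0.69314718f + log1pf(-expf(-2.0f * x));
}
```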
Yes - most of all, the memory controller is way different. On the older parts there was a whole headache around how you read and wrote memory in your kernels: accesses had to be “coalesced”, i.e. each memory address had to be read by the correct thread.
If they weren’t, performance tanked.
It’s a sizable brick of a concept to hold in your head while thinking about everything else that is CUDA. (See the toy kernels below.)
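For readers who haven’t hit this yet, a minimal illustration (toy kernels, mine): on pre-GT200 hardware the first version runs at full memory bandwidth, while the second can be an order of magnitude slower, because the half-warp’s accesses no longer merge into a single transaction.

```cuda
// Coalesced: thread i touches element i, so the 16 threads of a half-warp
// read 16 consecutive addresses and the hardware merges them into one
// memory transaction.
__global__ void copy_coalesced(float *dst, const float *src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Uncoalesced: the stride scatters the half-warp across memory, and on
// sm_10/sm_11 hardware each thread's access becomes its own transaction.
__global__ void copy_strided(float *dst, const float *src, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) dst[i] = src[i];
}
```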
I promise, G200 is worth it.
–
To rephrase: the G200 is not just a faster CUDA chip - it’s more flexible. It will change the way you work, for the better. You’ll be able to express yourself better in code. That it also runs faster is a nice bonus.
Thank you for the precious info… choosing the right hardware now is really important…
Now I’m thinking about a PC like this:
Intel P35 chipset (on an Abit motherboard)
quad-core Q6600 (or Q9300, probably the former)
4 GB of RAM (DDR2 or DDR3? I don’t know yet)
GTX 260
some cheap hard disk, with a ramdisk in use
Around 800 bucks… any more advice?
How’s the support for the GTX 260 under Linux? I’m afraid that when I buy the system I’ll find something bad… like the NVIDIA driver not working well under Linux x86_64 :(
I currently run CUDA 2.0 with a GTX 280 + GT200 prototype board on RHEL5 x86_64 in one system, and a pair of 8800 GTX cards in another system. (I’m using AMD hardware, so I can’t comment on Intel motherboards.) Just make sure you install one of the supported Linux distributions listed on the CUDA download page.
Get a P43 or P45 chipset - the difference is mostly PCI Express 2.0 support, which means a hell of a lot faster transfers to/from the CUDA card. If P43/P45 boards are more expensive and you’re really, REALLY short on money, I’d take a dual-core instead of the quad. P43 boards shouldn’t be very expensive right now, though.
(If you’re very lucky, you can maybe find a nice used X38 Xeon server board for cheap … that would work nicely, too, with PCI-E 2.0.)
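If you want to measure what your slot actually delivers, the CUDA SDK ships a bandwidthTest sample; a bare-bones version of the same idea is just a timed cudaMemcpy (this sketch uses pageable memory, so pinned-memory numbers will be higher):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Rough host-to-device bandwidth check; the SDK's bandwidthTest sample
// does this more carefully (pinned memory, repeated runs, etc.).
int main(void)
{
    const size_t bytes = 64 << 20;   // 64 MB
    float *h = (float *)malloc(bytes), *d;
    cudaMalloc((void **)&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return 0;
}
```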
Edit - actually, may I suggest you consider a G43 or G45 chipset with onboard video? If it costs the same, it may be worth it. That way you can dedicate the CUDA card to CUDA computation - running an X server etc. on the CUDA card slows things down a little.
I’m running Linux x86_64 and it’s fine. I still only have an 8800, but AFAIK the G200 cards work just as well under Linux.
Edit 2 - be careful when choosing the power supply :) Total wattage alone isn’t enough; the right rails have to be able to supply the right amperage, I believe. Search around the net / these forums.
The power management on the card should lower the power draw automatically when you aren’t running GPU code. I haven’t checked this myself with a power meter, but reviewers have reported this feature.
Damn, how could I forget that… I think I am already used to it :D
And indeed, an integrated GPU + a GTX 260 looks like the optimal option when running Linux.
When running Windows, you need to make sure they are both NVIDIA. And in Vista, as I understand it, you cannot get around the fact that the OS will steal memory from your card.
What do you mean? You still have to think about coalescing on GT200… there’s just a new class of “partly coalesced” accesses now. On the flip side, the profiler doesn’t work there and won’t tell you whether you’re coalescing or not.
The doubled registers are nice, but you can run the same complex algorithms on older cards.
PCIe 2.0 is hardly important. Only some people even need it, and even if it turns out you do, again, it’s just a speedup.
The atomics features can, however, be important if you need them. They’re, for one, a feature and not just a speedup.
Now… I think the GTX 260 is cheap and should be bought, but let’s not get too excited here. E.g., the expensive mobo can be done without. Do get a good PSU (but watch out for the many overpriced ones). In fact, depending on how much money you have and how long the project will take, buying a cheap development rig now and investing in a performance-testing rig once everything’s done may be the right idea. NVIDIA will release new GT200 GPUs pretty soon, in fact.
OK. The Q6600 can be pushed to 3.0 GHz (333x9) and hardly breaks a sweat.
Have a look at the Q9450 too: a 2.5 GHz Penryn quad-core with 12 MB of L2 cache and hardware idiv logic. A fast beast.
OK. If you push the Q6600 to 3.0 GHz, you’ll want to run the memory at 667 MHz to keep the FSB and RAM synchronous. You should find the OCZ Platinum 1066 modules at about the price point you quote (I got mine from “RomaCC” in Rome).
I’d go higher here, if possible. 550W is about the minimum to reliably run a G92/G200.
Of course you can swap the 550W for a 650W when you buy a 260 in six months, but why make a double purchase?
Mmm, I recommend getting a faster card here.
You may find a 9800 GT with 1 GB for 130 euro, and it’s much faster.
Plus, 1 GB of local memory is very nice when running CUDA: you want to keep data on the device as much as possible, since feeding bytes through the PCIe bus is so slooow.
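In code, the pattern that advice implies looks something like this (illustrative kernel and names, mine): pay the PCIe cost once on the way in and once on the way out, with every iteration staying on the card.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for one step of a simulation.
__global__ void update(float *state, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] = 0.5f * (state[i] + 1.0f);
}

// Cross the PCIe bus once on the way in, run every iteration on the
// device, cross once on the way out.
void run(float *h_state, int n, int n_steps)
{
    float *d_state;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&d_state, bytes);
    cudaMemcpy(d_state, h_state, bytes, cudaMemcpyHostToDevice); // slow hop #1

    for (int step = 0; step < n_steps; ++step)
        update<<<(n + 255) / 256, 256>>>(d_state, n);            // all on-device

    cudaMemcpy(h_state, d_state, bytes, cudaMemcpyDeviceToHost); // slow hop #2
    cudaFree(d_state);
}
```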
Guys, I have to ship a feasibility report by Monday… and I still don’t know the hardware list :(
Don’t worry about it, just buy it. You could keep second-guessing this endlessly.
I’d say don’t buy OCZ RAM, though; the brand is a waste of money, even if you overclock. (I don’t know how this overpriced-RAM business still exists. Years ago the CPU/FSB couldn’t be overclocked without pushing the RAM hard, so there was a need for the fastest chips. Now… wtf.)
For the same money, get a PSU with a lot more watts. Don’t pay more, though.
BTW, although I couldn’t find anything more about the G45 compatibility issue (a couple of mentions, but it doesn’t seem to affect most people), I have to warn you that if you get an Intel integrated-graphics mobo, you won’t be able to use the integrated graphics alongside the NVIDIA card under Windows: Windows only supports a single video driver at a time (mostly). You could get NVIDIA integrated graphics instead.