Been having great fun learning to play with CUDA… Thanks NVIDIA for an AMAZING tool!
It takes a switch of gears to realize that your new limits are memory access patterns and branching structure!
I’ve played with many of the (very impressive) demo projects, and even modified a few.
I have tons of Newbie questions though, ones that have clues scattered everywhere but aren’t so obvious.
#1 One confusion I have is a HARDWARE question. The “5 second timeout” is alluded to everywhere but only vaguely explained. My (purely background, number-crunching) computes will take DAYS if not WEEKS to run, but it’s reasonably feasible to break them into, say, 50 ms chunks.
But my experiments show that while a kernel runs, your machine is FROZEN. Not just 100% pegged CPU, but the display itself is absolutely unchanging, no mouse pointer sprite movement, nothing. Is this supposed to happen?
I get the feeling it’s correct, since there are references to using “non-display” cards for CUDA, and making sure SLI is off, and so on.
But it just seems so strange that the display will FREEZE. Is there no way to let CUDA work “in the background” somehow?
This issue may be because I’m using a laptop… a Thinkpad T61p with Quadro 570M. But it sure makes laptop CUDA terrible, even though everything else works fine.
Question #2
On a desktop machine, is it best practice to use one cheap card for display (no CUDA) and a second (or more) for CUDA? I would assume then that there’s no need to match the card types… I could use my old 7600GT or something for display.
Question #3
Since I will likely buy a new CUDA power-card now that I’ve had programming success, are there any hints about the new GTX 280 card coming out in two weeks? I don’t need that much compute power, BUT if the CUDA hardware capability model is updated (say, to support doubles, or 64-bit ints), it would be smart to get the latest capabilities.
CUDA is implemented in a different hardware mode than graphics, so the display cannot be updated while a CUDA kernel is running. The 5-second timeout only applies to a single kernel call; as you said, you can break your problem into smaller chunks to avoid it. My own CUDA work makes hundreds of millions of ~5 ms kernel calls over days or weeks.
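To make that concrete, here is a rough sketch of the pattern; step_kernel, the data, and the counts are made-up placeholders, not my actual code:

// Sketch of the "many short kernel calls" pattern: keep the state on the
// GPU and launch lots of short kernels instead of one huge one.
// step_kernel, n, and the iteration count are placeholders.
#include <cuda_runtime.h>

__global__ void step_kernel(float *state, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        state[i] += 0.001f * state[i];    // stand-in for one ~5 ms chunk of work
}

int main()
{
    const int n = 1 << 20;
    float *d_state;
    cudaMalloc(&d_state, n * sizeof(float));
    cudaMemset(d_state, 0, n * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);

    // Each launch is short, so the watchdog never trips and the driver
    // can service the display between launches.
    for (long iter = 0; iter < 1000000L; ++iter) {
        step_kernel<<<grid, block>>>(d_state, n);
        cudaDeviceSynchronize();          // also surfaces any launch errors
    }

    cudaFree(d_state);
    return 0;
}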
As for question #2: if your kernels regularly run for a long time, then a dedicated display card can be a good investment. As you say, you can use an older GPU for display only, as long as the latest driver supports both the CUDA GPU and the old GPU.
If you search the forums a bit, there are a few official NVIDIA posts about double support. As to performance or any other details, there has been only a trickle of information. Anyone who knows is still under NDA or at NVIDIA.
Given that the GT2xx is a major generation jump, my advice is to wait and go for one of those.
Hmm, but this means that a user’s machine will be locked up and useless during the CUDA computes?
Or do you do something like a 9 ms compute, then sleep for 1 ms, and repeat, allowing the display to update at 100 fps (and hoping the 1 ms sleep is long enough for the screen updates to be drawn)? Won’t even that cause screen “stuttering”?
This is a big worry; it would be awkward to ship a commercial tool that locked up the user’s machine in any noticeable way.
Well, there is no such thing as magic: either use two cards, or live with the fact that the screen is not updated during the calculations. Personally, I have not noticed any stuttering while using the video card that drives my display as the CUDA device (apart from when I made a mistake and hit the 5-second timeout).
No, there is no need to “sleep” for 1 ms. The driver is very good at managing resources between graphics and compute. When I’m running my sims, calling ~5 ms kernels constantly, the display still updates with virtually no lag. If I start doing something that requires a lot of graphical updates, like dragging a window around, the simulation slows to ~50% performance, but everything still runs fine.
It is only extremely long kernel calls that start to create visual lag problems on the display.
My sims mostly run on a headless box anyway. The system with the display is just for development.
Thanks for the followups. This is one of those small but important details that you have to learn from experience; the programming guide skips over stuff like this.
So keep each eval to less than ~10 ms and you’ll probably be OK. Makes sense.
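For my own sanity checking, I figure I can time each chunk with CUDA events, something like this sketch (dummy_kernel and the sizes are just placeholders I made up):

// Sketch: measure how long one kernel chunk takes with CUDA events,
// to confirm it stays well under the watchdog limit.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 1.0001f;               // stand-in for the real work
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // milliseconds between the two events
    printf("kernel chunk took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}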
The ugly part, of course, is that this does complicate algorithm implementations; some stuff may not be “sliceable” so nicely. I hope future drivers (or hardware?) add CUDA process time-sharing so we can ignore the limitation in the future.
That is not very likely to happen. How would you do that in practice? You cannot swap processes out of the GPU; you need to keep your state resident. You could schedule only N blocks per multiprocessor, do display updates, and then schedule the next N blocks, but that would kill the latency hiding.
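Just to show what that would look like (and why it hurts), here is a rough sketch of launching only N blocks at a time via an offset parameter; work_kernel and all the numbers are made up:

// Sketch of launching a large job a few blocks at a time so the driver
// can service the display between launches. work_kernel and the sizes
// are placeholders, not a real implementation.
#include <algorithm>
#include <cuda_runtime.h>

__global__ void work_kernel(float *data, int n, int block_offset)
{
    // Each launch covers only a slice of the full index range.
    int i = (blockIdx.x + block_offset) * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = sqrtf(data[i] + 1.0f);   // stand-in for the real work
}

int main()
{
    const int n = 1 << 22;
    const int threads = 256;
    const int total_blocks = (n + threads - 1) / threads;
    const int chunk_blocks = 64;           // "N blocks" per launch

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    for (int offset = 0; offset < total_blocks; offset += chunk_blocks) {
        int blocks = std::min(chunk_blocks, total_blocks - offset);
        work_kernel<<<blocks, threads>>>(d_data, n, offset);
        cudaDeviceSynchronize();           // gap where graphics can run
    }
    // Downside: with only 64 blocks in flight, the GPU has far less work
    // available to hide memory latency, so throughput drops.

    cudaFree(d_data);
    return 0;
}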
Really for algorithms that have a long running time, you want a secondary card.
I agree, it seems like an extra card is the way to go. For research or development, that’s no problem. For commercial software, it sucks.
So here’s a followup question. If I misjudge my kernel and it takes too long (>5 secs) anyway (or I miscode an infinite loop), I sometimes hit the timeout limit and my app returns (terminated). But sometimes I just get a blue-screen crash, with a message that the driver was taking too long. Ugh. Blue-screen crashes suck. Are these blue-screen failures a sign of a CUDA driver bug? Or are they more likely just the new norm… when you push the envelope, you have to expect it to tear occasionally?
Blue screening your computer with CUDA is definitely a bug. If you can write a short example program that reproduces this problem and post it here, along with your system setup (incl. CUDA version), then someone at NVIDIA will take a look.
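In the meantime, it is worth checking the runtime’s error code after every launch; a kernel killed cleanly by the watchdog should come back as a launch-timeout error instead of a blue screen. A rough sketch of that checking pattern (long_kernel is a placeholder):

// Sketch: check for errors after each launch. A kernel stopped by the
// watchdog should surface as an error code here, not as a crash.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void long_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;                   // stand-in for the real work
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    long_kernel<<<(n + 255) / 256, 256>>>(d_data, n);

    cudaError_t err = cudaGetLastError();      // launch/configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // execution errors, incl. the timeout
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}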