Newbie question: how to do some tasks

OK, to start: I've just heard about CUDA and thought it would be a great tool for some big computations.
So, here's my problem: I have a machine that I want to teach to travel quickly through certain types of mazes.
The algorithm is as follows: it produces several paths the machine could choose, then traverses each of them, doing some small computation and returning a position plus some value (like the summed distance). Until now I checked every single path on the CPU, but this seems easy to parallelize.
My idea is to use each multiprocessor (or pair of multiprocessors) in the 8800 GTX to check one path, given as input, on each of the mazes.
Therefore I ask: is it possible to send one big movement array to all processors in a multiprocessor at once, then compute the position after the moves in the different mazes and return a vector of 8 locations + values? The set of moves is about 1-10 MB per path (depending on how much precision I want).
I'm just asking because I'm new to GPU computing and don't know if it's worth the effort.

All multiprocessors access the same global device memory, so it is easy to “send one big movement array to all processors”.
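
For example, a minimal host-side sketch (the array name and size are placeholders for your own data): one upload puts the movement array in global memory, where every block on every multiprocessor can read it.

```cuda
// Hypothetical sketch: upload one big movement array to device global
// memory. All multiprocessors can then read the same d_moves pointer.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const size_t numMoves = 1 << 20;            // placeholder: ~1M moves
    size_t bytes = numMoves * sizeof(int);

    int *h_moves = (int *)malloc(bytes);        // fill with your path data...
    int *d_moves = NULL;

    cudaMalloc((void **)&d_moves, bytes);       // global device memory
    cudaMemcpy(d_moves, h_moves, bytes, cudaMemcpyHostToDevice);

    // ... launch kernels that all read d_moves ...

    cudaFree(d_moves);
    free(h_moves);
    return 0;
}
```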

When thinking of a way to parallelize an algorithm, I prefer a top-down approach. Don't worry about multiprocessors to start with. Just think about what each individual block (and each thread in a block) is going to accomplish. Then start breaking it down into more specific details, like the use of shared memory, device memory coalescing, etc.

An 8800 GTX has 768 MiB of memory, almost all of which is usable, so base your design on that.

Also, is it possible for processors to get data depending on their (the processors') number? That way I could implement the algorithms I'm learning in Parallel Algorithms Analysis ;-).

One more question: I've also read somewhere here that the GPU is limited to about 5 seconds of continuous work or something like that. Can that limit be lifted? And if not, should the tasking for my problem look like this: the CPU computes a new set of movements and launches the computation function on the GPU, then the GPU computes within 5 seconds and returns the results?
Am I right?

To answer your first question… it depends. CUDA has a "block" execution model. You break your computation up into N independent blocks (N > 100 makes the best use of the device). The GPU then interleaves block execution across all of the multiprocessors. Each block (and each thread in the block) knows its own ID.
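
A rough sketch of what that looks like for your problem (the kernel name, data layout, and the "one block per maze, one thread per path" split are all just assumptions for illustration):

```cuda
// Each block handles one maze; each thread checks one candidate path.
// blockIdx and threadIdx are the built-in IDs mentioned above.
__global__ void simulateMaze(const int *moves, int numMoves,
                             const int *mazes, int mazeStride,
                             int *positions, int *values)
{
    int maze = blockIdx.x;                  // which maze this block works on
    int path = threadIdx.x;                 // which path this thread checks

    const int *myMaze = mazes + maze * mazeStride;  // this block's maze data

    int pos = 0, value = 0;
    for (int i = 0; i < numMoves; ++i) {
        // apply moves[i] to pos within myMaze, accumulate value ...
    }

    int out = maze * blockDim.x + path;     // unique output slot per thread
    positions[out] = pos;
    values[out] = value;
}
```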

Now, if you want to know the PHYSICAL multiprocessor a block is running on for some purpose, I can't help you. Search the forums; there was some experimentation with undocumented PTX on this topic a while ago.

You can avoid the 5-second limitation by running in Linux console mode (no X). And yes, the 5-second watchdog only applies to a single kernel invocation. GPUs are FAST, so in my experience it is hard to hit this limit. Then again, my application requires lots of short kernels run in an iterative fashion, so my experience is biased.
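
If you ever do bump into the watchdog, the usual pattern is to split the work into many short launches, something like this host-side sketch (`stepKernel` and the variable names are placeholders):

```cuda
// Host-side loop: each launch processes only a slice of the moves, so
// no single kernel invocation runs long enough to trip the ~5 s watchdog.
const int movesPerLaunch = 10000;   // tune so one launch stays well under 5 s
for (int start = 0; start < numMoves; start += movesPerLaunch) {
    int count = min(movesPerLaunch, numMoves - start);
    stepKernel<<<numBlocks, threadsPerBlock>>>(d_moves, start, count,
                                               d_positions, d_values);
    cudaThreadSynchronize();        // wait for this slice before launching the next
}
```

Intermediate state (positions, accumulated values) just stays in device memory between launches, so the only extra cost is the launch overhead.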

If you haven’t already, you should read the first few chapters of the guide. They will answer all of the questions you have about the programming and execution model, and more.

Is your problem a maze-solving problem that would normally be solved on a CPU with backtracking, or is it more like a traveling-salesperson-type problem? Keep in mind the consequences of warp divergence when designing your solution. This might affect the order in which you test your paths.
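
To illustrate what warp divergence means here (hypothetical sketch): threads in a warp execute in lockstep, so if paths in the same warp have very different lengths, the whole warp takes as long as its slowest thread.

```cuda
// If pathLengths varies wildly within a warp, the short-path threads sit
// idle while the long-path threads finish. Grouping paths of similar
// length into the same warps can reduce this waste.
__global__ void checkPaths(const int *pathLengths, int *results)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    int len = pathLengths[p];

    int value = 0;
    for (int i = 0; i < len; ++i) {   // divergent trip count within a warp
        // step through the maze, accumulate value ...
    }
    results[p] = value;
}
```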

Nope, I just need to compute a simple function online while going through the maze.

Actually, the problem is a student project: we want to teach a robot to traverse some mazes. We have them mapped well, so there is no need to have the robot actually walk. Instead we would like to simulate possible algorithms and combine some of them, and that's the part for the GPU. In fact the problem is a bit simpler than it seems.