Return value

Hi,

I am programming a parallel graph search algorithm (Dijkstra, to start with)

that might have the following structure:

while (structureHasNodes) {
    getNextNodeFromStructure<<<...>>>();
    doThisNodeByTraversingItsEdges<<<...>>>();
    addItsChildNodesToStructure<<<...>>>();
}

Before anyone says “there is no sense in traversing the edges in CUDA, because it’s so fast that the overhead would be too big”: the weight of the edges is not constant and has to be calculated.

So my question is: is there a way to get a return value from a kernel without using memcopy from device to host?

Some kind of event or signal would be enough (I just need to know “yes” or “no”, so that I know when to finish the loop).

Or can a kernel call another kernel (with different block and grid dimensions)?

Kind regards and thanks in advance

“without using memcopy from device to host?”
Possibly zero-copy could help? I haven’t gotten around to using it myself yet, so I don’t know much about it, but have a look.
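
For anyone who wants to try the zero-copy route, here is a minimal sketch of a "yes/no" flag living in mapped pinned host memory. The kernel `stepKernel`, the helper `runWithMappedFlag` and the `limit` parameter are made up for illustration; only the `cuda*` runtime calls are real. It assumes device 0 reports `canMapHostMemory`, and it gives up with -1 otherwise.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: pretends there is more work while step + 1 < limit.
__global__ void stepKernel(int step, int limit, int *moreWork)
{
    if (blockIdx.x == 0 && threadIdx.x == 0 && step + 1 < limit) {
        *moreWork = 1;  // written straight into mapped host memory
    }
}

// Hypothetical helper: returns the number of iterations executed,
// or -1 if mapped memory is unsupported or a CUDA call fails.
int runWithMappedFlag(int limit)
{
    // Older toolkits need this before the context is created; the result
    // is ignored and any resulting error state is cleared.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    if (!prop.canMapHostMemory) return -1;

    int *hostFlag = NULL;
    int *devFlag = NULL;
    if (cudaHostAlloc((void **)&hostFlag, sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostGetDevicePointer((void **)&devFlag, hostFlag, 0)
            != cudaSuccess) {
        cudaFreeHost(hostFlag);
        return -1;
    }

    int iterations = 0;
    int result = 0;
    do {
        *hostFlag = 0;  // host resets the flag before each launch
        stepKernel<<<1, 32>>>(iterations, limit, devFlag);
        // The flag is only safe to read once the kernel has finished.
        if (cudaDeviceSynchronize() != cudaSuccess) {
            result = -1;
            break;
        }
        ++iterations;
    } while (*hostFlag != 0);

    cudaFreeHost(hostFlag);
    return (result == 0) ? iterations : -1;
}
```

There is no explicit cudaMemcpy here, but the host still has to synchronize with the kernel before reading the flag, so the loop is not free of overhead.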

“kernel call another kernel”
I believe that cannot be done.

Thx for the reply.

But zero-copy seems to work only on MCP79 and GT200.

So I will copy memory ^^ (hopefully 1 byte will not be too slow ;-) )

Kind regards

Look at it this way… you want to have access to something that was computed on your device, from your host.
You need to copy it over at some point or another.

With a 1-byte transfer, the cost is dominated by the fixed memcpy overhead rather than by the amount of data.
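
A minimal sketch of that loop with a small flag copied back after each launch, in case it helps. The kernel `consumeOneNode` and the helper `runWithFlagCopy` are hypothetical stand-ins for the three kernels in the question; one `int` is copied instead of one byte just to keep the code simple.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-iteration kernels: consumes one
// "node" per launch and records whether any nodes are left.
__global__ void consumeOneNode(int *nodesLeft, int *hasNodes)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        if (*nodesLeft > 0) *nodesLeft -= 1;
        *hasNodes = (*nodesLeft > 0) ? 1 : 0;
    }
}

// Hypothetical helper: returns the number of kernel launches,
// or -1 if a CUDA call fails.
int runWithFlagCopy(int initialNodes)
{
    int *dNodesLeft = NULL;
    int *dHasNodes = NULL;
    if (cudaMalloc((void **)&dNodesLeft, sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&dHasNodes, sizeof(int)) != cudaSuccess) {
        cudaFree(dNodesLeft);
        return -1;
    }

    bool ok = cudaMemcpy(dNodesLeft, &initialNodes, sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    int launches = 0;
    int hasNodes = (initialNodes > 0) ? 1 : 0;
    while (ok && hasNodes) {
        consumeOneNode<<<1, 32>>>(dNodesLeft, dHasNodes);
        ++launches;
        // Blocking copy: waits for the kernel, then fetches the flag.
        ok = cudaMemcpy(&hasNodes, dHasNodes, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dNodesLeft);
    cudaFree(dHasNodes);
    return ok ? launches : -1;
}
```

The blocking cudaMemcpy doubles as the synchronization point, so no separate synchronize call is needed before testing the flag.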

As for kernels calling other kernels… a __global__ function cannot call another __global__ function… yet! This may (will?) change with Fermi.

But an argument against that is:

I can pass parameters to a kernel without explicitly copying them to the device, so why not get return values from the device?

And it need not be a value; a signal like an interrupt would be enough … so I think my question was not that stupid.

Oh, I never said it was stupid; sorry if it came across that way.
One problem I see with this is… which of the threads will emit the signal? When you pass parameters to the device, the same set goes to all active threads.

Maybe other people have found a better way to do this but, for now, I think the only message the whole kernel can return is “I’m done”. I’d be happy if someone came up with a better solution; it would serve me as well!
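
One possible way around the "which thread emits the signal" problem, as a rough sketch: let every thread that has something to report write the same value to a single flag in global memory. The names `anyPositive` and `hostAnyPositive` are invented for this example, and the flag is still brought back with an ordinary cudaMemcpy.

```cuda
#include <cuda_runtime.h>

// Each thread inspects one element; any thread that sees a positive
// value raises the flag. Racing writers all store 1, so it does not
// matter which write ends up in memory.
__global__ void anyPositive(const int *values, int n, int *flag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && values[i] > 0) {
        *flag = 1;
    }
}

// Hypothetical helper: returns 1 if any element is positive, 0 if none,
// or -1 if a CUDA call fails.
int hostAnyPositive(const int *hostValues, int n)
{
    if (n <= 0) return 0;

    int *dValues = NULL;
    int *dFlag = NULL;
    if (cudaMalloc((void **)&dValues, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&dFlag, sizeof(int)) != cudaSuccess) {
        cudaFree(dValues);
        return -1;
    }

    int flag = 0;
    bool ok = cudaMemcpy(dValues, hostValues, n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemset(dFlag, 0, sizeof(int)) == cudaSuccess;

    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        anyPositive<<<blocks, threads>>>(dValues, n, dFlag);
        // Blocking copy: waits for the kernel, then fetches the flag.
        ok = cudaMemcpy(&flag, dFlag, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dValues);
    cudaFree(dFlag);
    return ok ? flag : -1;
}
```

Because every writer stores the same value, no atomics are used in this sketch; a count of the reporting threads would need something like atomicAdd instead.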