CUDA swap on device

I have code I want to port to CUDA graphs, but the problem is that I have two worklists which I use interleaved, and I need a way to select either one or the other on the device.

One approach, I think, would be to have a `WorkList**`. Or a worklist index.

Neither approach seems to be working for me…
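For reference, here is a minimal sketch of what I mean by the index idea (all names here are assumptions, not my actual code): kernels read a small device-side selector to pick the active worklist, and a tiny kernel flips it between iterations.

```cpp
// Hypothetical sketch of the worklist-index approach (CUDA).
// WorkList, process(), and the selector layout are assumptions.
struct WorkList { int *items; int count; };

__global__ void process(WorkList *lists, const int *sel) {
    WorkList *in  = &lists[*sel];      // read from the active worklist
    WorkList *out = &lists[1 - *sel];  // write to the other one
    // ... drain `in`, append to `out` ...
}

// Flip the selector entirely on the device, ready for the next iteration.
__global__ void flip(int *sel) { *sel ^= 1; }
```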

As others point out, the implementation seems plausible in the sense that I can have a while conditional node, and within its subgraph a nested conditional node.

And it seems that I will need one conditional handle per while and if.

So the problem remains: an efficient way to indicate which worklist to use in each graph…

Since the two buffers are swapped in each iteration, you know which one is the input buffer based on the loop iteration.
This means you can conditionally launch the kernel with input A or input B.

An alternative could be to manually unroll the loop so there are two passes inside the while loop. This way an iteration of the while loop always starts with the same input buffer.

Mmmmm, I see another disadvantage: I cannot dynamically set the kernel launch parameters, such as the number of blocks/threads and the shared-memory amount…

For example, if you use the explicit API (rather than stream capture) to specify the work, you can update graph nodes to vary these things.
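For instance (a sketch assuming the kernel node was added with the explicit API; `myKernel` is a placeholder and error checking is omitted), `cudaGraphExecKernelNodeSetParams` lets you change the grid/block dimensions and shared memory of an already-instantiated graph:

```cpp
#include <cuda_runtime.h>

__global__ void myKernel(int *data);  // assumed kernel symbol

// Hypothetical: `node` was previously added with cudaGraphAddKernelNode,
// `exec` is the instantiated executable graph.
void updateLaunchConfig(cudaGraphExec_t exec, cudaGraphNode_t node,
                        void **kernelArgs, dim3 grid, dim3 block, size_t shmem) {
    cudaKernelNodeParams p = {};
    p.func = (void *)myKernel;
    p.gridDim = grid;          // new number of blocks
    p.blockDim = block;        // new threads per block
    p.sharedMemBytes = shmem;  // new dynamic shared-memory size
    p.kernelParams = kernelArgs;
    cudaGraphExecKernelNodeSetParams(exec, node, &p);
}
```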


Yes, that is possible, but not if I record the while node in the graph, with the conditional if nested inside it.

So I would be limited to performing the while loop on the host, recording/updating the body, and then launching it.

You can also make the distinction within the kernel, just as another option:
give both buffers as parameters.
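That could look roughly like this (a sketch; the names and the device-side iteration counter are assumptions): the kernel receives both buffers plus a counter and derives the roles itself, so the graph never has to change the kernel's arguments.

```cpp
// Sketch: pass both buffers; the kernel picks input/output from a counter.
__global__ void step(float *a, float *b, const int *iter, int n) {
    const bool odd = (*iter & 1) != 0;
    const float *in = odd ? b : a;  // odd iterations read B ...
    float *out      = odd ? a : b;  // ... and write A; even ones the reverse
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;  // stand-in for the real work
}
```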

You can also construct the nested graphs by hand; you do not need to use stream capture.
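A rough outline of the by-hand construction (based on the CUDA 12.4-era conditional-node API; field names are from the runtime headers, error checking omitted, body population elided):

```cpp
#include <cuda_runtime.h>

// Sketch: build a while-conditional node explicitly instead of via capture.
void buildWhileGraph(cudaGraph_t graph) {
    cudaGraphConditionalHandle handle;
    // default value 1 so the body runs at least once
    cudaGraphConditionalHandleCreate(&handle, graph, 1,
                                     cudaGraphCondAssignDefault);

    cudaGraphNodeParams params = {};
    params.type = cudaGraphNodeTypeConditional;
    params.conditional.handle = handle;
    params.conditional.type = cudaGraphCondTypeWhile;
    params.conditional.size = 1;

    cudaGraphNode_t whileNode;
    cudaGraphAddNode(&whileNode, graph, /*deps*/ nullptr, 0, &params);

    // The body graph is created for you: add kernel nodes (and a nested
    // if-conditional) into it, then have a device kernel call
    // cudaGraphSetConditional(handle, keepLooping) to control the loop.
    cudaGraph_t body = params.conditional.phGraph_out[0];
    (void)body;
}
```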