CUDA Graph Traversal

I am just starting to look into using CUDA graphs.

So far I have not seen text specifically on how a CUDA graph is traversed or on how to build a graph with dynamic traversal.

Looking at this graph:

Kernel A does some work and, based on the results, I want the graph to execute Kernel B or Kernel C. In either case, there is a “finalize” kernel D.

One type of scenario: a packet of data comes in (say, 60 MB every 200 ms). My code does a device-to-device memcpy of the data into the buffer the graph operates on, and I then kick off the graph execution.

How do I build in that decision of A->B or A->C?

Thank you for any help.

One possible approach:

Launch both B and C, and have each kernel check whether it should run; if not, it just exits.

For example, kernel A sets a device boolean variable (call it var) to true or false. The code in B does:

if (var) {
  // body of B kernel code
}

The code in C does:

if (!var) {
  // body of C kernel code
}
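Putting that approach together, here is a minimal sketch. The kernel bodies, buffer size, and the decision made in A are all placeholders; stream capture as written records A, B, C, D in a linear order (a true A->{B,C}->D fork would need events or the explicit graph node API), and the cudaGraphInstantiate call uses the CUDA 12 flags signature:

```cuda
#include <cuda_runtime.h>

// Device-side flag written by A and read by B and C.
__device__ int var;

__global__ void A(const float *in)   // decides which branch should run
{
    var = (in[0] > 0.0f);            // placeholder decision: true -> B, false -> C
}

__global__ void B(float *out)
{
    if (var) {
        // body of B kernel code
    }
}

__global__ void C(float *out)
{
    if (!var) {
        // body of C kernel code
    }
}

__global__ void D(float *out)        // finalize, runs either way
{
    // body of D kernel code
}

int main()
{
    float *buf;
    cudaMalloc(&buf, 1024 * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Capture the fixed topology once...
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    A<<<1, 1, 0, s>>>(buf);
    B<<<256, 256, 0, s>>>(buf);      // both branches are always launched;
    C<<<256, 256, 0, s>>>(buf);      // the non-selected one exits immediately
    D<<<256, 256, 0, s>>>(buf);
    cudaStreamEndCapture(s, &graph);
    cudaGraphInstantiate(&exec, graph, 0);

    // ...then relaunch the instantiated graph for every incoming packet.
    cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);
    return 0;
}
```

The key point is that the graph topology stays static; only the device-side flag changes per launch.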

Hello Robert:

I did think of this, but I was hoping not to put that logic inside the kernels and instead keep it in the encapsulating graph logic, or at least in some code at the end of a kernel that could tell the graph which node to run next.

It seems to me, however, that every node in the graph will still be executed; if there were an additional node between B and D (say, B2), then even after B exits early (because var is false), B2 would be run as well…

  • I would want that entire B branch (sub-tree) not to be traversed at all.
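With the flag scheme, one way to get close to skipping the sub-tree (this is an extension of the idea above, not something the graph runtime does for you) is to have every kernel in the B branch guard on the same flag as its very first statement. The nodes are still launched, but a guarded-out kernel retires in roughly its launch overhead:

```cuda
// Hypothetical B2 node sitting between B and D; 'var' is the same
// __device__ flag that kernel A wrote.
__global__ void B2(float *out)
{
    if (!var) return;   // the whole B branch keys off A's single decision
    // body of B2 kernel code
}
```

So the branch is not structurally pruned from the graph, but its cost when not selected is a handful of near-empty kernel launches.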