Do CUDA graph host nodes execute on more than one thread?

Is it possible for two host nodes to execute simultaneously on different threads? Or do they execute serially on a single thread?

Last time I checked, CUDA uses one thread per GPU for host callbacks. This means multiple independent host nodes that depend on streams of the same device will not run in parallel by default.
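You can observe this yourself with a small experiment. The sketch below (an illustration I put together, not official documentation) builds a graph with two independent host nodes, has each record the ID of the thread it runs on, and compares them after launch. On a single-GPU setup you would expect both callbacks to report the same thread:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

// Each host node writes the ID of the thread it executes on.
static std::thread::id g_ids[2];

void CUDART_CB recordThread0(void*) { g_ids[0] = std::this_thread::get_id(); }
void CUDART_CB recordThread1(void*) { g_ids[1] = std::this_thread::get_id(); }

int main() {
    cudaGraph_t graph;
    cudaGraphCreate(&graph, 0);

    // Two host nodes with no dependencies between them, so the
    // graph itself places no ordering constraint on them.
    cudaGraphNode_t n0, n1;
    cudaHostNodeParams p0 = {};
    p0.fn = recordThread0;
    cudaGraphAddHostNode(&n0, graph, nullptr, 0, &p0);

    cudaHostNodeParams p1 = {};
    p1.fn = recordThread1;
    cudaGraphAddHostNode(&n1, graph, nullptr, 0, &p1);

    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    printf("same callback thread: %s\n",
           g_ids[0] == g_ids[1] ? "yes" : "no");

    cudaStreamDestroy(stream);
    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    return 0;
}
```

Note that this is observed behavior, not a documented guarantee, so it could change between driver versions; a host node that blocks for a long time can therefore stall other callbacks on the same device.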
