A while back I asked something similar but didn’t get any response. I’m wondering if anyone out there knows of, or has created, a CUDA application with a hierarchy-like structure that invokes multiple kernel calls (by hierarchy, I mean that there are data dependencies between sequential kernel calls).
Basically, I’m looking for something that works similarly to an array reduction or a multilayer neural network. I want to look at applications that need multiple kernel calls (using the same, or very similar, kernels) between layers, and that use the kernel call itself as a barrier to ensure data dependencies between layers are fulfilled. If you know the Bulk Synchronous Parallel (BSP) model, that is what I am describing. I want to find any applications that fit this type of design.
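To make the pattern concrete, here is a minimal sketch of the kind of thing I mean, using an array sum as the example. The names `partial_sum` and `reduce_sum` are made up for illustration, the block size of 256 is an arbitrary assumption, and error checking is left out. It relies only on the fact that kernels launched into the same stream run in order, so each launch acts as the barrier between layers.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// One "layer": each block sums up to blockDim.x inputs and writes a single
// partial sum to out[blockIdx.x]. Assumes blockDim.x is a power of two.
__global__ void partial_sum(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Tree reduction inside the block (block-level barrier only).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: launches the same kernel once per layer.
// Launches in the default stream execute in order, so layer k+1 only
// starts after every block of layer k has finished writing its output.
float reduce_sum(const std::vector<float>& host) {
    const int threads = 256;  // assumed block size
    int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;

    int max_blocks = (n + threads - 1) / threads;
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, max_blocks * sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    while (n > 1) {
        int blocks = (n + threads - 1) / threads;
        partial_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);
        std::swap(d_in, d_out);  // this layer's output is the next layer's input
        n = blocks;
    }

    float result = 0.0f;
    // Blocking copy: waits for the last kernel before reading the result.
    cudaMemcpy(&result, d_in, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `reduce_sum(std::vector<float>(1000, 1.0f))` runs two layers (4 blocks, then 1 block). I’m interested in real applications that have this same launch-per-layer shape.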
Thanks for the help!