Does anyone have/know of hierarchy-like CUDA applications?

Hey,

A while back I asked something similar but didn’t get any response. I’m wondering if anyone out there knows of, or has created, a CUDA application with a hierarchy-like structure that requires multiple kernel calls (by hierarchy, I mean that there are data dependencies between sequential kernel calls).

Basically, I’m looking for something that works similarly to an array reduction or a multilayer neural network. I want to look at applications that need multiple kernel calls (using the same, or very similar, kernels) between layers, and that use the kernel call itself as a barrier to ensure the data dependencies between layers are fulfilled. If you know the Bulk Synchronous Parallel (BSP) model, that is what I am describing. I want to find any applications that fit this type of design.
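To make the pattern described above concrete, here is a minimal sketch of a layered array reduction in which each kernel launch acts as the barrier between layers. The names `reduce_layer` and `reduce_sum_layers` are hypothetical, chosen only for illustration. The sketch assumes `n >= 1`, a power-of-two block size, and omits error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <utility>

// One "layer": each block sums up to blockDim.x inputs into a single output.
// Assumes blockDim.x is a power of two.
__global__ void reduce_layer(const float* in, float* out, int n) {
    extern __shared__ float tile[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();                  // barrier within a block only
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host driver: launches the same kernel once per layer.
// Launches on the same stream run in order, so layer k+1 only starts
// after every block of layer k has written its partial sum.
float reduce_sum_layers(const float* h_in, int n) {
    const int threads = 256;
    int max_blocks = (n + threads - 1) / threads;
    float *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, max_blocks * sizeof(float));
    cudaMemcpy(d_a, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    float* src = d_a;
    float* dst = d_b;
    int remaining = n;
    while (remaining > 1) {
        int blocks = (remaining + threads - 1) / threads;
        reduce_layer<<<blocks, threads, threads * sizeof(float)>>>(src, dst, remaining);
        std::swap(src, dst);              // this layer's output feeds the next layer
        remaining = blocks;
    }

    float result = 0.0f;
    cudaMemcpy(&result, src, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return result;
}
```

Calling `reduce_sum_layers` on a host array launches one kernel per layer until a single value remains. `__syncthreads()` only synchronizes threads within a block, which is why the cross-block dependency between layers is handled by ending one kernel and launching the next.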

Thanks for the help!

Check out the work of Kun Zhou, especially his paper on bulk-synchronous programming on NVIDIA GPUs. He’s incredibly talented (and works with other very talented people!) and has implemented many key applications (REYES, photon mapping, spatial sorting) in a bulk-synchronous GPU framework. His work arguably includes the most advanced GPU applications to date. Other such apps likely exist, but not as published, public, documented projects.