Has anyone run into a programming application or algorithm where they really wanted or needed synchronization between blocks/multiprocessors? In my current application, I run a single block per multiprocessor on my device. However, sometimes the outputs of one block provide the inputs to another block. I run until all the blocks converge to a stable state before exiting the kernel, which means the number of times each block evaluates its inputs is not known in advance.
Because of the dependency between blocks, I initially just used multiple kernel calls: first the lowest tier of inputs executes on the device and the kernel exits, then the next tier up uses those outputs as its inputs, and so on. However, since I am running to convergence, this can mean a lot of kernel-call overhead. So instead, I have created a work-queue-like structure. When block A finishes its output, it schedules block B, which uses block A's outputs as its inputs. This works using atomic primitives, as long as you don't have more blocks than multiprocessors.
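For illustration only, here is a minimal sketch of this kind of atomics-based work queue. It is not the actual code from my application. The task graph (a binary tree stored heap-style, where each leaf outputs 1 and each inner node sums its two children) and the names `Scheduler`, `push`, `persistentKernel`, and `runTree` are all made up for the example. It also makes a single pass over the dependencies rather than iterating to convergence, and it assumes one block per multiprocessor so that every block is resident at the same time.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical scheduler state, all in global memory.
struct Scheduler {
    int *queue;    // queue[k] = k-th task to become ready, -1 while still empty
    int *head;     // next slot to pop
    int *tail;     // next slot to push
    int *pending;  // pending[i] = number of unfinished inputs of task i
    int *out;      // out[i] = output of task i
    int  numTasks;
};

// Reserve a slot with an atomic, then publish the task id into it.
__device__ void push(const Scheduler &s, int task) {
    int slot = atomicAdd(s.tail, 1);
    atomicExch(&s.queue[slot], task);
}

// Each block stays alive and keeps pulling tasks until every task is claimed.
__global__ void persistentKernel(Scheduler s) {
    __shared__ int sharedTask;
    const int n = s.numTasks;
    while (true) {
        if (threadIdx.x == 0) {
            int ticket = atomicAdd(s.head, 1);      // claim the next queue slot
            if (ticket >= n) {
                sharedTask = -1;                    // nothing left to claim
            } else {
                volatile int *q = s.queue;
                int t;
                while ((t = q[ticket]) < 0) { }     // spin until a producer fills the slot
                __threadfence();                    // order the input reads after the pop
                sharedTask = t;
            }
        }
        __syncthreads();
        int task = sharedTask;
        __syncthreads();                            // everyone has read sharedTask
        if (task < 0) return;

        // The "work". A real block would use all of its threads here.
        if (threadIdx.x == 0) {
            volatile int *out = s.out;
            int left = 2 * task + 1, right = 2 * task + 2;
            out[task] = (left < n) ? out[left] + out[right] : 1;
            __threadfence();                        // make the output visible first
            if (task > 0) {
                int parent = (task - 1) / 2;
                // The block that finishes the last input schedules the parent.
                if (atomicSub(&s.pending[parent], 1) == 1) {
                    __threadfence();
                    push(s, parent);
                }
            }
        }
    }
}

// Host helper: builds a tree with numLeaves leaves and returns the root output.
// Error checking is omitted to keep the sketch short.
int runTree(int numLeaves) {
    int n = 2 * numLeaves - 1;
    std::vector<int> queue(n, -1), pending(n, 0), out(n, 0);
    for (int i = 0; i < numLeaves; ++i) queue[i] = numLeaves - 1 + i;  // leaves start ready
    for (int i = 0; i < numLeaves - 1; ++i) pending[i] = 2;            // inner nodes wait on 2 inputs
    int head = 0, tail = numLeaves;

    Scheduler s;
    s.numTasks = n;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&s.queue, bytes);
    cudaMalloc(&s.pending, bytes);
    cudaMalloc(&s.out, bytes);
    cudaMalloc(&s.head, sizeof(int));
    cudaMalloc(&s.tail, sizeof(int));
    cudaMemcpy(s.queue, queue.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(s.pending, pending.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(s.out, out.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(s.head, &head, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(s.tail, &tail, sizeof(int), cudaMemcpyHostToDevice);

    int numSMs = 1;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);
    persistentKernel<<<numSMs, 32>>>(s);            // one block per multiprocessor
    cudaDeviceSynchronize();

    int root = 0;
    cudaMemcpy(&root, s.out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(s.queue); cudaFree(s.pending); cudaFree(s.out);
    cudaFree(s.head); cudaFree(s.tail);
    return root;
}
```

Calling `runTree(8)` from `main` should return the number of leaves, since the root ends up summing every leaf's output. The whole tree is processed in a single kernel launch, with the ordering between tiers enforced only by the `pending` counters and the queue.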
I am interested in whether anyone else can think of some simple applications/algorithms that need to run an undetermined number of times until convergence. Basically, I would like to see whether this kind of scheduling queue built on atomic primitives could be equally beneficial to some application other than my own. Any ideas would be greatly appreciated! Thanks!