I'm writing a simulation that simulates a system in parts, but I'm afraid I've run into a deadlock…
Imagine this situation:
- We have 3 system blocks (S1, S2 and S3). (A block is a black-box simulation that can run without exchanging any information with the other systems.)
They are connected in this fashion:
S1 ---+
      +--- S3
S2 ---+
(i.e., S3 depends on the outputs of S1 and S2)
So S1 and S2 can run in parallel, but S3 can only be simulated after S1 and S2 finish their work, because it uses their outputs.
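One common way to express exactly this dependency graph on the host is with CUDA streams and events: S1 and S2 go into separate streams so they may overlap, and S3's stream waits on an event recorded after each of them. This is a minimal sketch, assuming each subsystem can be written as its own kernel; `simulateS1`, `simulateS2`, `simulateS3` and `runPipelineWithEvents` are hypothetical names, and the arithmetic inside the kernels is only a placeholder for the real simulations.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernels: each thread computes one element.
__global__ void simulateS1(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f * i;            // stand-in for the real S1 output
}

__global__ void simulateS2(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;            // stand-in for the real S2 output
}

// S3 consumes the outputs of S1 and S2 from global memory.
__global__ void simulateS3(const float* s1, const float* s2, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = s1[i] + s2[i];       // stand-in for the real S3 model
}

// Runs (S1 || S2) -> S3 and copies S3's result to hostOut. Returns 0 on success.
int runPipelineWithEvents(float* hostOut, int n) {
    size_t bytes = n * sizeof(float);
    float *dS1 = nullptr, *dS2 = nullptr, *dS3 = nullptr;
    cudaMalloc(&dS1, bytes);
    cudaMalloc(&dS2, bytes);
    cudaMalloc(&dS3, bytes);

    cudaStream_t st1, st2, st3;
    cudaStreamCreate(&st1);
    cudaStreamCreate(&st2);
    cudaStreamCreate(&st3);
    cudaEvent_t s1Done, s2Done;
    cudaEventCreateWithFlags(&s1Done, cudaEventDisableTiming);
    cudaEventCreateWithFlags(&s2Done, cudaEventDisableTiming);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // S1 and S2 do not depend on each other, so they go into separate streams;
    // the GPU is allowed (but not required) to run them concurrently.
    simulateS1<<<blocks, threads, 0, st1>>>(dS1, n);
    cudaEventRecord(s1Done, st1);
    simulateS2<<<blocks, threads, 0, st2>>>(dS2, n);
    cudaEventRecord(s2Done, st2);

    // Work queued in st3 after these calls does not start until both events complete.
    cudaStreamWaitEvent(st3, s1Done, 0);
    cudaStreamWaitEvent(st3, s2Done, 0);
    simulateS3<<<blocks, threads, 0, st3>>>(dS1, dS2, dS3, n);
    cudaMemcpyAsync(hostOut, dS3, bytes, cudaMemcpyDeviceToHost, st3);

    cudaError_t err = cudaStreamSynchronize(st3);
    if (err == cudaSuccess) err = cudaGetLastError();

    cudaEventDestroy(s1Done);
    cudaEventDestroy(s2Done);
    cudaStreamDestroy(st1);
    cudaStreamDestroy(st2);
    cudaStreamDestroy(st3);
    cudaFree(dS1);
    cudaFree(dS2);
    cudaFree(dS3);
    return err == cudaSuccess ? 0 : 1;
}
```

The design choice here is that the host, not the blocks themselves, encodes "S3 after S1 and S2", so no block ever has to wait on another block.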
What I'm doing now is simulating S1 and S2 (in different thread blocks) and then copying the results to global memory.
My problem now is signaling S3 to start when S1 and S2 finish their work, but as far as I know this is not an easy task on the CUDA architecture, because thread blocks are supposed to be independent. In this case they are independent while simulating, but S3 needs to be fed by the earlier simulation blocks, which run in parallel.
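A minimal sketch of the usual workaround, assuming S1 and S2 stay in one kernel as different thread blocks: end that kernel once S1 and S2 have written their results to global memory, and launch S3 as a second kernel. Kernels launched into the same stream run in order, so the kernel boundary acts as a barrier across all blocks. Having S3's blocks spin-wait on a flag in global memory is risky, because CUDA does not guarantee that the waiting blocks and the producing blocks are resident on the GPU at the same time, which is one way to end up in a real deadlock. The kernel and function names (`simulateS1andS2`, `simulateS3`, `runTwoLaunches`) and their arithmetic are hypothetical placeholders.

```cuda
#include <cuda_runtime.h>

// One launch simulates both S1 and S2: blockIdx.y selects the subsystem,
// so S1 and S2 run in different thread blocks, as in the question.
__global__ void simulateS1andS2(float* outS1, float* outS2, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (blockIdx.y == 0)
        outS1[i] = 1.0f * i;                  // placeholder S1 result
    else
        outS2[i] = 2.0f * i;                  // placeholder S2 result
}

// Second launch: reads the S1/S2 results the previous kernel left in global memory.
__global__ void simulateS3(const float* inS1, const float* inS2, float* outS3, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) outS3[i] = inS1[i] + inS2[i];  // placeholder S3 model
}

// Returns 0 on success; S3's result is copied into hostOut.
int runTwoLaunches(float* hostOut, int n) {
    size_t bytes = n * sizeof(float);
    float *dS1 = nullptr, *dS2 = nullptr, *dS3 = nullptr;
    cudaMalloc(&dS1, bytes);
    cudaMalloc(&dS2, bytes);
    cudaMalloc(&dS3, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    simulateS1andS2<<<dim3(blocks, 2), threads>>>(dS1, dS2, n);
    // Both launches go to the same (default) stream, so simulateS3 does not begin
    // until every block of simulateS1andS2 has finished; the kernel boundary is
    // the grid-wide "S1 and S2 are done" signal.
    simulateS3<<<blocks, threads>>>(dS1, dS2, dS3, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dS3, bytes, cudaMemcpyDeviceToHost);  // waits for S3

    cudaFree(dS1);
    cudaFree(dS2);
    cudaFree(dS3);
    return err == cudaSuccess ? 0 : 1;
}
```

If S3 really has to run inside the same kernel, cooperative groups offer a grid-wide `this_grid().sync()`, but it requires launching with `cudaLaunchCooperativeKernel` and a grid small enough for all blocks to be co-resident, so the two-launch version is usually the simpler starting point.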
Could someone give me some advice on how to solve this problem?
Please let me know if my explanation isn't clear enough; sometimes I have trouble explaining myself in English :)