Global thread barrier

Many in the forums have tried and posted their results.

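No, it works horribly. As an example of the simplest problem encountered: not all blocks are guaranteed to run concurrently. If the grid has more blocks than the GPU can keep resident at once, the resident blocks spin at the barrier waiting for blocks that cannot start until a resident block finishes. Therefore, your global barrier will deadlock.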
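To see how easily a grid outgrows what the hardware can run at once, here is a minimal sketch that asks the runtime how many blocks of a kernel can be resident simultaneously. The kernel `dummy_kernel`, the helper `max_resident_blocks`, and the grid size are made up for illustration; the actual numbers depend on your GPU and on the kernel's register and shared memory usage.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the occupancy query depends on its resource usage.
__global__ void dummy_kernel(int *data) {
    data[blockIdx.x * blockDim.x + threadIdx.x] = 0;
}

// Hypothetical helper: upper bound on blocks of dummy_kernel that can be
// resident on the current device at the same time.
int max_resident_blocks(int threads_per_block) {
    int device = 0;
    cudaGetDevice(&device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);

    int blocks_per_sm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, dummy_kernel, threads_per_block, 0 /* dynamic smem */);

    return blocks_per_sm * prop.multiProcessorCount;
}

int main() {
    const int threads_per_block = 256;
    const int requested_blocks = 1 << 16;  // assumed grid size, for illustration
    const int resident = max_resident_blocks(threads_per_block);

    printf("requested blocks: %d, max resident blocks: %d\n",
           requested_blocks, resident);
    if (requested_blocks > resident) {
        // Blocks beyond the resident limit only start after earlier ones exit,
        // so a spin-wait barrier across the whole grid could never complete.
        printf("a hand-rolled global barrier on this grid would deadlock\n");
    }
    return 0;
}
```

Nothing is launched here; the program only compares the grid size you might want against the number of blocks the device can hold at once.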

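Yes. Just use multiple kernel invocations: kernels launched into the same stream run in order, so the boundary between two launches acts as a global barrier. Is roughly 10 microseconds of launch overhead that bad a price to pay?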
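A minimal sketch of the multiple-kernel approach, assuming the work splits cleanly into a "before the barrier" phase and an "after the barrier" phase. The kernels `phase1` and `phase2` and the helper `reverse_via_two_kernels` are hypothetical names; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Phase 1: every thread writes its own element.
__global__ void phase1(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Phase 2: every thread reads an element that was (in general) written by a
// different block in phase 1, so all of phase 1 must be finished first.
__global__ void phase2(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[n - 1 - i];
}

// Hypothetical helper: runs both phases and returns the result on the host.
std::vector<int> reverse_via_two_kernels(int n) {
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    phase1<<<blocks, threads>>>(d_in, n);
    // No explicit barrier needed: both launches go to the default stream,
    // so phase2 does not start until every block of phase1 has completed.
    phase2<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<int> host(n);
    // cudaMemcpy on the default stream waits for phase2 before copying.
    cudaMemcpy(host.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}

int main() {
    const int n = 1 << 20;
    std::vector<int> result = reverse_via_two_kernels(n);
    printf("first: %d, last: %d\n", result[0], result[n - 1]);
    return 0;
}
```

This works for any grid size, because no block ever waits on another block inside a kernel. The cost is one extra launch per barrier.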
