Scheduling: The pain of global synchronization...

Hi!

Will a work group run to completion or can it be interrupted to allow another work group to execute?

I’m trying to synchronize globally because the overhead of a kernel launch is large compared to my kernel's execution time. My method is to atomically increment a global counter in each thread and then wait for a flag. When the last thread increments the counter, it sets the flag so that the rest can continue execution. This seems to work as long as I launch at most 4 work groups on my 8600GT. Since the 8600GT has 4 multiprocessors, does that mean a work group runs until it is completely finished before another group is scheduled?
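To make the scheme concrete, here is a minimal host-side sketch in C++ rather than actual kernel code. Each std::thread stands in for one work group, and the names GlobalBarrier, arrive_and_wait and run_groups are made up for illustration. It only shows the counter-and-flag logic; a CPU scheduler preempts waiting threads, so this sketch cannot reproduce a hang that might occur on a GPU if spinning work groups are never swapped out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter-and-flag barrier; one participant per "work group".
struct GlobalBarrier {
    std::atomic<int> counter{0};
    std::atomic<bool> flag{false};
    int total;

    explicit GlobalBarrier(int n) : total(n) { assert(n > 0); }

    void arrive_and_wait() {
        // fetch_add returns the previous value, so the last arrival sees total - 1.
        if (counter.fetch_add(1, std::memory_order_acq_rel) == total - 1) {
            // Last participant releases everyone else.
            flag.store(true, std::memory_order_release);
        } else {
            // Everyone else spins until the flag is set.
            while (!flag.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
        }
    }
};

// Each worker writes a phase-1 value, synchronizes, then reads its neighbour's value.
// Returns how many workers saw the neighbour's phase-1 write after the barrier.
int run_groups(int groups) {
    GlobalBarrier barrier(groups);
    std::vector<int> phase1(groups, 0);
    std::atomic<int> ok{0};
    std::vector<std::thread> workers;
    for (int g = 0; g < groups; ++g) {
        workers.emplace_back([&, g] {
            phase1[g] = g + 1;                 // work before the barrier
            barrier.arrive_and_wait();         // global synchronization point
            int n = (g + 1) % groups;          // neighbour index
            if (phase1[n] == n + 1) ok.fetch_add(1);
        });
    }
    for (auto& w : workers) w.join();
    return ok.load();
}
```

Calling run_groups(4) mirrors the 4-work-group case above. Note that this barrier is single-use: the counter and flag would have to be reset before it could be used a second time.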

Best Regards,
Madsen