Correctly exiting persistent threads with global/local work queues and self-generating work

I’m unsure how to shut down persistent threads safely in the following setup: each thread has its own local work queue, overflows are spilled to a global work queue guarded by a lock, and a thread whose local queue is exhausted takes work from the global queue. While processing a work item, each thread generates anywhere from zero to some bounded number of new work items (tree traversal, for example).

The papers I have come across that appear to touch on this don’t give any specific details on how to do it. I have only seen cases where an upper bound on the total work is decided statically, which of course may cause waste and is not easy to determine in most cases.

The only other guaranteed way I can think of is a clock-based bail-out after an idle period, which isn’t ideal either and will also cause waste.

Every other scheme I have tried so far, using atomics to build some kind of protocol or messaging between threads, has had major flaws.

How about the following?

  1. Keep the total number of currently active threads (i.e. those processing some data) in a global atomic variable, ActiveThreads.
  2. While no work is available, check ActiveThreads every 1 ms or so, and exit if it has reached 0.
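To make the two steps concrete, here is a minimal host-side C++ model of the idea. It is only a sketch using std::thread, not CUDA or GLSL. Every name in it (Scheduler, worker, run_traversal, kLocalCapacity, kMaxDepth) is made up for illustration, and the binary-tree workload is an assumption standing in for "self-generating work". The one detail it adds to the list above is ordering: a thread counts itself as active before it tries to take work and only decrements after a failed pop, so a reading of 0 should mean the queues are empty and nothing is in flight.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical work item: a node of an implicit binary tree.
struct Work { int depth; };

struct Scheduler {
    std::mutex lock;                 // guards the global queue
    std::deque<Work> global;         // shared spill / steal queue
    std::atomic<int> active{0};      // the "ActiveThreads" counter
    std::atomic<long> processed{0};  // only used to check the result
};

constexpr std::size_t kLocalCapacity = 8;  // assumed local queue size
constexpr int kMaxDepth = 10;              // assumed depth of the tree

bool pop_global(Scheduler& s, Work& out) {
    std::lock_guard<std::mutex> g(s.lock);
    if (s.global.empty()) return false;
    out = s.global.front();
    s.global.pop_front();
    return true;
}

bool global_has_work(Scheduler& s) {
    std::lock_guard<std::mutex> g(s.lock);
    return !s.global.empty();
}

// Push to the local queue, spilling to the global queue when it is full.
void push(Scheduler& s, std::vector<Work>& local, Work w) {
    if (local.size() < kLocalCapacity) { local.push_back(w); return; }
    std::lock_guard<std::mutex> g(s.lock);
    s.global.push_back(w);
}

void worker(Scheduler& s) {
    std::vector<Work> local;
    s.active.fetch_add(1);  // count as active BEFORE touching any queue
    for (;;) {
        Work w{0};
        bool have = false;
        if (!local.empty()) { w = local.back(); local.pop_back(); have = true; }
        else have = pop_global(s, w);

        if (have) {
            s.processed.fetch_add(1);
            if (w.depth < kMaxDepth) {  // each node generates two children
                push(s, local, Work{w.depth + 1});
                push(s, local, Work{w.depth + 1});
            }
            continue;
        }

        // Both queues looked empty: go idle and poll.
        s.active.fetch_sub(1);
        for (;;) {
            if (s.active.load() == 0) return;  // nobody can create more work
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            if (global_has_work(s)) { s.active.fetch_add(1); break; }
        }
    }
}

// Seeds one root item, runs the workers, returns the number of items processed.
long run_traversal(int num_threads) {
    Scheduler s;
    s.global.push_back(Work{0});
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(worker, std::ref(s));
    for (auto& t : threads) t.join();
    return s.processed.load();
}
```

With these assumed constants a full binary tree of depth 10 has 2047 nodes, so run_traversal(4) should return 2047 if no work is lost at shutdown. On a GPU the 1 ms sleep would have to be replaced by whatever polling or backoff the device allows, and the lock-guarded queue by the equivalent built on device atomics.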

I suspect the timing kludge would be pretty nasty for the efficiency of other kernels that run afterwards. I’m targeting integration with OpenGL calls (and ideally, later on, something implementable with OpenGL compute shaders as well as CUDA), so I don’t think that would be ideal either :-(

It’s probably better than tracking a timer per worker though, so there is that!