Can one query the block thread count without synchronization (__syncthreads_count)?


Background: I’m trying to use the GPU to scan CRC polynomials for certain properties, and I’m trying to make the code faster. I don’t need any strict synchronization between threads/warps until the end of the kernel. That said, blocks can execute for a long time, and the odds of any given block making it to the end are very low, so the expectation is that one thread will eventually find a reason to exit the whole block. Given this scenario:

  • Is it possible to query whether any threads have exited without forcing a synchronization stall? Basically: “if any thread has exited, then exit this thread”
  • Alternatively, is there any way for one thread to force all of the other threads to exit?

I’ve tried simply putting a volatile boolean in shared memory and having each thread poll it inside its loops. That seems to work fine, but I’m wondering if there is something faster.


It’s not possible to asynchronously force threads to exit without corrupting the CUDA context; most folks are not interested in that sort of behavior. Your volatile shared boolean is the approach that occurred to me as I read your description.
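
For anyone landing here later, a minimal sketch of the volatile shared flag pattern might look like the following. It is not the original poster's code: scan_kernel, should_abort, run_scan, and the abort_value parameter are hypothetical names made up for illustration, and the "reason to exit" is reduced to matching a single candidate value. The only barriers are one after the flag is initialized and one at the end of the kernel; the loop itself only polls the flag.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-candidate check: returns true when
// this candidate gives the whole block a reason to stop.
__device__ bool should_abort(unsigned candidate, unsigned abort_value)
{
    return candidate == abort_value;
}

__global__ void scan_kernel(unsigned iterations, unsigned abort_value,
                            int *block_aborted)
{
    // Block-wide exit flag. volatile forces a fresh shared-memory read on
    // every poll instead of a value cached in a register.
    __shared__ volatile int stop;

    if (threadIdx.x == 0) stop = 0;
    __syncthreads();  // make sure the flag is initialized before anyone polls it

    // No barrier inside the loop: each thread just checks the flag.
    for (unsigned i = 0; i < iterations && !stop; ++i) {
        unsigned candidate =
            (blockIdx.x * iterations + i) * blockDim.x + threadIdx.x;
        if (should_abort(candidate, abort_value)) {
            stop = 1;  // several threads may write 1 at once; that is harmless
        }
    }

    // Every thread falls out of the loop and reaches this barrier
    // (there is no early return), so it is safe to synchronize here.
    __syncthreads();
    if (threadIdx.x == 0) block_aborted[blockIdx.x] = stop;
}

// Hypothetical host helper: launches the kernel and returns the abort flag
// of one block (1 = aborted early, 0 = ran to completion, -1 = CUDA error).
int run_scan(unsigned blocks, unsigned threads, unsigned iterations,
             unsigned abort_value, unsigned block_to_check)
{
    if (block_to_check >= blocks) return -1;

    int *d_flags = nullptr;
    if (cudaMalloc(&d_flags, blocks * sizeof(int)) != cudaSuccess) return -1;

    scan_kernel<<<blocks, threads>>>(iterations, abort_value, d_flags);

    std::vector<int> h_flags(blocks, -1);
    // cudaMemcpy waits for the kernel and also reports any launch error.
    cudaError_t err = cudaMemcpy(h_flags.data(), d_flags,
                                 blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);
    return (err == cudaSuccess) ? h_flags[block_to_check] : -1;
}

int main()
{
    // Candidate 5 belongs to block 0, so block 0 should stop early while
    // block 1 runs all of its iterations.
    printf("block 0 aborted: %d\n", run_scan(2, 64, 1000, 5, 0));
    printf("block 1 aborted: %d\n", run_scan(2, 64, 1000, 5, 1));
    return 0;
}
```

Threads in other warps notice the flag on their next poll rather than immediately, so some extra iterations after the first abort are expected. Whether the per-iteration shared-memory read matters probably depends on how heavy the loop body is; polling only every few iterations is one thing to try if it shows up in profiling.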