Critical section and ballot

This SO answer implements a critical section using the ballot function. However, the code relies on __ballot(), which is deprecated in current CUDA. How would this code look nowadays?

__device__ bool warp_lock(int req){
  return ((__ffs(__ballot(req))) == ((threadIdx.x & 31)+1));
}

__ballot_sync() is the replacement, I believe:

Thanks for the quick link. The new version takes a different set of parameters. What should those parameters be? This link seems more informative.

I’m not sure whether I’ve understood this correctly: can __ballot(req) simply be changed to __ballot_sync(FULL_MASK, result);?

If you meant “__ballot_sync(FULL_MASK, req);” above, then I believe this will do the job, taking into consideration both the SO code and the “Warp-Level Primitives” post.

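The difference between the two is that __ballot(req) gives no synchronisation guarantee among the active threads at this point, whereas __ballot_sync() synchronises the threads named in its mask before collecting the votes. FULL_MASK, which names every thread in the warp, is a valid choice as long as all 32 threads of the warp reach the call.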

I’m assuming “FULL_MASK” == 0xFFFFFFFF
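
Putting the thread together, here is a minimal sketch of how the updated function might look. It assumes FULL_MASK is 0xffffffff, that the block is one-dimensional (so threadIdx.x & 31 is the lane index), and that all 32 threads of the warp call warp_lock() together. The count_winners kernel and the main() harness are hypothetical additions for checking the behaviour and are not part of the SO answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption from the thread: FULL_MASK names all 32 lanes of a warp.
#define FULL_MASK 0xffffffffu

// Returns true for at most one lane per warp: the lowest-numbered lane
// whose req is non-zero. Every lane of the warp must execute this call,
// because the mask passed to __ballot_sync() names all 32 lanes.
__device__ bool warp_lock(int req) {
    unsigned int votes = __ballot_sync(FULL_MASK, req);
    // __ffs() returns the 1-based position of the lowest set bit, or 0 if none.
    return __ffs(votes) == (int)((threadIdx.x & 31) + 1);
}

// Hypothetical test kernel: every thread requests the lock, and each
// winning lane increments a global counter.
__global__ void count_winners(int *winners) {
    if (warp_lock(1)) {
        atomicAdd(winners, 1);
    }
}

int main() {
    int *d_winners = nullptr;
    int h_winners = 0;

    cudaMalloc(&d_winners, sizeof(int));
    cudaMemcpy(d_winners, &h_winners, sizeof(int), cudaMemcpyHostToDevice);

    // One block of 64 threads, i.e. two full warps.
    count_winners<<<1, 64>>>(d_winners);

    // This copy waits for the kernel to finish before reading the counter.
    cudaMemcpy(&h_winners, d_winners, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_winners);

    printf("winners: %d\n", h_winners);
    return 0;
}
```

With two full warps launched, one lane per warp should win, so the counter should come back as 2. If only some lanes of a warp can reach warp_lock(), FULL_MASK is no longer appropriate and the mask has to name exactly the participating lanes.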