Force divergence

I have some nice code, namely an implementation of Euclid’s algorithm that runs lots of operations at once and fakes recursion, so naturally it needs to be careful with its memory. I’m using an array of dynamically allocated shared memory, which I hand out to threads as necessary. The only problem is that I need to allocate memory to each iteration manually, which requires an int next_element_to_allocate to keep track of which “stack” element will be allocated next.

This brings me to my question. I need the code that does the allocation to be executed serially, to avoid the case where all the threads use the same location in memory as their next stack element. At the moment, I have an if statement like so:


if (threadIdx.x == 0) {
    // Allocate
} else {
    // The same thing as above
}

This, I believe, means that the threads cannot all evaluate the condition the same way, which forces them to diverge.

Is there a more elegant way of doing this?
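
For comparison, here is a minimal sketch of one possible alternative: instead of relying on divergence, each thread claims its element with atomicAdd on the shared counter. As far as I understand, threads of a warp that take the same branch still run together, so the else branch alone would not give each of them a different index. The kernel name claim_slots, the host helper all_slots_distinct, the claimed output array and the single-block launch are assumptions made for illustration, not part of my real code.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread claims one distinct element of a shared pool.
__global__ void claim_slots(int *claimed, int pool_size)
{
    extern __shared__ int stack_pool[];       // dynamically sized, set at launch
    __shared__ int next_element_to_allocate;  // index of the next free element

    if (threadIdx.x == 0) {
        next_element_to_allocate = 0;
    }
    __syncthreads();  // the counter must be initialised before anyone allocates

    // atomicAdd returns the value held before the increment,
    // so no two threads can receive the same index.
    int my_slot = atomicAdd(&next_element_to_allocate, 1);
    if (my_slot < pool_size) {
        stack_pool[my_slot] = threadIdx.x;    // stand-in for real use of the element
    }
    claimed[threadIdx.x] = my_slot;           // report the index for checking on the host
}

// Hypothetical host helper: launches one block and checks that every thread
// received a different slot in the range [0, threads).
bool all_slots_distinct(int threads)
{
    int *d_claimed = nullptr;
    if (cudaMalloc(&d_claimed, threads * sizeof(int)) != cudaSuccess) {
        return false;
    }

    // The third launch parameter is the dynamic shared memory size in bytes.
    claim_slots<<<1, threads, threads * sizeof(int)>>>(d_claimed, threads);

    std::vector<int> h_claimed(threads);
    cudaError_t err = cudaMemcpy(h_claimed.data(), d_claimed,
                                 threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_claimed);
    if (err != cudaSuccess) {
        return false;
    }

    std::vector<bool> seen(threads, false);
    for (int slot : h_claimed) {
        if (slot < 0 || slot >= threads || seen[slot]) {
            return false;
        }
        seen[slot] = true;
    }
    return true;
}
```

Calling all_slots_distinct(64) from host code launches a single block of 64 threads. The order in which threads receive slots is not defined, only that the slots differ, which is all the allocation needs here.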