Newbie: Eliminating Race Condition

Hi there,

I’m doing something like this in a kernel and I think it creates a race condition.

__device__ void kernel(int *best)
{
  int tid = threadIdx.x;

  int value = someFunction(tid);

  // Unsynchronized read-then-write: another thread can update *best in between.
  if(value > *best) *best = value;
}

What is the best way to get around this, i.e. to compute the maximum of a number of concurrently computed values?



If all or many threads write to *best at the same time, then yes, you will have a problem.
In short, your code doesn’t appear very suitable for parallelism. (But someone more skilled than me might have a better answer for you.)
You could use atomic operations (for example, atomicMax).
Or maybe you could build a binary heap structure that partially resolves the “largest value” in a way that allows for parallelism.
That doesn’t sound very efficient, though.
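
For what it's worth, here is a minimal sketch of the atomic-operation suggestion, assuming the values are plain ints and a single block of threads. The body of someFunction is a made-up stand-in for the original poster's function, and maxKernel and maxViaAtomic are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>
#include <climits>

// Hypothetical stand-in for the poster's someFunction(tid).
__device__ int someFunction(int tid) { return (tid * 37) % 101; }

__global__ void maxKernel(int *best)
{
    int tid = threadIdx.x;
    int value = someFunction(tid);

    // atomicMax does the compare and the store as one indivisible step,
    // so no other thread can slip in between the read and the write.
    atomicMax(best, value);
}

// Host helper (assumed name): launches one block of `threads` threads
// and returns the maximum they computed.
int maxViaAtomic(int threads)
{
    int *d_best = nullptr;
    int h_best = INT_MIN;  // start below any value a thread can produce

    cudaMalloc(&d_best, sizeof(int));
    cudaMemcpy(d_best, &h_best, sizeof(int), cudaMemcpyHostToDevice);

    maxKernel<<<1, threads>>>(d_best);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&h_best, d_best, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_best);
    return h_best;
}
```

This is correct, but every thread hits the same address, so the updates are serialized. That is fine for a few hundred threads and gets slower as the thread count grows.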

In conclusion, I don’t think that simple program should be done in CUDA, but you’re probably doing something more complex, so you might want to elaborate.

What you are trying to do can be done efficiently with a parallel reduction algorithm. There is a well-optimized example in the SDK, and a whitepaper which discusses how it works.
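
To give a rough idea of what a reduction looks like, here is a simplified sketch. It is not the SDK example, just a basic shared-memory tree reduction, and it assumes a single block whose size is a power of two. The body of someFunction is again a made-up stand-in, and maxReduceKernel and maxViaReduction are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <climits>

// Hypothetical stand-in for the poster's someFunction(tid).
__device__ int someFunction(int tid) { return (tid * 37) % 101; }

// Assumes one block and blockDim.x being a power of two.
__global__ void maxReduceKernel(int *best)
{
    extern __shared__ int scratch[];  // one int per thread, sized at launch
    int tid = threadIdx.x;

    // Each thread writes only its own slot, so there is no race here.
    scratch[tid] = someFunction(tid);
    __syncthreads();

    // Halve the number of active threads each step; each active thread
    // keeps the larger of its own slot and its partner's slot.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            scratch[tid] = max(scratch[tid], scratch[tid + stride]);
        __syncthreads();  // finish this step before starting the next
    }

    // After the last step, slot 0 holds the block-wide maximum.
    if (tid == 0) *best = scratch[0];
}

// Host helper (assumed name): `threads` must be a power of two that
// fits in one block.
int maxViaReduction(int threads)
{
    int *d_best = nullptr;
    int h_best = INT_MIN;

    cudaMalloc(&d_best, sizeof(int));
    maxReduceKernel<<<1, threads, threads * sizeof(int)>>>(d_best);
    cudaMemcpy(&h_best, d_best, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_best);
    return h_best;
}
```

With N threads this takes about log2(N) steps instead of N serialized updates. For more than one block, each block would write its own partial maximum, and those would be combined in a second pass or with a single atomicMax per block.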