Do I need to consider synchronization problems?

In a function like this:

__global__ void func(int *count)

Do I need to worry about synchronizing access to *count? If so, how should I implement it?

  1. The 8600 supports atomic operations.
  2. I implemented a lock on the 8800 in …
    using a customized compiler.
    It’s slow, and it only works within a block, so you’d have to do a per-block reduction first.
  3. Convert it to a reduction; this is likely to be the most efficient approach.
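A minimal sketch of options 1 and 3, assuming the job of func is to count how many elements of an int array are nonzero (the original post doesn't say what *count tracks). The names countAtomic, countPerBlock, countWithAtomics and countWithReduction are hypothetical. The atomic version needs hardware with global-memory atomics (such as the 8600 mentioned above); the custom-compiler lock from option 2 is not reproduced here.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Threads per block. Must be a power of two for the tree reduction below.
constexpr int kBlock = 256;

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t e) {
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::abort();
    }
}

// Option 1: each thread whose flag is set increments the counter atomically.
// A plain (*count)++ here would race between threads and lose updates.
__global__ void countAtomic(const int *flags, int n, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i] != 0) {
        atomicAdd(count, 1);
    }
}

// Option 3: each block sums its own flags in shared memory and writes one
// partial result, so no two threads ever write the same global address.
__global__ void countPerBlock(const int *flags, int n, int *blockSums) {
    __shared__ int partial[kBlock];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n && flags[i] != 0) ? 1 : 0;
    __syncthreads();
    // Halve the number of active threads each step until partial[0] holds the sum.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        blockSums[blockIdx.x] = partial[0];
    }
}

// Host wrapper for option 1: returns the number of nonzero flags.
int countWithAtomics(const std::vector<int> &flags) {
    int n = static_cast<int>(flags.size());
    if (n == 0) return 0;  // a launch with zero blocks would be invalid
    int blocks = (n + kBlock - 1) / kBlock;
    int *dFlags = nullptr, *dCount = nullptr;
    check(cudaMalloc(&dFlags, n * sizeof(int)));
    check(cudaMalloc(&dCount, sizeof(int)));
    check(cudaMemcpy(dFlags, flags.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    check(cudaMemset(dCount, 0, sizeof(int)));
    countAtomic<<<blocks, kBlock>>>(dFlags, n, dCount);
    check(cudaGetLastError());
    int result = 0;
    check(cudaMemcpy(&result, dCount, sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dFlags));
    check(cudaFree(dCount));
    return result;
}

// Host wrapper for option 3: the per-block partial sums are added on the CPU.
int countWithReduction(const std::vector<int> &flags) {
    int n = static_cast<int>(flags.size());
    if (n == 0) return 0;
    int blocks = (n + kBlock - 1) / kBlock;
    int *dFlags = nullptr, *dSums = nullptr;
    check(cudaMalloc(&dFlags, n * sizeof(int)));
    check(cudaMalloc(&dSums, blocks * sizeof(int)));
    check(cudaMemcpy(dFlags, flags.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    countPerBlock<<<blocks, kBlock>>>(dFlags, n, dSums);
    check(cudaGetLastError());
    std::vector<int> sums(blocks);
    check(cudaMemcpy(sums.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dFlags));
    check(cudaFree(dSums));
    int total = 0;
    for (int s : sums) total += s;
    return total;
}
```

Both wrappers should return the same count for the same input. The reduction version avoids having every thread contend for one address; the last step (adding the per-block sums) is done on the host here, but a second kernel pass, or a single atomicAdd per block on hardware that supports it, would also work.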