OpenCL and global synchronization

How can I do global synchronization on NVIDIA cards using atomics? Is it possible only on Fermi? Is the global work size then limited to the number of SIMD modules (for example, 16 on the GTX 580)?

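Global synchronization should be done at the kernel level, i.e. by splitting the work into multiple kernel launches. barrier() only synchronizes the work-items within a single work-group, whereas on an in-order command queue a kernel does not start until the previously enqueued kernel has finished, so the boundary between two launches acts as a barrier across all work-groups.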
