Critical Section

Hey guys, is there any way to create critical sections in CUDA?
I know the common way is to use atomic operations, but I found them difficult to implement.
My goal is to use a vector-like type in my kernel function to store specific values.
I have a 512 x 512 matrix, and no more than 10 results will be picked out of that matrix.
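
To make the goal concrete, here is a minimal sketch of the atomic-counter approach as far as I understand it. The names pick_values, is_match and pick_from_matrix, the float element type and the 0.5 threshold are all placeholders I made up for illustration. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define MAX_RESULTS 10  // assumption: at most 10 values are kept

// Hypothetical predicate deciding which values count as a "result".
__device__ bool is_match(float v) { return v > 0.5f; }

// One thread per matrix element. Matching threads reserve a slot in
// `results` by atomically incrementing a shared counter.
__global__ void pick_values(const float *matrix, int n,
                            float *results, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = matrix[i];
    if (is_match(v)) {
        // atomicAdd returns the previous counter value, so every
        // matching thread gets a distinct slot index without a lock.
        int slot = atomicAdd(count, 1);
        if (slot < MAX_RESULTS)
            results[slot] = v;  // matches beyond the capacity are dropped
    }
}

// Host wrapper (hypothetical): copies the matrix to the device, runs the
// kernel, and returns the stored values.
std::vector<float> pick_from_matrix(const std::vector<float> &h_matrix)
{
    int n = (int)h_matrix.size();
    float *d_matrix = nullptr, *d_results = nullptr;
    int *d_count = nullptr;

    cudaMalloc(&d_matrix, n * sizeof(float));
    cudaMalloc(&d_results, MAX_RESULTS * sizeof(float));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_matrix, h_matrix.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    pick_values<<<blocks, threads>>>(d_matrix, n, d_results, d_count);

    // This blocking cudaMemcpy also waits for the kernel to finish.
    int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    if (h_count > MAX_RESULTS) h_count = MAX_RESULTS;

    std::vector<float> out(h_count);
    if (h_count > 0)
        cudaMemcpy(out.data(), d_results, h_count * sizeof(float),
                   cudaMemcpyDeviceToHost);

    cudaFree(d_matrix);
    cudaFree(d_results);
    cudaFree(d_count);
    return out;
}
```

With this approach the order of the stored values is not deterministic, since it depends on which thread reaches the atomicAdd first.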

Is there any way to do it?