Atomic operations and concurrent kernel launch

Hello all,

I am currently developing a CUDA-based program that uses multiple kernels launched concurrently on multiple streams.

In my application, multiple kernels need to access a shared queue/stack, and I plan to use atomic operations to coordinate that access.
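To make the question concrete, here is a minimal sketch of what I have in mind. It is not my real code: the names `Stack`, `push_kernel`, and `run_two_kernels` are made up for illustration, the sizes are arbitrary, and I am assuming the stack lives in ordinary global memory allocated with `cudaMalloc`. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Hypothetical stack in global memory, shared by every kernel that receives it.
struct Stack {
    int *items;    // storage for `capacity` entries
    int *top;      // number of slots reserved so far
    int  capacity;
};

// Each thread reserves one slot with atomicAdd and writes its value there.
__global__ void push_kernel(Stack s, int base)
{
    int value = base + blockIdx.x * blockDim.x + threadIdx.x;
    int slot  = atomicAdd(s.top, 1);   // returns the old value of *s.top
    if (slot < s.capacity) {
        s.items[slot] = value;
    }
}

// Launches the same kernel on two streams against one stack and returns
// the final value of the `top` counter.
int run_two_kernels()
{
    const int threads = 128;
    const int blocks  = 4;
    const int n       = threads * blocks;   // pushes per kernel

    Stack s;
    s.capacity = 2 * n;
    cudaMalloc(&s.items, s.capacity * sizeof(int));
    cudaMalloc(&s.top, sizeof(int));
    cudaMemset(s.top, 0, sizeof(int));
    cudaDeviceSynchronize();                // make sure the counter is zeroed first

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    // The two launches may overlap on the device, but the hardware is not
    // required to run them at the same time.
    push_kernel<<<blocks, threads, 0, stream1>>>(s, 0);
    push_kernel<<<blocks, threads, 0, stream2>>>(s, n);
    cudaDeviceSynchronize();                // wait for both streams

    int top = 0;
    cudaMemcpy(&top, s.top, sizeof(int), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(s.items);
    cudaFree(s.top);
    return top;
}
```

Both launches increment the same `top` counter, so what I am really asking is whether the `atomicAdd` calls issued by one kernel are also atomic with respect to those issued by the other.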

However, I do not know whether atomic operations work correctly across multiple kernels that are running concurrently.
Could anyone who knows the exact mechanism of atomic operations on the GPU, or who has experience with this issue, please help?