Atomic operations and Block communication

I need a way to communicate data between blocks. The kernel has many blocks, each of which operates on some data and arrives at some results. Each block has to communicate some of its results (say, 5 integers) to the previous block.

Like this:
Block n needs to communicate data to Block (n-1).
Block (n-1) needs to accept data from Block n and communicate some data to Block (n-2).
And so on.

Will this be possible by making use of “Atomic” operations?

I am interested in the case where the number of blocks is large, i.e. more than it takes to keep all 16 multiprocessors busy. So a block may be ready to accept data while the “producer” block has NOT been scheduled yet, and so on.

Any thoughts?

This has been discussed here many times. The search function of this forum will help you find thoughts and ideas.

In general, I think it is safe to say that this is very cumbersome, because the architecture is not designed for it.
I think it is possible for very specific problems, but I don’t think there is a solution to the problem you mention (a block needs data from a block that has not been scheduled yet), because we have no control over how the GPU schedules blocks. You could order the blocks so that this won’t happen. You could also try something like this: http://forums.nvidia.com/index.php?showtopic=53009

Depending on your algorithm, it might also be possible to redesign it to avoid the need for global synchronization.
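One common way to avoid inter-block synchronization altogether is to split the work at the handoff point into two kernel launches. A kernel launch only starts after the previous launch in the same stream has completed, so the kernel boundary acts as a global barrier regardless of how blocks were scheduled. The sketch below assumes 5 ints per block and a placeholder computation; the names `produceResults`, `consumeFromNext` and `runTwoPass` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumption: each block hands RESULTS_PER_BLOCK ints to block (n-1).
constexpr int RESULTS_PER_BLOCK = 5;

// Pass 1: every block computes its results and stores them in its own
// slot of a global array. Placeholder computation: n * 10 + i.
__global__ void produceResults(int* results) {
    int n = blockIdx.x;
    if (threadIdx.x < RESULTS_PER_BLOCK) {
        results[n * RESULTS_PER_BLOCK + threadIdx.x] = n * 10 + threadIdx.x;
    }
}

// Pass 2: block n reads what block n+1 wrote in pass 1. Because this is a
// separate launch, all of pass 1 has finished before any block starts here.
__global__ void consumeFromNext(const int* results, int* sums, int numBlocks) {
    int n = blockIdx.x;
    if (threadIdx.x == 0) {
        int sum = 0;
        if (n + 1 < numBlocks) {  // the last block has no producer
            for (int i = 0; i < RESULTS_PER_BLOCK; ++i)
                sum += results[(n + 1) * RESULTS_PER_BLOCK + i];
        }
        sums[n] = sum;
    }
}

// Host driver: returns the per-block sums, or an empty vector on a CUDA error.
std::vector<int> runTwoPass(int numBlocks) {
    int *dResults = nullptr, *dSums = nullptr;
    std::vector<int> sums(numBlocks);
    if (cudaMalloc(&dResults, numBlocks * RESULTS_PER_BLOCK * sizeof(int)) != cudaSuccess)
        return {};
    if (cudaMalloc(&dSums, numBlocks * sizeof(int)) != cudaSuccess) {
        cudaFree(dResults);
        return {};
    }
    produceResults<<<numBlocks, 32>>>(dResults);
    // Same (default) stream: this launch starts only after pass 1 completes.
    consumeFromNext<<<numBlocks, 32>>>(dResults, dSums, numBlocks);
    cudaError_t err = cudaMemcpy(sums.data(), dSums, numBlocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dResults);
    cudaFree(dSums);
    if (err != cudaSuccess) return {};
    return sums;
}
```

This only fits when the chain does not need to be processed strictly one block after another. If block (n-1) truly cannot start its work until block n has finished, the two-pass split does not remove that dependency.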

OK. I see that devices of compute capability 1.1 have these atomic operations.

Were they introduced to synchronize between blocks? I know they synchronize between threads. I would assume that this applies to threads from different blocks too. No?

What do you mean by “order the blocks”? Is it possible to specify an order of execution among blocks?

The atomic operations are what they are: atomic operations. I don’t see how or why atomic operations would synchronize between blocks. The documentation states that “it [the atomic operation] is guaranteed to be performed without interference from other threads”. I don’t know whether it synchronizes all running threads (I have no compute 1.1 device to experiment with), but if it does, I would guess it synchronizes only the threads that are currently running.
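A small sketch of what an atomic operation does guarantee: every thread in every block increments one global counter with `atomicAdd`, which returns the value the counter held just before that thread’s increment. Each thread therefore gets a distinct value and no increment is lost, but no thread ever waits for another, and the values say nothing about which block ran first. The kernel `takeTickets` and the helper `ticketsAreUnique` are hypothetical names used for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread takes one "ticket" from a shared global counter.
// atomicAdd makes the read-modify-write indivisible; it does not make
// any thread wait for threads in other blocks.
__global__ void takeTickets(unsigned int* counter, unsigned int* tickets) {
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    tickets[tid] = atomicAdd(counter, 1u);
}

// Returns true if every thread got a distinct ticket in [0, total) and the
// counter ended at exactly total (no lost updates).
bool ticketsAreUnique(int blocks, int threadsPerBlock) {
    const int total = blocks * threadsPerBlock;
    unsigned int *dCounter = nullptr, *dTickets = nullptr;
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMalloc(&dTickets, total * sizeof(unsigned int));
    cudaMemset(dCounter, 0, sizeof(unsigned int));
    takeTickets<<<blocks, threadsPerBlock>>>(dCounter, dTickets);

    unsigned int counter = 0;
    std::vector<unsigned int> tickets(total);
    cudaMemcpy(&counter, dCounter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaMemcpy(tickets.data(), dTickets, total * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dCounter);
    cudaFree(dTickets);

    std::vector<bool> seen(total, false);
    for (unsigned int t : tickets) {
        if (t >= (unsigned int)total || seen[t]) return false;
        seen[t] = true;
    }
    return counter == (unsigned int)total;
}
```

With plain `(*counter)++` instead of `atomicAdd`, concurrent threads could read the same old value and overwrite each other’s increments; that lost-update problem is what atomics prevent, and it is a different thing from inter-block ordering.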

I’m sorry, I guess my statement about block ordering was misleading. No, it is not possible. But the thread I mentioned describes a method for achieving something similar to block ordering: you don’t order the blocks themselves, but assign the work in the order you want it executed.
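A minimal sketch of “assign the work in the order you want it executed”, applied to the chain from the original question. It is not necessarily the code from the linked thread. Each block takes a ticket with `atomicAdd` when it starts, and the ticket (not `blockIdx.x`) decides which logical block it processes. The first block to start handles the last logical block, which has no producer, so every block’s producer received an earlier ticket and is already running. Assumptions: 5 ints per handoff, a placeholder computation, a current CUDA toolkit with `__threadfence()`, and the observed behavior that a block which has started keeps making progress (CUDA does not formally document this as a scheduling guarantee). The names `chainedHandoff` and `runChain` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int RESULTS_PER_BLOCK = 5;  // "say 5 integers", as in the question

// flags[n] becomes 1 once logical block n has published handoff[n*5 .. n*5+4].
// received[n*5 + i] records what logical block n got from block n+1.
__global__ void chainedHandoff(unsigned int* ticketCounter, volatile int* flags,
                               volatile int* handoff, int* received, int numBlocks) {
    __shared__ int logicalId;
    __shared__ int incoming[RESULTS_PER_BLOCK];

    if (threadIdx.x == 0) {
        // Ticket 0 goes to whichever block starts first; it takes the LAST
        // logical block, which depends on nobody.
        unsigned int ticket = atomicAdd(ticketCounter, 1u);
        logicalId = numBlocks - 1 - (int)ticket;
    }
    __syncthreads();
    const int n = logicalId;

    if (threadIdx.x == 0) {
        if (n + 1 < numBlocks) {
            while (flags[n + 1] == 0) { }  // producer is already running
            __threadfence();               // order the flag read before data reads
            for (int i = 0; i < RESULTS_PER_BLOCK; ++i)
                incoming[i] = handoff[(n + 1) * RESULTS_PER_BLOCK + i];
        } else {
            for (int i = 0; i < RESULTS_PER_BLOCK; ++i) incoming[i] = 0;
        }
    }
    __syncthreads();

    if (threadIdx.x < RESULTS_PER_BLOCK) {
        int in = incoming[threadIdx.x];
        received[n * RESULTS_PER_BLOCK + threadIdx.x] = in;
        // Placeholder work: add this block's logical index, pass it on to n-1.
        handoff[n * RESULTS_PER_BLOCK + threadIdx.x] = in + n;
    }
    __threadfence();                      // make the handoff visible device-wide
    __syncthreads();                      // wait until every writer has fenced
    if (threadIdx.x == 0) flags[n] = 1;   // only then raise the flag
}

// Returns numBlocks * RESULTS_PER_BLOCK received values, or empty on error.
std::vector<int> runChain(int numBlocks) {
    const size_t slots = (size_t)numBlocks * RESULTS_PER_BLOCK;
    unsigned int* dCounter = nullptr;
    int *dFlags = nullptr, *dHandoff = nullptr, *dReceived = nullptr;
    std::vector<int> received(slots);
    bool ok = cudaMalloc(&dCounter, sizeof(unsigned int)) == cudaSuccess &&
              cudaMalloc(&dFlags, numBlocks * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&dHandoff, slots * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&dReceived, slots * sizeof(int)) == cudaSuccess;
    if (ok) {
        cudaMemset(dCounter, 0, sizeof(unsigned int));
        cudaMemset(dFlags, 0, numBlocks * sizeof(int));
        chainedHandoff<<<numBlocks, 32>>>(dCounter, dFlags, dHandoff, dReceived,
                                          numBlocks);
        ok = cudaMemcpy(received.data(), dReceived, slots * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dCounter);
    cudaFree(dFlags);
    cudaFree(dHandoff);
    cudaFree(dReceived);
    return ok ? received : std::vector<int>{};
}
```

Keying the spin-wait on `blockIdx.x` instead of the ticket is what can deadlock when there are more blocks than fit on the multiprocessors: a resident block may wait forever for a producer that cannot be scheduled until the waiting block finishes. With tickets, a block only ever waits on a block that started before it.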