Hello All,
I was wondering if anyone had any ideas on a good way to communicate/transfer data between multiple blocks. I’ll describe a simple scenario of what I am trying to achieve:
The application I am creating is essentially a hierarchy, and the different blocks are different nodes. For a simple example, let’s say I have 3 blocks (B0, B1, and B2). Each block takes in 512 bytes of input data. B2 is the ‘parent’ node of blocks B1 and B0. So when I run my application, blocks B0 and B1 evaluate and concatenate their outputs (256 B + 256 B) to form the INPUT of block B2 (for a total of 512 B of input).
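To make the layout concrete, here is a minimal sketch of the 3-block case. The names (evalLayer, combinePair, runThreeNodeTree) and the per-node computation are placeholders I made up for illustration; my real node evaluation is different. It assumes one thread per output byte and runs the children and the parent as two separate launches, which is the simple way to guarantee the parent sees complete input.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int IN_BYTES  = 512;  // input per node, as described above
constexpr int OUT_BYTES = 256;  // output per node, half of a parent's input

// Hypothetical stand-in for the real per-node work: out[i] = in[2i] + in[2i+1].
__device__ unsigned char combinePair(const unsigned char* in, int i) {
    return static_cast<unsigned char>(in[2 * i] + in[2 * i + 1]);
}

// One block per node of a single layer, one thread per output byte.
// Node b reads in[b*512 .. b*512+511] and writes out[b*256 .. b*256+255],
// so nodes 2p and 2p+1 together fill the 512-byte input of parent p.
__global__ void evalLayer(const unsigned char* in, unsigned char* out) {
    const int node = blockIdx.x;
    const int i = threadIdx.x;
    out[node * OUT_BYTES + i] = combinePair(in + node * IN_BYTES, i);
}

// Evaluates B0 and B1 in one launch, then B2 in a second launch.
// Launches on the same stream run in order, so B2 reads finished data.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned char> runThreeNodeTree(const std::vector<unsigned char>& leafInput) {
    assert(leafInput.size() == 2 * IN_BYTES);

    unsigned char *dLeafIn = nullptr, *dRootIn = nullptr, *dRootOut = nullptr;
    cudaMalloc(&dLeafIn, 2 * IN_BYTES);
    cudaMalloc(&dRootIn, IN_BYTES);
    cudaMalloc(&dRootOut, OUT_BYTES);
    cudaMemcpy(dLeafIn, leafInput.data(), 2 * IN_BYTES, cudaMemcpyHostToDevice);

    evalLayer<<<2, OUT_BYTES>>>(dLeafIn, dRootIn);   // B0 and B1
    evalLayer<<<1, OUT_BYTES>>>(dRootIn, dRootOut);  // B2

    std::vector<unsigned char> rootOut(OUT_BYTES);
    cudaMemcpy(rootOut.data(), dRootOut, OUT_BYTES, cudaMemcpyDeviceToHost);

    cudaFree(dLeafIn);
    cudaFree(dRootIn);
    cudaFree(dRootOut);
    return rootOut;
}
```

Calling runThreeNodeTree with 1024 bytes of leaf input returns the 256 bytes produced by B2. The key point is only the indexing: child b writes its 256 bytes at offset b * 256 of the parent's input buffer.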
Now the problem is that the global memory I am writing as some blocks’ OUTPUT is the global memory for another block’s INPUT. Granted, in the 3-block case there is no problem (since I have more than three processors, all 3 blocks can run concurrently; the inputs of the parent are just always one iteration behind the children, but that is fine). However, since we can’t say anything about block ORDERING (especially when I scale this up to larger numbers of blocks), this cannot be relied on. And in fact I do see the unexpected behavior: a child block has only written some of its outputs before its parent starts reading them, so the inputs I see are always different and unpredictable.
This brings me to the solutions I have thought about so far:
- Launch a separate kernel for each layer. Basically, complete each layer in lockstep. I have implemented this, but it definitely takes a performance hit (in my case, almost 3x slower!).
- Use atomic operations, so that a block would need to acquire a lock before reading or writing the ENTIRE 512 bytes at once, thus keeping multiple blocks from reading or writing it at the same time. Any thoughts on whether this is OK?
- Any other ideas? I have been looking through the forum posts for anyone who has done block communication/data transfer but haven’t found anything exactly like this yet…
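For reference, here is a rough sketch of one variant of the atomic idea that I am considering. Instead of a lock, each child block takes a "ticket" from an atomic counter after writing its output, and whichever block arrives last evaluates the parent. It is modeled on the last-block-done pattern shown with __threadfence() in the CUDA programming guide. All names here (evalTreeOneLaunch, combinePair, doneCount, runOneLaunch) and the per-node computation are made up for illustration, and I am not sure this is the best approach.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int IN_BYTES  = 512;  // input per node
constexpr int OUT_BYTES = 256;  // output per node

// Hypothetical stand-in for the real per-node work: out[i] = in[2i] + in[2i+1].
__device__ unsigned char combinePair(const volatile unsigned char* in, int i) {
    return static_cast<unsigned char>(in[2 * i] + in[2 * i + 1]);
}

// Launched with 2 blocks (B0, B1) of 256 threads each.
// rootIn is volatile so the parent step reads it from memory, not a stale cache line.
__global__ void evalTreeOneLaunch(const unsigned char* leafIn,
                                  volatile unsigned char* rootIn,  // 512 B, filled by B0 and B1
                                  unsigned char* rootOut,          // 256 B, written by the last block
                                  unsigned int* doneCount) {       // must be zero before launch
    __shared__ bool isLast;
    const int node = blockIdx.x;
    const int i = threadIdx.x;

    // Child step: write this node's half of the parent's input.
    rootIn[node * OUT_BYTES + i] = combinePair(leafIn + node * IN_BYTES, i);

    __threadfence();   // make this thread's write visible device-wide
    __syncthreads();   // every thread in the block has written and fenced

    if (i == 0) {
        // atomicAdd returns the old value, so the last block sees gridDim.x - 1.
        const unsigned int ticket = atomicAdd(doneCount, 1u);
        isLast = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // Parent step (B2): only the block that arrived last runs it,
    // at which point both children have published their outputs.
    if (isLast) {
        rootOut[i] = combinePair(rootIn, i);
    }
}

// Host-side helper for the sketch. Error checking omitted for brevity.
std::vector<unsigned char> runOneLaunch(const std::vector<unsigned char>& leafInput) {
    assert(leafInput.size() == 2 * IN_BYTES);

    unsigned char *dLeafIn = nullptr, *dRootIn = nullptr, *dRootOut = nullptr;
    unsigned int* dCount = nullptr;
    cudaMalloc(&dLeafIn, 2 * IN_BYTES);
    cudaMalloc(&dRootIn, IN_BYTES);
    cudaMalloc(&dRootOut, OUT_BYTES);
    cudaMalloc(&dCount, sizeof(unsigned int));
    cudaMemcpy(dLeafIn, leafInput.data(), 2 * IN_BYTES, cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(unsigned int));

    evalTreeOneLaunch<<<2, OUT_BYTES>>>(dLeafIn, dRootIn, dRootOut, dCount);

    std::vector<unsigned char> rootOut(OUT_BYTES);
    cudaMemcpy(rootOut.data(), dRootOut, OUT_BYTES, cudaMemcpyDeviceToHost);

    cudaFree(dLeafIn);
    cudaFree(dRootIn);
    cudaFree(dRootOut);
    cudaFree(dCount);
    return rootOut;
}
```

No block ever spins waiting for another, so this should not depend on the order in which blocks are scheduled. For a deeper hierarchy I assume it would need one counter per parent node, with the last-arriving child continuing up the tree, but I have not tried that.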
THANKS!