How to communicate between blocks?

Suppose I have one SM and two blocks can run on it. Will the intermediate result of one block be written into the L1 cache (not shared memory), or is it written directly back to L2? And can another block read this intermediate result from L1?

I ask because I read somewhere that different blocks can only communicate through global memory…

Different blocks can communicate via global memory. On Hopper, blocks in the same thread block cluster can additionally use distributed shared memory.
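To make the global-memory route concrete, here is a minimal sketch of the "last block finishes the job" pattern: each block publishes a partial sum to global memory, and whichever block finishes last adds the partials together. The names `blocks_done`, `sum_with_last_block` and `sum_on_gpu` are made up for illustration, and the launch configuration (4 blocks of 128 threads) is an arbitrary assumption. The `volatile` pointer and `__threadfence()` are there so the last block does not read a stale value; the sketch does not rely on any L1 hit.

```cuda
#include <cuda_runtime.h>

// Hypothetical counter: how many blocks have published their partial sum.
__device__ unsigned int blocks_done = 0;

__global__ void sum_with_last_block(const int* in, int n,
                                    volatile int* partial, int* out)
{
    __shared__ int block_sum;
    __shared__ bool is_last;
    if (threadIdx.x == 0) block_sum = 0;
    __syncthreads();

    // Grid-stride loop: each thread accumulates a private sum.
    int local = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        local += in[i];
    atomicAdd(&block_sum, local);   // combine inside the block (shared memory)
    __syncthreads();

    if (threadIdx.x == 0) {
        partial[blockIdx.x] = block_sum;   // publish to global memory
        __threadfence();                   // make the write visible device-wide
        unsigned int ticket = atomicInc(&blocks_done, gridDim.x);
        is_last = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // Only the block that arrived last reads the other blocks' results.
    if (is_last && threadIdx.x == 0) {
        int total = 0;
        for (unsigned int b = 0; b < gridDim.x; ++b) total += partial[b];
        *out = total;
        blocks_done = 0;                   // reset so the kernel can be relaunched
    }
}

// Host helper (hypothetical): returns the sum, or -1 if a CUDA call fails.
int sum_on_gpu(const int* host_in, int n)
{
    const int blocks = 4, threads = 128;   // arbitrary choice for the sketch
    int *d_in = nullptr, *d_partial = nullptr, *d_out = nullptr;
    int result = -1;
    if (cudaMalloc(&d_in, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&d_partial, blocks * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&d_out, sizeof(int)) == cudaSuccess) {
        cudaMemcpy(d_in, host_in, n * sizeof(int), cudaMemcpyHostToDevice);
        sum_with_last_block<<<blocks, threads>>>(d_in, n, d_partial, d_out);
        if (cudaMemcpy(&result, d_out, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            result = -1;
    }
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

Calling `sum_on_gpu(data, 1000)` from `main` with an array of ones should give 1000. Because no block ever waits for another one, this pattern does not depend on the blocks being co-resident on the same SM or running at the same time.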

I think the answer is the write goes to both L1 and L2.

See Robert’s answer:


For example, if block 1 writes a value back to global memory and that value is also written into L1, then block 2, which happens to be on the same SM, can hit this intermediate result in L1, right?

I am also wondering: can two blocks from different streams run on the same SM? Can they reuse data within the same L1?

I wonder the same things, but I do not know the answer to these questions.

This is about all I rely on:


Yes, blocks residing on the same SM share the same L1. If block A, on SM X, writes to global memory, and block B, on SM X, later reads from that same location in global memory, my expectation is that block B will hit in the L1, on the value that was written by block A.

Yes, two blocks from the same host process can be coresident on the same SM. They will share the L1 as already indicated.

BTW, for better sync options between two blocks of the same kernel or of different kernels, you could put both pieces of code into a single block of one kernel and branch on the thread index, e.g.

// run for example with 1024 threads per block

if (threadIdx.x < 512) {
    // code 1
    int myidx = threadIdx.x; // 0..511
} else {
    // code 2
    int myidx = threadIdx.x - 512; // 0..511
}
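As a rough sketch of what this buys you: once both roles live in one block, they can hand data to each other through shared memory and synchronize with a plain `__syncthreads()`. The kernel below is only an illustration; the names `two_roles_one_block`, `stage` and `run_two_roles`, and the toy computation (square, then reverse and add one), are assumptions and not from the thread. Note that `__syncthreads()` is placed outside the two branches so that all 1024 threads reach it.

```cuda
#include <cuda_runtime.h>

constexpr int HALF = 512;   // threads per role; the block has 2 * HALF threads

__global__ void two_roles_one_block(const int* in, int* out)
{
    __shared__ int stage[HALF];   // hand-off buffer between the two roles

    const bool is_producer = threadIdx.x < HALF;
    const int myidx = is_producer ? threadIdx.x : threadIdx.x - HALF;  // 0..511

    if (is_producer) {
        // "code 1": produce an intermediate result in shared memory
        stage[myidx] = in[myidx] * in[myidx];
    }

    // Every thread of the block must reach the barrier, so it sits
    // outside both branches.
    __syncthreads();

    if (!is_producer) {
        // "code 2": consume a value written by a different thread
        out[myidx] = stage[HALF - 1 - myidx] + 1;
    }
}

// Host helper (hypothetical): host_in and host_out each hold HALF ints.
// Returns true if all CUDA calls succeeded.
bool run_two_roles(const int* host_in, int* host_out)
{
    int *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, HALF * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&d_out, HALF * sizeof(int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, host_in, HALF * sizeof(int), cudaMemcpyHostToDevice);
        two_roles_one_block<<<1, 2 * HALF>>>(d_in, d_out);
        ok = cudaMemcpy(host_out, d_out, HALF * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

With `host_in[i] = i`, the consumer half writes `out[0] = 511 * 511 + 1` and `out[511] = 1`. The trade-off is that both roles now share one block's register and shared memory budget, so this only works if the combined code fits in a single block.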

Exactly! I am thinking about matmul. If we have enough work, maybe it is better to have one huge block that fills a whole SM!