Suppose I have one SM and two blocks can run on it at the same time. Will the intermediate result of one block be written into the L1 cache (not shared memory), or is it written directly back to L2? And can the other block read this intermediate result from L1?
I ask because I read somewhere that different blocks can only communicate through global memory…
For example, if block 1 writes a value back to global memory and that value is also written into L1, then block 2, which happens to be on the same SM, can hit this intermediate result in L1, right?
Yes, blocks residing on the same SM share the same L1. If block A, on SM X, writes to global memory, and block B, on SM X, later reads from that same location in global memory, my expectation is that block B will hit in the L1, on the value that was written by block A.
Yes, two blocks from the same host process can be coresident on the same SM. They will share the L1 as already indicated.
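If you want one block to consume another block's result without depending on what L1 happens to hold, a minimal sketch of the usual global-memory pattern could look like the following. Each block publishes a value, issues __threadfence(), and bumps a counter; whichever block finishes last reads everything back. The names sumBlockResults, blocksDone and runSumBlockResults are made up for illustration, and the (blockIdx.x + 1) * 10 value is only a stand-in for a real intermediate result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: number of blocks that have published their result.
__device__ unsigned int blocksDone = 0;

// Each block publishes one value to global memory; the last block to finish sums them.
// 'partial' is volatile so every access is a real memory access, not a cached register copy.
__global__ void sumBlockResults(volatile int *partial, int *total)
{
    __shared__ bool isLast;

    if (threadIdx.x == 0) {
        partial[blockIdx.x] = (blockIdx.x + 1) * 10;  // stand-in intermediate result
        __threadfence();  // order the write above before the counter update below
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        isLast = (ticket == gridDim.x - 1);  // true only in the block that finished last
    }
    __syncthreads();  // make isLast visible to the whole block

    if (isLast && threadIdx.x == 0) {
        int sum = 0;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            sum += partial[b];  // read what the other blocks wrote
        *total = sum;
        blocksDone = 0;  // reset so the kernel can be launched again
    }
}

// Host helper (hypothetical): launches numBlocks blocks and returns the sum, or -1 on error.
int runSumBlockResults(int numBlocks)
{
    int *partial = nullptr, *total = nullptr;
    int hostTotal = -1;
    if (cudaMalloc(&partial, numBlocks * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&total, sizeof(int)) != cudaSuccess) { cudaFree(partial); return -1; }

    sumBlockResults<<<numBlocks, 32>>>(partial, total);
    if (cudaMemcpy(&hostTotal, total, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        hostTotal = -1;

    cudaFree(partial);
    cudaFree(total);
    return hostTotal;
}

int main()
{
    printf("total = %d\n", runSumBlockResults(2));
    return 0;
}
```

This version never spins waiting for another block, so it does not depend on both blocks being resident at the same time.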
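BTW, for better synchronization options between two blocks of the same kernel (or of different kernels), you could instead put both pieces of code into a single block of one kernel and branch on the thread index. Within one block the two halves can use shared memory and __syncthreads(). E.g.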
// run, for example, with 1024 threads per block
if (threadIdx.x < 512) {
    // code 1: threads 0..511
    int myidx = threadIdx.x; // 0..511
} else {
    // code 2: threads 512..1023
    int myidx = threadIdx.x - 512; // 0..511
}
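To make the idea concrete, here is a rough, self-contained sketch of how the two halves of one block could hand data over through shared memory. It assumes the first half acts as a producer and the second half as a consumer; the names producerConsumer, stage, HALF and runProducerConsumer are invented for this example, and the arithmetic is placeholder work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int HALF = 512;  // threads per "virtual block"; launch with 2 * HALF threads

__global__ void producerConsumer(int *out)
{
    __shared__ int stage[HALF];  // hand-over buffer between the two halves

    if (threadIdx.x < HALF) {
        // code 1: threads 0..511 produce an intermediate result
        int myidx = threadIdx.x;
        stage[myidx] = myidx * 2;
    }

    // The barrier sits outside the branch so that all 1024 threads reach it.
    __syncthreads();

    if (threadIdx.x >= HALF) {
        // code 2: threads 512..1023 consume what the first half wrote
        int myidx = threadIdx.x - HALF;
        out[myidx] = stage[myidx] + 1;
    }
}

// Host helper (hypothetical): runs the kernel and returns out[idx], or -1 on error.
int runProducerConsumer(int idx)
{
    int *devOut = nullptr;
    int host[HALF];
    if (cudaMalloc(&devOut, HALF * sizeof(int)) != cudaSuccess) return -1;

    producerConsumer<<<1, 2 * HALF>>>(devOut);
    cudaError_t err = cudaMemcpy(host, devOut, HALF * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(devOut);
    return (err == cudaSuccess) ? host[idx] : -1;
}

int main()
{
    printf("out[0] = %d, out[511] = %d\n", runProducerConsumer(0), runProducerConsumer(511));
    return 0;
}
```

Note that __syncthreads() is placed between the two branches rather than inside either one; calling it from only half of the block's threads would be an error.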