I am using ldgsts with a software pipeline to hide memory access latency, and I am wondering whether shared memory supports reading and writing (at different addresses) at the same time.
If I use shared memory read instructions such as ldmatrix, will they block the loads from global memory to shared memory?
Most SASS instructions can execute concurrently with other SASS instructions. The usual exception is when there are register dependencies between instructions.
I don’t know of any reason to conclude that ldgsts would somehow prevent other shared memory activity that accesses non-overlapping addresses and uses a non-overlapping register footprint. In fact, this seems essential for the pipeline and producer-consumer models I am familiar with.
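To make the pattern concrete, here is a minimal double-buffered sketch of the kind of pipeline being discussed. It uses the `__pipeline_memcpy_async` primitives from `<cuda_pipeline.h>`, which I would expect to compile to ldgsts on sm_80 and later (worth confirming with cuobjdump). The kernel name, tile size, and the `run_pipeline_check` helper are made up for illustration, and ordinary shared memory loads stand in for ldmatrix. It shows reads from one buffer being issued while the copy into the other buffer is still in flight; it does not by itself prove anything about how the hardware overlaps them.

```cuda
#include <cuda_pipeline.h>
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 128;   // threads per block == floats per tile (illustrative choice)
constexpr int STAGES = 2;   // double buffering

// Hypothetical kernel: each block streams numTiles tiles through shared memory.
// While tile t is being read from one buffer, tile t+1 is copied into the other.
__global__ void pipelined_sum(const float* __restrict__ in,
                              float* __restrict__ out, int numTiles)
{
    __shared__ float buf[STAGES][TILE];
    const int tid = threadIdx.x;
    const float* src = in + (size_t)blockIdx.x * numTiles * TILE;

    // Prologue: start the asynchronous copy of tile 0.
    __pipeline_memcpy_async(&buf[0][tid], &src[tid], sizeof(float));
    __pipeline_commit();

    float acc = 0.0f;
    for (int t = 0; t < numTiles; ++t) {
        const int cur = t % STAGES;
        const int nxt = (t + 1) % STAGES;

        // Producer side: issue the copy of the next tile into the other buffer.
        if (t + 1 < numTiles) {
            __pipeline_memcpy_async(&buf[nxt][tid],
                                    &src[(size_t)(t + 1) * TILE + tid],
                                    sizeof(float));
        }
        __pipeline_commit();        // may commit an empty batch on the last iteration

        // Wait for everything except the most recent batch, i.e. tile t has landed.
        __pipeline_wait_prior(1);
        __syncthreads();            // make other threads' copies visible

        // Consumer side: read a neighbour's element from the current buffer
        // while the copy into buf[nxt] may still be in flight.
        acc += buf[cur][(tid + 1) % TILE];

        __syncthreads();            // all reads of buf[cur] done before it is refilled
    }
    out[blockIdx.x * TILE + tid] = acc;
}

// Hypothetical host helper: runs the kernel and compares against a CPU reference.
bool run_pipeline_check()
{
    const int blocks = 4, numTiles = 8;
    const size_t n = (size_t)blocks * numTiles * TILE;
    std::vector<float> h_in(n), h_out((size_t)blocks * TILE);
    for (size_t i = 0; i < n; ++i) h_in[i] = float(i % 7);  // small ints: exact in float

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, h_out.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    pipelined_sum<<<blocks, TILE>>>(d_in, d_out, numTiles);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // CPU reference: thread tid sums element (tid + 1) % TILE of every tile.
    for (int b = 0; ok && b < blocks; ++b) {
        for (int tid = 0; tid < TILE; ++tid) {
            float ref = 0.0f;
            for (int t = 0; t < numTiles; ++t)
                ref += h_in[((size_t)b * numTiles + t) * TILE + (tid + 1) % TILE];
            if (h_out[(size_t)b * TILE + tid] != ref) { ok = false; break; }
        }
    }
    return ok;
}
```

Calling `run_pipeline_check()` from a `main` and building with something like `nvcc -arch=sm_80` should be enough to try it. The only ordering between the in-flight copy and the shared memory reads comes from `__pipeline_wait_prior` and `__syncthreads`, and the two buffers do not overlap, which is the non-overlapping case described above.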