What are the details of memory operations?

“Global memory” can exchange data with “host memory”, “shared memory”, and the “register file”, and each transfer can either write to or read from global memory. That gives 6 combinations in total, e.g. “read from shared memory, write to global memory”. My question is:

Do these operations have separate bandwidth? For instance, could “read from host memory, write to global memory” run concurrently with “read from registers, write to global memory”, or do they share the same port and queue and run sequentially?

Any answers or related resources would be helpful, thanks!

For the most part, they don’t have separate bandwidth. You may be interested in this recent similar question, which compares two paths.
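If you want to check this on your own hardware, a rough sketch like the one below may help. It times a host-to-global copy alone, a kernel that stores register values to global memory alone, and then both issued together on separate streams. Everything here is an assumption for illustration: the kernel name `fill`, the helpers `ms_since` and `gib_per_s`, the buffer size, and the launch shape are all made up, and a single host-timed run is noisy, so treat the numbers as indicative only.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Return 1 from main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_));  \
            return 1;                                                      \
        }                                                                  \
    } while (0)

// Hypothetical kernel: every thread stores a register value to global memory.
__global__ void fill(float *out, size_t n, float value) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = value;
}

using Clock = std::chrono::steady_clock;

// Milliseconds elapsed since t0 on the host clock.
static double ms_since(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// Convert a byte count and a duration in milliseconds to GiB/s.
static double gib_per_s(size_t bytes, double ms) {
    return (double)bytes / (1024.0 * 1024.0 * 1024.0) / (ms / 1000.0);
}

int main() {
    const size_t n = 1u << 25;               // 32 Mi floats = 128 MiB per buffer (arbitrary)
    const size_t bytes = n * sizeof(float);
    const int threads = 256, blocks = 1024;  // arbitrary launch shape

    float *h_src = nullptr, *d_copy = nullptr, *d_fill = nullptr;
    CHECK(cudaMallocHost((void **)&h_src, bytes));  // pinned, so the copy can run asynchronously
    CHECK(cudaMalloc((void **)&d_copy, bytes));
    CHECK(cudaMalloc((void **)&d_fill, bytes));
    memset(h_src, 0, bytes);

    cudaStream_t s_copy, s_kernel;
    CHECK(cudaStreamCreate(&s_copy));
    CHECK(cudaStreamCreate(&s_kernel));

    // Warm-up so one-time initialization is not included in the timings.
    fill<<<blocks, threads, 0, s_kernel>>>(d_fill, n, 0.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, s_copy));
    CHECK(cudaDeviceSynchronize());

    // 1) Host memory -> global memory, alone.
    auto t0 = Clock::now();
    CHECK(cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, s_copy));
    CHECK(cudaDeviceSynchronize());
    double copy_ms = ms_since(t0);

    // 2) Registers -> global memory, alone.
    t0 = Clock::now();
    fill<<<blocks, threads, 0, s_kernel>>>(d_fill, n, 1.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    double fill_ms = ms_since(t0);

    // 3) Both issued back to back on different streams, so they may overlap.
    t0 = Clock::now();
    CHECK(cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, s_copy));
    fill<<<blocks, threads, 0, s_kernel>>>(d_fill, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    double both_ms = ms_since(t0);

    printf("copy alone: %8.3f ms (%.2f GiB/s)\n", copy_ms, gib_per_s(bytes, copy_ms));
    printf("fill alone: %8.3f ms (%.2f GiB/s)\n", fill_ms, gib_per_s(bytes, fill_ms));
    printf("both:       %8.3f ms\n", both_ms);

    CHECK(cudaStreamDestroy(s_copy));
    CHECK(cudaStreamDestroy(s_kernel));
    CHECK(cudaFree(d_copy));
    CHECK(cudaFree(d_fill));
    CHECK(cudaFreeHost(h_src));
    return 0;
}
```

If the combined time is close to the larger of the two individual times, the two paths overlapped without much visible contention; if it is close to their sum, they were serialized or competing. Note that the host-to-device copy is usually limited by the host link rather than by device memory, so it may use only a small share of global memory bandwidth and the contention can be hard to see with this particular pair.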

That helps, thank you.