In the article:
The PGI Accelerator Programming Model on NVIDIA GPUs, Part 2: Performance Tuning
In the part about data communication between the host and the device (accelerator), the compiler reports messages such as
Generating copyout(b[1:n-2][1:m-2]) when the -Minfo compiler option is used.
However, the programmer can be smarter than the compiler and write his own data movement clauses here.
This topic is covered in the section "Host/Accelerator Data Movement".
The clauses differ from case to case, but the partial matrix ranges are the same.
For instance matrix b from above is now
Since matrix b is needed only on the accelerator and not on the host, it does not have to be copied back. However, the range of elements in matrix b is exactly the same in both cases. This is very convenient, since the partial matrix is already defined for you. Is this by accident in this special case, or is this the way it usually occurs?
This may be a naive question, but it is not addressed in any of the literature that I have read, even though it seems very clear in the examples in the PGroup literature.
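
To make the question concrete, here is a minimal sketch in C of the two cases as I understand them. It is not the article's code: the names smooth_copyout and smooth_local, the sizes N and M, and the four-point average are all made up for illustration. I am also assuming that the second case uses a local clause, and that the bounds read as inclusive lower:upper, as the -Minfo message suggests. A compiler that does not know the acc pragmas should simply ignore them, so the sketch also runs on the host alone.

```c
#include <assert.h>

enum { N = 6, M = 8 };   /* hypothetical sizes, for illustration only */

/* Case 1: b is needed on the host afterwards, so its interior is copied out. */
void smooth_copyout(float a[N][M], float b[N][M])
{
    int n = N, m = M, i, j;
    assert(n >= 3 && m >= 3);   /* need at least one interior point */

    #pragma acc region copyin(a[0:n-1][0:m-1]) copyout(b[1:n-2][1:m-2])
    {
        for (i = 1; i < n - 1; ++i)
            for (j = 1; j < m - 1; ++j)
                b[i][j] = 0.25f * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
    }
}

/* Case 2: b is only scratch space inside the region, so it is declared local.
   Assumption: local takes the same interior range that copyout took above. */
void smooth_local(float a[N][M])
{
    float b[N][M];
    int n = N, m = M, i, j;

    #pragma acc region copy(a[0:n-1][0:m-1]) local(b[1:n-2][1:m-2])
    {
        /* Compute the averaged interior into the scratch array. */
        for (i = 1; i < n - 1; ++i)
            for (j = 1; j < m - 1; ++j)
                b[i][j] = 0.25f * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);

        /* Write the interior back into a; the boundary of a is left alone. */
        for (i = 1; i < n - 1; ++i)
            for (j = 1; j < m - 1; ++j)
                a[i][j] = b[i][j];
    }
}
```

In both sketches the range b[1:n-2][1:m-2] is just the set of elements that the loops touch, and only the clause in front of it changes. That is what prompts my question.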