Data Transfer between Accelerator and Host

In the article:

The PGI Accelerator Programming Model on NVIDIA GPUs Part 2 Performance Tuning

the section on data communication between the host and the device (accelerator) shows messages such as

Generating copyout(b[1:n-2][1:m-2])

which the compiler prints when you compile with the -Minfo option.

However, the programmer can be smarter than the compiler and write his own data clauses here.

This topic is covered in the section "Host/Accelerator Data Movement".

The clauses differ depending on the case, but the ranges of the partial matrices are the same.

For instance, the clause for matrix b from above is now

local(b[1:n-2][1:m-2])

since matrix b is only needed on the accelerator, not on the host. Still, the range of elements in matrix b is exactly the same in both cases. This is very convenient, since the partial matrix is already defined for you. Is this a coincidence in this special case, or is this the way it usually works?
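To make the difference between the two clauses concrete, here is a minimal C sketch that counts the bytes moved for one array section under each clause. It assumes float elements and the inclusive lower:upper bounds used in the -Minfo messages; the names enum clause, section_elems and transfer_bytes are hypothetical helpers, not part of any PGI API. The point is that copyout and local name the same section of b; only the data traffic changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum modelling the data clauses discussed above. */
enum clause { COPYIN, COPYOUT, COPY, LOCAL };

/* Number of elements in the section x[lo_i:hi_i][lo_j:hi_j], where both
 * bounds are inclusive (the notation used in the -Minfo messages). */
static size_t section_elems(long lo_i, long hi_i, long lo_j, long hi_j)
{
    assert(lo_i <= hi_i && lo_j <= hi_j);
    return (size_t)(hi_i - lo_i + 1) * (size_t)(hi_j - lo_j + 1);
}

/* Bytes moved between host and device for one section under clause c. */
static size_t transfer_bytes(enum clause c, size_t elems, size_t elem_size)
{
    size_t once = elems * elem_size;
    switch (c) {
    case COPYIN:  return once;     /* host -> device before the region */
    case COPYOUT: return once;     /* device -> host after the region */
    case COPY:    return 2 * once; /* both directions */
    case LOCAL:   return 0;        /* allocated on the device only */
    }
    return 0;
}
```

For n = 10 and m = 8, section_elems(1, n - 2, 1, m - 2) gives 8 * 6 = 48 elements; copyout moves 48 * sizeof(float) bytes back to the host, while local moves none.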

This may be a naive question. It is not addressed explicitly in any of the literature I have read, although it seems clear from the examples in the PGroup literature.

THS 1138

This is the explanation in the article.

The second thing we notice is that the data traffic for the a array includes a noncontiguous region. The copyin generated is for the whole matrix, but the copyout, from the GPU back to the host, only moves the modified elements, which are the interior of the array. This minimizes the data traffic, but moving noncontiguous regions is more costly than moving one large contiguous section. We can tune this by adding another clause to the region directive:

#pragma acc region local(b[1:n-2][1:m-2]) copy(a[0:n-1][0:m-1])

This tells the compiler to move the whole a array both over to the GPU and back again; it moves more data, but the moves are more efficient. The messages from the compiler are now:

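The ranges 1:n-2 and 1:m-2 are the indices of the matrix without its first and last rows and columns, that is, the interior. In the example the loops run over exactly these interior indices, so the compiler-generated copyout and the hand-written local clause cover the same region.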
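A minimal C sketch of where that range comes from, assuming a flat row-major array and a hypothetical 5-point average in place of the article's weighted update (written_region and struct range are illustrative names only). The loop bounds alone fix the first and last index written, which is the same information the compiler uses to build the section in its messages.

```c
#include <assert.h>

/* Inclusive index range [lo, hi], like the lower:upper bounds in the
 * -Minfo messages. */
struct range {
    long lo;
    long hi;
};

/* Hypothetical stand-in for the article's relaxation loop: a 5-point
 * average of a is written into b (both n x m, flat row-major). The loops
 * run i = 1..n-2 and j = 1..m-2, and we record the smallest and largest
 * row/column index that is actually written. */
static void written_region(long n, long m, float *b, const float *a,
                           struct range *rows, struct range *cols)
{
    assert(n >= 3 && m >= 3);
    rows->lo = n;
    rows->hi = -1;
    cols->lo = m;
    cols->hi = -1;
    for (long i = 1; i < n - 1; ++i) {
        for (long j = 1; j < m - 1; ++j) {
            b[i * m + j] = 0.2f * (a[i * m + j] + a[(i - 1) * m + j] +
                                   a[(i + 1) * m + j] + a[i * m + j - 1] +
                                   a[i * m + j + 1]);
            if (i < rows->lo) rows->lo = i;
            if (i > rows->hi) rows->hi = i;
            if (j < cols->lo) cols->lo = j;
            if (j > cols->hi) cols->hi = j;
        }
    }
}
```

With n = 6 and m = 5 the recorded ranges are rows 1..4 and columns 1..3, which is b[1:n-2][1:m-2].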

The a array, in contrast, is copied whole, since moving one large contiguous block is faster than moving the noncontiguous interior parts.
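The cost difference comes from the number of separate transfers. The sketch below is a hypothetical helper that uses memcpy between host buffers as a stand-in for host/device transfers (an assumption for illustration). It copies a section of a row-major n x m array one contiguous run at a time and reports how many runs were needed: the interior takes one run per row, while the whole array is a single block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the inclusive section [r0:r1][c0:c1] of a row-major array with m
 * columns from src to dst, one memcpy per contiguous run, and return the
 * number of separate copies that were needed. */
static size_t copy_section(float *dst, const float *src, size_t m,
                           size_t r0, size_t r1, size_t c0, size_t c1)
{
    assert(r0 <= r1 && c0 <= c1 && c1 < m);
    size_t width = c1 - c0 + 1;
    if (width == m) {
        /* Full-width rows are adjacent in memory: one contiguous block. */
        memcpy(dst + r0 * m, src + r0 * m,
               (r1 - r0 + 1) * m * sizeof(float));
        return 1;
    }
    size_t runs = 0;
    for (size_t i = r0; i <= r1; ++i) {
        /* The selected columns of each row form a separate run. */
        memcpy(dst + i * m + c0, src + i * m + c0, width * sizeof(float));
        ++runs;
    }
    return runs;
}
```

For n = 6 and m = 5, the interior section [1:4][1:3] needs 4 separate copies of 3 elements each, while the whole array needs one copy of 30 elements. Each separate transfer carries fixed overhead, so fewer, larger moves can win even though they move more data.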