In my computations I use a linearized 3D array. All the number crunching is done entirely on the GPU, but I need to copy subdomains (e.g. 2D slices or 3D sub-volumes) of the field arrays to the host for intermediate plots and analysis.
As far as I understand, “acc update host()” only lets me specify a contiguous sub-range of the array present on the device, e.g. a[4:10] but not [4:10,101:233, …]. Furthermore, the slices or subdomains are typically not contiguous in memory because of the linearization of the 3D array.
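To illustrate the contiguity issue, here is a minimal sketch in C. It assumes a row-major linearization with i as the slowest and k as the fastest index; the names idx3 and update_plane_i are hypothetical, and the array is assumed to be present on the device already. With this layout, a plane at fixed i is the one kind of slice that is contiguous, so a single “acc update host()” with a sub-range would cover it:

```c
#include <stddef.h>

/* Assumed layout (for illustration only): row-major, i slowest, k fastest. */
static inline size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* Copy the plane i = i0 from the device to the host.
   Assumes a[0:nx*ny*nz] is already present on the device.
   The plane occupies ny*nz consecutive elements, so one contiguous
   sub-range in the update directive is enough. */
void update_plane_i(double *a, size_t nx, size_t ny, size_t nz, size_t i0)
{
    size_t start = idx3(i0, 0, 0, ny, nz);  /* first element of the plane */
    size_t count = ny * nz;                 /* number of elements in it   */
    (void)nx;                               /* only needed for the full extent */
    #pragma acc update host(a[start:count])
}
```

Planes at fixed j or fixed k are strided in memory under this layout, so the same trick does not apply to them.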
What’s the most efficient way of copying such selected data to the host? Do I create some sort of buffer array on the device, copy the needed field values from the device array into it, transfer the buffer to the host, and then copy its contents into the host version of the 3D array? This is probably more efficient than doing “acc update host()” for every single array entry of interest. Or is there any other way of handling such a scenario?
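The buffer idea could look roughly like the following sketch in C. It is only an illustration under assumptions: the layout is row-major with k as the fastest index, both the field array and the buffer are already present on the device, and the function name fetch_plane_k is hypothetical. A device loop gathers the strided plane k = k0 into a contiguous buffer, a single update transfers the buffer, and a host loop scatters it into the host copy of the array:

```c
#include <assert.h>
#include <stddef.h>

/* Gather the strided plane k = k0 into a contiguous buffer on the device,
   transfer only that buffer, then scatter it into the host array.
   Assumptions: row-major layout (i slowest, k fastest); a[0:nx*ny*nz] and
   buf[0:nx*ny] are already present on the device. */
void fetch_plane_k(double *a, double *buf,
                   size_t nx, size_t ny, size_t nz, size_t k0)
{
    assert(k0 < nz);

    /* Pack: runs on the device, reads strided, writes contiguous. */
    #pragma acc parallel loop collapse(2) present(a[0:nx*ny*nz], buf[0:nx*ny])
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            buf[i * ny + j] = a[(i * ny + j) * nz + k0];

    /* One contiguous transfer of nx*ny elements instead of the whole array. */
    #pragma acc update host(buf[0:nx*ny])

    /* Unpack: runs on the host, fills the host copy of the plane. */
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            a[(i * ny + j) * nz + k0] = buf[i * ny + j];
}
```

If the plots only need the slice itself, the unpack loop could be skipped and the buffer used directly on the host.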
If the subdomains and slices cover a large enough fraction of the array, it is probably most efficient to copy over the whole array, but in my typical use case I am looking at just a number of 2D slices of a very large 3D grid.