How to access sections of an array?

I have to extract sections of a large array to transfer data to different GPUs. At present I would probably use something like the kernel below, which copies out a portion of the data given the starting row and column offsets. A similar kernel could also be used for setting chunks.

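```cuda
// Copy a chunk of `rows` x `subset` doubles, starting at row `xstart` and
// column `ystart` of the row-major matrix `data` (row length `cols`), into
// the contiguous buffer `sub`. One thread handles one chunk row.
__global__ void get_chunk(double *data, double *sub, int xstart, int ystart, int rows, int cols, int subset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Skip threads past the last chunk row (assumes `rows` is the chunk height)
    if (i >= rows)
        return;

    for (int j = 0; j < subset; j++)
        sub[i * subset + j] = data[(xstart + i) * cols + ystart + j];
}
```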
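For the "setting chunks" direction, a minimal sketch is simply the same loop with source and destination swapped. This assumes the layout used by get_chunk above (row-major `data` with `cols` columns, a chunk of `rows` x `subset` elements); `set_chunk` and `chunk_roundtrip_ok` are hypothetical names, and the 8 x 10 matrix and offsets are arbitrary test values.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same indexing as get_chunk, with a guard on the chunk row index.
__global__ void get_chunk(const double *data, double *sub, int xstart, int ystart,
                          int rows, int cols, int subset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows) return;
    for (int j = 0; j < subset; j++)
        sub[i * subset + j] = data[(xstart + i) * cols + ystart + j];
}

// Hypothetical mirror kernel: writes the contiguous buffer `sub` back into
// the same rows x subset region of `data`.
__global__ void set_chunk(double *data, const double *sub, int xstart, int ystart,
                          int rows, int cols, int subset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows) return;
    for (int j = 0; j < subset; j++)
        data[(xstart + i) * cols + ystart + j] = sub[i * subset + j];
}

// Round trip on an 8 x 10 matrix: extract a 3 x 4 chunk at (2, 3), check it,
// overwrite it with -1.0 through set_chunk, then check the whole matrix.
bool chunk_roundtrip_ok()
{
    const int N_ROWS = 8, N_COLS = 10;
    const int xstart = 2, ystart = 3, rows = 3, subset = 4;

    std::vector<double> h_data(N_ROWS * N_COLS), h_sub(rows * subset);
    for (int k = 0; k < N_ROWS * N_COLS; k++) h_data[k] = k;

    double *d_data = nullptr, *d_sub = nullptr;
    if (cudaMalloc(&d_data, h_data.size() * sizeof(double)) != cudaSuccess) return false;
    if (cudaMalloc(&d_sub, h_sub.size() * sizeof(double)) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data.data(), h_data.size() * sizeof(double), cudaMemcpyHostToDevice);

    const int threads = 32, blocks = (rows + threads - 1) / threads;
    get_chunk<<<blocks, threads>>>(d_data, d_sub, xstart, ystart, rows, N_COLS, subset);
    cudaMemcpy(h_sub.data(), d_sub, h_sub.size() * sizeof(double), cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < subset; j++)
            ok &= h_sub[i * subset + j] == (xstart + i) * N_COLS + ystart + j;

    // Fill the chunk buffer with -1.0 and write it back into the matrix
    std::vector<double> neg(rows * subset, -1.0);
    cudaMemcpy(d_sub, neg.data(), neg.size() * sizeof(double), cudaMemcpyHostToDevice);
    set_chunk<<<blocks, threads>>>(d_data, d_sub, xstart, ystart, rows, N_COLS, subset);
    cudaMemcpy(h_data.data(), d_data, h_data.size() * sizeof(double), cudaMemcpyDeviceToHost);

    for (int r = 0; r < N_ROWS; r++)
        for (int c = 0; c < N_COLS; c++) {
            bool inside = r >= xstart && r < xstart + rows && c >= ystart && c < ystart + subset;
            ok &= h_data[r * N_COLS + c] == (inside ? -1.0 : double(r * N_COLS + c));
        }

    ok &= cudaGetLastError() == cudaSuccess;
    cudaFree(d_data);
    cudaFree(d_sub);
    return ok;
}

int main()
{
    std::printf("%s\n", chunk_roundtrip_ok() ? "ok" : "mismatch");
    return 0;
}
```

The plain cudaMemcpy calls on the default stream wait for the preceding kernel, so no explicit synchronization is needed before reading results on the host.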

I think the same could be done with a variant of cudaMemcpy* (perhaps cudaMemcpyArray(…)), but I am not sure how. Code samples or some directions on how to do it would be appreciated.
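One cudaMemcpy* variant that fits this access pattern is cudaMemcpy2D, which copies `height` rows of `width` bytes between buffers whose rows are `spitch`/`dpitch` bytes apart. (The cudaMemcpy*Array functions instead target opaque cudaArray allocations, typically used for textures.) A sub-block of a row-major matrix is exactly such a strided region: source pitch `cols * sizeof(double)`, destination pitch equal to the chunk width. A minimal sketch, where `get_chunk_2d`, `set_chunk_2d` and `memcpy2d_chunk_ok` are hypothetical helper names and the sizes are arbitrary test values:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copy an nrows x ncols block starting at (xstart, ystart) of a row-major
// matrix `data` with `cols` columns into the contiguous buffer `sub`.
// `kind` selects the direction, e.g. cudaMemcpyDeviceToHost.
cudaError_t get_chunk_2d(double *sub, const double *data, int xstart, int ystart,
                         int nrows, int ncols, int cols, cudaMemcpyKind kind)
{
    return cudaMemcpy2D(sub, ncols * sizeof(double),            // dst, dst pitch (bytes)
                        data + (size_t)xstart * cols + ystart,  // src: first element of block
                        cols * sizeof(double),                   // src pitch (bytes)
                        ncols * sizeof(double),                  // bytes per copied row
                        nrows,                                   // number of rows
                        kind);
}

// Reverse direction: write the contiguous buffer `sub` into the block of `data`.
cudaError_t set_chunk_2d(double *data, const double *sub, int xstart, int ystart,
                         int nrows, int ncols, int cols, cudaMemcpyKind kind)
{
    return cudaMemcpy2D(data + (size_t)xstart * cols + ystart, cols * sizeof(double),
                        sub, ncols * sizeof(double),
                        ncols * sizeof(double), nrows, kind);
}

bool memcpy2d_chunk_ok()
{
    const int N_ROWS = 8, N_COLS = 10;
    const int xstart = 2, ystart = 3, nrows = 3, ncols = 4;

    std::vector<double> h_data(N_ROWS * N_COLS), h_sub(nrows * ncols);
    for (int k = 0; k < N_ROWS * N_COLS; k++) h_data[k] = k;

    double *d_data = nullptr;
    if (cudaMalloc(&d_data, h_data.size() * sizeof(double)) != cudaSuccess) return false;
    cudaMemcpy(d_data, h_data.data(), h_data.size() * sizeof(double), cudaMemcpyHostToDevice);

    // Device block -> contiguous host buffer in a single call, no kernel needed
    bool ok = get_chunk_2d(h_sub.data(), d_data, xstart, ystart, nrows, ncols, N_COLS,
                           cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; i < nrows; i++)
        for (int j = 0; j < ncols; j++)
            ok &= h_sub[i * ncols + j] == (xstart + i) * N_COLS + ystart + j;

    // Contiguous host buffer of -1.0 -> device block, then read the matrix back
    std::vector<double> neg(nrows * ncols, -1.0);
    ok &= set_chunk_2d(d_data, neg.data(), xstart, ystart, nrows, ncols, N_COLS,
                       cudaMemcpyHostToDevice) == cudaSuccess;
    cudaMemcpy(h_data.data(), d_data, h_data.size() * sizeof(double), cudaMemcpyDeviceToHost);
    for (int r = 0; r < N_ROWS; r++)
        for (int c = 0; c < N_COLS; c++) {
            bool inside = r >= xstart && r < xstart + nrows && c >= ystart && c < ystart + ncols;
            ok &= h_data[r * N_COLS + c] == (inside ? -1.0 : double(r * N_COLS + c));
        }

    cudaFree(d_data);
    return ok;
}

int main()
{
    std::printf("%s\n", memcpy2d_chunk_ok() ? "ok" : "mismatch");
    return 0;
}
```

Compared with the kernel, this avoids a staging buffer when the chunk is headed to the host. For GPU-to-GPU transfers, passing cudaMemcpyDefault as `kind` under unified virtual addressing lets the runtime infer the direction from the pointers, and cudaMemcpy3DPeer is the explicit peer-copy variant; check the CUDA Runtime API documentation for the peer-access setup your system needs.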

Answered on Stack Overflow: "memcpy - Extract and Set portions of an array in CUDA".

Thanks.