I have to extract sections of a large array in order to transfer the data to different GPUs. At present I would probably use something like the kernel below to get a portion of the data when offsets are passed. A similar kernel could also be used to set chunks.
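__global__ void get_chunk(double *data, double *sub, int xstart, int ystart, int rows, int cols, int subset)
{
    // One thread per chunk row; each thread copies `subset` consecutive elements.
    // As indexed here, xstart is a row offset and ystart a column offset into data.
    // `rows` is currently unused, and there is no bounds check on i.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    for (int j = 0; j < subset; j++)
        sub[i * subset + j] = data[i * cols + (xstart * cols + ystart) + j];
}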
I think the same could be done using a variant of cudaMemcpy* (perhaps cudaMemcpyArray(…)), but I am not sure how to do it. I would appreciate some code samples or some pointers on how it could be done.
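One possible direction, sketched here under assumptions rather than taken from the linked answer: cudaMemcpy2D copies a rectangle out of a pitched (strided) buffer, which matches the "rows of a sub-block inside a wider row-major array" pattern of the kernel above. The sketch assumes the large array lives in host memory in row-major order and that the chunk should land tightly packed in device memory. The helper names get_chunk_to_device and set_chunk_from_device, and the sizes in main, are made up for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copy a chunk_rows x chunk_cols block starting at
// (row0, col0) of a row-major host array with `cols` columns into a
// tightly packed device buffer. Pitches and width are given in bytes.
cudaError_t get_chunk_to_device(const double *h_data, double *d_sub, int cols,
                                int row0, int col0, int chunk_rows, int chunk_cols)
{
    const double *src = h_data + (size_t)row0 * cols + col0;
    return cudaMemcpy2D(d_sub, chunk_cols * sizeof(double),   // dst, dst pitch
                        src, cols * sizeof(double),           // src, src pitch
                        chunk_cols * sizeof(double),          // width in bytes
                        chunk_rows,                           // height in rows
                        cudaMemcpyHostToDevice);
}

// Hypothetical helper: the reverse direction, writing a packed device chunk
// back into the same position of a row-major host array.
cudaError_t set_chunk_from_device(double *h_data, const double *d_sub, int cols,
                                  int row0, int col0, int chunk_rows, int chunk_cols)
{
    double *dst = h_data + (size_t)row0 * cols + col0;
    return cudaMemcpy2D(dst, cols * sizeof(double),
                        d_sub, chunk_cols * sizeof(double),
                        chunk_cols * sizeof(double), chunk_rows,
                        cudaMemcpyDeviceToHost);
}

int main()
{
    // Illustrative sizes only.
    const int rows = 8, cols = 10;
    const int row0 = 2, col0 = 3, chunk_rows = 4, chunk_cols = 5;

    std::vector<double> h_data(rows * cols);
    for (int k = 0; k < rows * cols; ++k) h_data[k] = k;

    const size_t sub_bytes = (size_t)chunk_rows * chunk_cols * sizeof(double);
    double *d_sub = nullptr;
    if (cudaMalloc((void **)&d_sub, sub_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Extract the chunk onto the device.
    cudaError_t err = get_chunk_to_device(h_data.data(), d_sub, cols,
                                          row0, col0, chunk_rows, chunk_cols);
    assert(err == cudaSuccess);

    // Write the chunk back into a zeroed host array and compare that region.
    std::vector<double> h_out(rows * cols, 0.0);
    err = set_chunk_from_device(h_out.data(), d_sub, cols,
                                row0, col0, chunk_rows, chunk_cols);
    assert(err == cudaSuccess);

    int mismatches = 0;
    for (int r = 0; r < chunk_rows; ++r)
        for (int c = 0; c < chunk_cols; ++c) {
            const size_t idx = (size_t)(row0 + r) * cols + col0 + c;
            if (h_out[idx] != h_data[idx]) ++mismatches;
        }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_sub);
    return mismatches == 0 ? 0 : 1;
}
```

For several GPUs, calling cudaSetDevice before each cudaMalloc and copy would select the target device. If the source array is already on a device, the same call with cudaMemcpyDeviceToDevice should cover that case as well, though I have not measured how it compares with the kernel.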
Answered on Stack Overflow: "memcpy - Extract and Set portions of an array in CUDA".
Thanks.