I just started learning CUDA for my calculations, so this might be a trivial question, but I would appreciate help with my problem.

I’m trying to copy a packed 3D array A with dimensions (xdim1, ydim1, zdim1) into a zero-padded 3D array B with dimensions (xdim2, ydim2, zdim2), and also to do the reverse copy from B back to A.

The CPU code looks like this:

```c
int index1, index2;

// Strides of the packed array A (row-major, k varies fastest)
int stridey1 = zdim1;
int stridex1 = stridey1 * ydim1;

// Strides of the padded array B
int stridey2 = zdim2;
int stridex2 = stridey2 * ydim2;

// Forward: zero all of B, then copy A into its leading corner
for (int i = 0; i < xdim2 * ydim2 * zdim2; i++) B[i] = 0.0f;

for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            B[index2] = A[index1];
        }
    }
}

// Backward: copy the unpadded region of B back into A
for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            A[index1] = B[index2];
        }
    }
}
```

How could I implement this in GPU code? It seems straightforward, but I’m having difficulty.
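For reference, here is a rough sketch of one way the loops above might map onto a kernel: one thread per element of A, with the thread x-index assigned to k (the fastest-varying index) so that neighbouring threads touch neighbouring memory. The names `copyPackedPadded` and `launchCopyPackedPadded`, the `forward` flag, and the block shape are my own assumptions, not something from a library, and the sketch assumes A and B are already device pointers and that every dimension of A is positive and no larger than the matching dimension of B.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one (i, j, k) element of A.
// forward == true  copies A -> B; forward == false copies B -> A.
__global__ void copyPackedPadded(float *A, float *B,
                                 int xdim1, int ydim1, int zdim1,
                                 int ydim2, int zdim2, bool forward)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;  // fastest index
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.z * blockDim.z + threadIdx.z;

    // The grid is rounded up, so threads outside A must do nothing.
    if (i >= xdim1 || j >= ydim1 || k >= zdim1) return;

    // Same index arithmetic as the CPU loops, written without stride variables.
    size_t index1 = ((size_t)i * ydim1 + j) * zdim1 + k;
    size_t index2 = ((size_t)i * ydim2 + j) * zdim2 + k;

    if (forward) B[index2] = A[index1];
    else         A[index1] = B[index2];
}

// Hypothetical host wrapper; d_A and d_B must be device pointers.
cudaError_t launchCopyPackedPadded(float *d_A, float *d_B,
                                   int xdim1, int ydim1, int zdim1,
                                   int xdim2, int ydim2, int zdim2,
                                   bool forward)
{
    assert(xdim1 > 0 && ydim1 > 0 && zdim1 > 0);
    assert(xdim1 <= xdim2 && ydim1 <= ydim2 && zdim1 <= zdim2);

    if (forward) {
        // All-zero bytes are 0.0f, so this replaces the CPU zeroing loop.
        size_t bytes = sizeof(float) * (size_t)xdim2 * ydim2 * zdim2;
        cudaError_t err = cudaMemset(d_B, 0, bytes);
        if (err != cudaSuccess) return err;
    }

    // Assumed block shape (256 threads); tune for the actual sizes.
    dim3 block(32, 4, 2);
    dim3 grid((zdim1 + block.x - 1) / block.x,
              (ydim1 + block.y - 1) / block.y,
              (xdim1 + block.z - 1) / block.z);

    copyPackedPadded<<<grid, block>>>(d_A, d_B, xdim1, ydim1, zdim1,
                                      ydim2, zdim2, forward);
    return cudaGetLastError();
}
```

With this sketch, the forward pass would be `launchCopyPackedPadded(d_A, d_B, xdim1, ydim1, zdim1, xdim2, ydim2, zdim2, true)` and the backward pass the same call with `false`. The y and z grid dimensions are limited to 65535 blocks each, so very large ydim1 or xdim1 would need a different mapping, for example a flattened 1D index.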