I just started learning CUDA for my calculations, so this might be a trivial question, but I would appreciate help with my problem.

I’m trying to copy a packed 3D array A with dimensions (xdim1, ydim1, zdim1) into a zero-padded 3D array B with dimensions (xdim2, ydim2, zdim2), and also to copy it back in the reverse direction.

The CPU code looks like the following.

```
// Strides for the packed array A (z varies fastest)
int index1, index2;
int stridey1 = zdim1;
int stridex1 = stridey1 * ydim1;
// Strides for the padded array B
int stridey2 = zdim2;
int stridex2 = stridey2 * ydim2;

// Forward: zero B, then copy A into its leading corner
for (int i = 0; i < xdim2 * ydim2 * zdim2; i++) B[i] = 0.0f;
for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            B[index2] = A[index1];
        }
    }
}

// Backward: copy the unpadded region of B back into A
for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            A[index1] = B[index2];
        }
    }
}
```

How could I implement this in GPU code? It seems straightforward, but I am having difficulties…
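
For reference, here is a minimal sketch of how the loops above might map onto CUDA, with one thread per element of A. It assumes A and B are `float` arrays already allocated on the device and that each padded dimension is at least as large as the packed one. The names `padKernel`, `unpadKernel`, `gridFor`, `padForward`, and `unpadBackward`, as well as the 32x4x2 block shape, are illustrative choices only, and error checking is left out.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Forward: each thread copies one element of packed A into padded B.
__global__ void padKernel(const float *A, float *B,
                          int xdim1, int ydim1, int zdim1,
                          int ydim2, int zdim2)
{
    // k (the z index) varies fastest in memory, so it is mapped to threadIdx.x
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= xdim1 || j >= ydim1 || k >= zdim1) return;

    size_t index1 = ((size_t)i * ydim1 + j) * zdim1 + k;  // packed index
    size_t index2 = ((size_t)i * ydim2 + j) * zdim2 + k;  // padded index
    B[index2] = A[index1];
}

// Backward: each thread copies one element of padded B back into packed A.
__global__ void unpadKernel(float *A, const float *B,
                            int xdim1, int ydim1, int zdim1,
                            int ydim2, int zdim2)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= xdim1 || j >= ydim1 || k >= zdim1) return;

    size_t index1 = ((size_t)i * ydim1 + j) * zdim1 + k;
    size_t index2 = ((size_t)i * ydim2 + j) * zdim2 + k;
    A[index1] = B[index2];
}

// Enough blocks to cover the packed array, rounding up in each dimension.
static dim3 gridFor(dim3 block, int xdim1, int ydim1, int zdim1)
{
    return dim3((zdim1 + block.x - 1) / block.x,
                (ydim1 + block.y - 1) / block.y,
                (xdim1 + block.z - 1) / block.z);
}

void padForward(const float *dA, float *dB,
                int xdim1, int ydim1, int zdim1,
                int xdim2, int ydim2, int zdim2)
{
    assert(xdim2 >= xdim1 && ydim2 >= ydim1 && zdim2 >= zdim1);
    // All-zero bytes represent 0.0f, so cudaMemset can do the zero fill
    cudaMemset(dB, 0, sizeof(float) * (size_t)xdim2 * ydim2 * zdim2);
    dim3 block(32, 4, 2);
    padKernel<<<gridFor(block, xdim1, ydim1, zdim1), block>>>(
        dA, dB, xdim1, ydim1, zdim1, ydim2, zdim2);
}

void unpadBackward(float *dA, const float *dB,
                   int xdim1, int ydim1, int zdim1,
                   int ydim2, int zdim2)
{
    dim3 block(32, 4, 2);
    unpadKernel<<<gridFor(block, xdim1, ydim1, zdim1), block>>>(
        dA, dB, xdim1, ydim1, zdim1, ydim2, zdim2);
}
```

The grid covers only the packed extents, so the padding in B is never touched by the kernels and stays zero after the `cudaMemset`. Mapping k to `threadIdx.x` is meant to keep neighbouring threads on neighbouring addresses in both arrays.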