Copying an array with padding?

I just started learning CUDA for my calculations, so this may be a trivial question, but I would appreciate help solving my problem.

I’m trying to copy a packed 3D array A with dimensions (xdim1, ydim1, zdim1) into a zero-padded 3D array B with dimensions (xdim2, ydim2, zdim2), and also to do the reverse copy from B back to A.

The CPU code looks like the following.

int index1, index2;

// Strides of the packed array A (z is the fastest-varying index)
int stridey1 = zdim1;
int stridex1 = stridey1 * ydim1;

// Strides of the padded array B
int stridey2 = zdim2;
int stridex2 = stridey2 * ydim2;

// Forward: zero B, then copy A into its corner
for (int i = 0; i < xdim2 * ydim2 * zdim2; i++) B[i] = 0.0f;

for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            B[index2] = A[index1];
        }
    }
}

// Backward: copy the unpadded region of B back into A
for (int i = 0; i < xdim1; i++) {
    for (int j = 0; j < ydim1; j++) {
        for (int k = 0; k < zdim1; k++) {
            index1 = stridex1 * i + stridey1 * j + k;
            index2 = stridex2 * i + stridey2 * j + k;
            A[index1] = B[index2];
        }
    }
}
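
To make the intended behaviour concrete, here is a self-contained sketch of the same two copies wrapped in functions. The names pad_forward and pad_backward are hypothetical, chosen only for illustration, and the sketch assumes each padded dimension is at least as large as the packed one and that z is the fastest-varying index, as in the loops above. It is plain C on the host, not GPU code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy packed A (xdim1 x ydim1 x zdim1) into the
 * corner of B (xdim2 x ydim2 x zdim2) and zero the rest of B. */
void pad_forward(const float *A, float *B,
                 int xdim1, int ydim1, int zdim1,
                 int xdim2, int ydim2, int zdim2)
{
    /* Assumption: B is at least as large as A in every dimension. */
    assert(xdim1 <= xdim2 && ydim1 <= ydim2 && zdim1 <= zdim2);

    size_t stridey1 = (size_t)zdim1, stridex1 = stridey1 * ydim1;
    size_t stridey2 = (size_t)zdim2, stridex2 = stridey2 * ydim2;

    /* All-zero bytes represent 0.0f for IEEE 754 floats. */
    memset(B, 0, sizeof(float) * xdim2 * stridex2);

    for (int i = 0; i < xdim1; i++)
        for (int j = 0; j < ydim1; j++)
            for (int k = 0; k < zdim1; k++)
                B[stridex2 * i + stridey2 * j + k] =
                    A[stridex1 * i + stridey1 * j + k];
}

/* Hypothetical helper: copy the unpadded corner of B back into packed A. */
void pad_backward(float *A, const float *B,
                  int xdim1, int ydim1, int zdim1,
                  int xdim2, int ydim2, int zdim2)
{
    assert(xdim1 <= xdim2 && ydim1 <= ydim2 && zdim1 <= zdim2);

    size_t stridey1 = (size_t)zdim1, stridex1 = stridey1 * ydim1;
    size_t stridey2 = (size_t)zdim2, stridex2 = stridey2 * ydim2;

    for (int i = 0; i < xdim1; i++)
        for (int j = 0; j < ydim1; j++)
            for (int k = 0; k < zdim1; k++)
                A[stridex1 * i + stridey1 * j + k] =
                    B[stridex2 * i + stridey2 * j + k];
}
```

Each (i, j, k) element is handled independently of the others, and calling pad_forward followed by pad_backward with the same dimensions should give back the original contents of A.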

How can I implement this in GPU code? It seems straightforward, but I'm having difficulties…