cudaMemcpyArrayToArray question

I want to make a cudaArray that is a copy of an existing one, except that it has a 1-cell-thick border of zeros.

My attempt to achieve this:

...
	cudaMallocArray(&edge, &channelDesc, M, N);
	cudaMallocArray(&old, &channelDesc, (M+2), (N+2));
...
	cudaMallocPitch((void **) &zero, &zerop, sizeof(float)*(M+2), (N+2));
	cudaMemset2D(zero, zerop, 0.f, sizeof(float)*(M+2), (N+2));
	cudaMemcpy2DToArray(old, 0, 0, zero, zerop, sizeof(float)*(M+2), (N+2), cudaMemcpyDeviceToDevice);
	cudaFree(zero);
	cudaMemcpyArrayToArray(old, 1, 1, edge, 0, 0, sizeof(float)*N*M, cudaMemcpyDeviceToDevice);
...

(‘edge’ is an N x M cudaArray and ‘old’ is an (N+2) x (M+2) cudaArray)

What I hoped this would do is create a 2D array called ‘zero’, which I fill with zeros, and then copy to my cudaArray ‘old’, setting all elements in ‘old’ to zero.

I then copy my existing cudaArray (‘edge’) to ‘old’, starting at the [1][1] cell of ‘old’.

I have reason to believe, however, that this is not doing what I intended. Do you think this code snippet should do what I want?

Thanks

Well, I have written a short program to test whether this copy really is the part of my program that is failing. It appears I was right, as this program doesn’t behave as I hoped:

#include <stdio.h>
#include <cuda_runtime.h>

#define N 5
#define M 5

int main(){
	float buf_h[N][M], buf2[N+2][M+2], *zero;
	cudaArray *old, *edge;
	cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
	size_t zerop;
	int i, j;

	cudaMallocArray(&edge, &channelDesc, M, N);
	cudaMallocArray(&old, &channelDesc, (M+2), (N+2));

	/* fill the host buffer with ones and upload it to 'edge' */
	for(i=0;i<N;i++){
		for(j=0;j<M;j++){
			buf_h[i][j]=1.0;
		}
	}
	cudaMemcpyToArray(edge, 0, 0, buf_h, sizeof(float)*N*M, cudaMemcpyHostToDevice);

	/* zero 'old' by copying a zeroed pitched buffer into it */
	cudaMallocPitch((void **) &zero, &zerop, sizeof(float)*(M+2), (N+2));
	cudaMemset2D(zero, zerop, 0, sizeof(float)*(M+2), (N+2));
	cudaMemcpy2DToArray(old, 0, 0, zero, zerop, sizeof(float)*(M+2), (N+2), cudaMemcpyDeviceToDevice);
	cudaFree(zero);

	/* copy 'edge' into 'old', requesting offset (1, 1) */
	cudaMemcpy2DArrayToArray(old, 1, 1, edge, 0, 0, sizeof(float)*N, M, cudaMemcpyDeviceToDevice);

	/* read 'old' back to the host and print it */
	cudaMemcpyFromArray(buf2, old, 0, 0, sizeof(float)*(N+2)*(M+2), cudaMemcpyDeviceToHost);
	for(i=0;i<N+2;i++){
		for(j=0;j<M+2;j++){
			printf("%f ", buf2[i][j]);
		}
		printf("\n");
	}

	cudaFreeArray(old);
	cudaFreeArray(edge);
	return 0;
}

(Note that I changed from cudaMemcpyArrayToArray to cudaMemcpy2DArrayToArray, which seems to give results closer to what I want, although I don’t know why)

I expected:

0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 1.0 1.0 1.0 1.0 0.0
0.0 1.0 1.0 1.0 1.0 1.0 0.0
0.0 1.0 1.0 1.0 1.0 1.0 0.0
0.0 1.0 1.0 1.0 1.0 1.0 0.0
0.0 1.0 1.0 1.0 1.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0

but I got:

0.0 0.0 0.0 0.0 0.0 0.0 0.0

-0.0 -0.0 -0.0 -0.0 -0.0 0.0 0.0

-0.0 -0.0 -0.0 -0.0 -0.0 0.0 0.0

-0.0 -0.0 -0.0 -0.0 -0.0 0.0 0.0

-0.0 -0.0 -0.0 -0.0 -0.0 0.0 0.0

-0.0 -0.0 -0.0 -0.0 -0.0 0.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 0.0

(Trailing zeros after the decimal point removed for clarity.)

There are two things wrong with this. The first is that the results appear to have been copied incorrectly - instead of 1.0, it seems to be copying -0.0. The second problem is that the starting point of the ‘old’ array to which it appears to be copying is not [1][1] as I requested, but [1][0].

Ideas would be appreciated.

It turns out that this works perfectly if, instead of asking for the ArrayToArray copy to start at old[1][1], I ask for old[4][1]: the data then lands where I want it. Likewise, old[8][1] puts it where I would have expected old[2][1] to go, and so on.

This seems to be either a bug or something that is just not explained very well in the Reference Manual.

Ah, actually it is not working perfectly. My idea was to Memset ‘zero’ to all zeros and then copy it to ‘old’, as a way of Memsetting ‘old’ (there doesn’t seem to be a Memset function for Arrays). That does seem to set ‘old’ to all zeros. But if I change the value I fill ‘zero’ with (to another integer, as cudaMemset2D only seems to take integer values), the output doesn’t change: ‘old’ still has a border of zeros. Perhaps the problem is that I am trying to Memset ‘zero’, which is supposed to contain floats, with integers, for the reason mentioned above.

No, the srcX/dstX parameters are in bytes. The docs fail to mention this, but it can’t be much else, since CUDA does not know the size of your type.

Ah, I see. These things definitely need to be made clearer in the Manual.

There isn’t any problem setting a 0 float with memset. All bits set to 0 does indeed represent the floating point number positive 0.

So it should work to Memset it to 0, but I wouldn’t be able to Memset to 5 instead, right?