What's wrong with this code?

I apologise for asking, but I have really spent quite a long time looking at this code and I can’t see what is wrong with it:

#include <stdio.h>

#define M 5

#define N 5

texture<int, 2> oldtex;

texture<int, 2> edgetex;

__global__ void computeker(int *buf3){

	*(buf3 + blockIdx.y*(M+2) + blockIdx.x)=tex2D(oldtex, blockIdx.y, blockIdx.x);

}

main(){

	int i, j;

	int buf_h[N][M], buf2[N+2][M+2], *zero, *buf3;

	cudaArray *old, *edge;

	dim3 block(1, 1);

	dim3 grid(M+2, N+2);

	cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<int>();

	size_t zerop;

	cudaMallocArray(&edge, &channelDesc, M, N);

	cudaMallocArray(&old, &channelDesc, (M+2), (N+2));

	for(i=0;i<N;i++){

		for(j=0;j<M;j++){

			buf_h[i][j]=123;

		}

	}

	cudaMemcpyToArray(edge, 0, 0, buf_h, sizeof(int)*N*M, cudaMemcpyHostToDevice);

	cudaMallocPitch((void **) &zero, &zerop, sizeof(int)*(M+2), (N+2));

	cudaMemset2D(zero, zerop, 5, sizeof(int)*(M+2), (N+2));

	cudaMemcpy2DToArray(old, 0, 0, zero, zerop, sizeof(int)*(M+2), (N+2), cudaMemcpyDeviceToDevice);

	cudaFree(zero);

	cudaMemcpyArrayToArray(old, sizeof(int), 1, edge, 0, 0, sizeof(int)*N*M, cudaMemcpyDeviceToDevice);

	

	cudaMalloc((void **) &buf3, sizeof(int)*(M+2)*(N+2));

	cudaBindTextureToArray(oldtex, old);

	cudaBindTextureToArray(edgetex, edge);

	computeker<<<grid, block>>>(buf3);

	cudaMemcpy(buf2, buf3, sizeof(int)*(M+2)*(N+2), cudaMemcpyDeviceToHost);

	for(i=0;i<N+2;i++){

		for(j=0;j<M+2;j++){

			printf("%d ", buf2[i][j]);

		}

		printf("\n");

	}

	

	cudaFreeArray(old);

	cudaFreeArray(edge);

	cudaFree(buf3);

}

It should output:

5 5 5 5 5 5 5

5 123 123 123 123 123 5

5 123 123 123 123 123 5

5 123 123 123 123 123 5

5 123 123 123 123 123 5

5 123 123 123 123 123 5

5 5 5 5 5 5 5

But instead I get:

84215045 84215045 123 123 123 84215045 84215045

84215045 123 123 123 123 84215045 84215045

84215045 123 123 123 123 84215045 84215045

84215045 123 123 123 123 84215045 84215045

84215045 123 123 123 123 84215045 84215045

84215045 123 123 123 84215045 84215045 84215045

84215045 123 123 123 84215045 84215045 84215045

Once again I apologise.

Thank you very much for any ideas

hi,

No need to apologi"z"e :) I’m not familiar with cudaArray, but I’d suggest a few debugging habits for code like this:

1, add CUT_CHECK_ERROR() after each kernel call to capture any cudaError info;

2, try small, aligned data first (say, a 2x2 texture rather than 5x5);

3, use printf as much as possible. I’d like to share my printf macro:

template<class T>

inline void Watch(T* d_A, int nA)

{

#if _DEBUG

	if(d_A)

	{

		T* h_A = (T*)malloc( sizeof(T) * nA);

		CUDA_SAFE_CALL(cudaMemcpy(h_A, d_A, sizeof(T) * nA, cudaMemcpyDeviceToHost));

		SAFE_FREE(h_A);

	}

#endif	

}

Set a breakpoint before SAFE_FREE, then you can watch every element of your device variable through h_A. By the way,

#define SAFE_FREE(p) {if(p) {free(p); (p)=NULL;} };

#define SAFE_CUDA_FREE(p) {if(p) {CUDA_SAFE_CALL(cudaFree(p)); (p)=NULL;} };
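Suggestion 1 can also be followed when the cutil helpers are not available. The following is only a minimal sketch: CHECK_CUDA, writeAnswer and runChecked are hypothetical names invented for this illustration, and only cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are actual runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CHECK_CUDA is a hypothetical helper, not part of CUDA or cutil:
// it prints the failing call and aborts on any runtime API error.
#define CHECK_CUDA(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s -> %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Trivial kernel so there is something to launch and check.
__global__ void writeAnswer(int *out) { *out = 42; }

// Launches the kernel with error checks and returns the value it wrote.
int runChecked() {
    int *d_out = NULL, h_out = 0;
    CHECK_CUDA(cudaMalloc((void **)&d_out, sizeof(int)));
    writeAnswer<<<1, 1>>>(d_out);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CHECK_CUDA(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d_out));
    return h_out;
}

int main() {
    printf("%d\n", runChecked());
    return 0;
}
```

Wrapping every runtime call this way means a bad argument is reported at the call that caused it, rather than showing up later as wrong output.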

Often mentioned: cudaMemsetXX works bytewise, so for elements wider than 8 bits you can’t set any value other than zero (or another value whose bytes are all identical).

After the memset you would expect:
0x00000005 0x00000005 0x00000005 0x00000005

what you get is:
0x05050505 0x05050505 0x05050505 0x05050505

which is in decimal
84215045 84215045 84215045 84215045

So you have to initialise your array manually, which for large arrays can even be faster than memset, whose performance is not the best.
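The difference can be reproduced in a few lines. This is a sketch under assumptions: fillInt, firstAfterMemset and firstAfterFill are hypothetical names made up for the illustration, a small fill kernel is just one possible way to initialise the array manually, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: writes `value` into each of the n ints.
__global__ void fillInt(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns the first int of a device buffer after cudaMemset(..., 5, ...).
int firstAfterMemset(int n) {
    int *d = NULL, h = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemset(d, 5, n * sizeof(int));  // every BYTE becomes 0x05
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

// Returns the first int of a device buffer after the fill kernel ran.
int firstAfterFill(int n) {
    int *d = NULL, h = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    fillInt<<<(n + 255) / 256, 256>>>(d, n, 5);  // every INT becomes 5
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return h;
}

int main() {
    printf("memset: %d\n", firstAfterMemset(16));  // 0x05050505 = 84215045
    printf("kernel: %d\n", firstAfterFill(16));    // 5
    return 0;
}
```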

Vrah

Thank you yk_cadcg and VrahoK.

VrahoK: Ah yes, I see you are right. I actually wanted to memset to 0 anyway, but I had just changed it to 5 to make sure it was actually doing something.

So that’s one problem solved; the other still remains… why it isn’t putting the 123s in the right places.

Hm, the source and destination array widths in cudaMemcpyArrayToArray do not match, and I doubt that function works for that case. Try e.g. cudaMemcpy2DArrayToArray instead. Unfortunately the documentation leaves most of these functions a complete mystery…
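For the arrays in the question, the 2D copy could look roughly like the sketch below. It assumes the same M, N and array names as the original code; buildBordered is a hypothetical wrapper added for the illustration, the arrays are filled from host buffers to sidestep the memset issue, and error checking is omitted. In these calls the x offset and width are given in bytes, while the y offset and height are given in rows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define M 5
#define N 5

// Builds the bordered array on the device and reads it back into out.
void buildBordered(int out[N + 2][M + 2]) {
    int border[N + 2][M + 2], inner[N][M];
    for (int i = 0; i < N + 2; i++)
        for (int j = 0; j < M + 2; j++) border[i][j] = 5;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) inner[i][j] = 123;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<int>();
    cudaArray *old, *edge;
    cudaMallocArray(&old, &desc, M + 2, N + 2);
    cudaMallocArray(&edge, &desc, M, N);

    // Upload both host buffers; pitch and width are in bytes.
    cudaMemcpy2DToArray(old, 0, 0, border, sizeof(int) * (M + 2),
                        sizeof(int) * (M + 2), N + 2, cudaMemcpyHostToDevice);
    cudaMemcpy2DToArray(edge, 0, 0, inner, sizeof(int) * M,
                        sizeof(int) * M, N, cudaMemcpyHostToDevice);

    // Copy the M x N block into old, shifted by one element and one row.
    cudaMemcpy2DArrayToArray(old, sizeof(int), 1, edge, 0, 0,
                             sizeof(int) * M, N, cudaMemcpyDeviceToDevice);

    // Read the whole bordered array back to the host.
    cudaMemcpy2DFromArray(out, sizeof(int) * (M + 2), old, 0, 0,
                          sizeof(int) * (M + 2), N + 2, cudaMemcpyDeviceToHost);
    cudaFreeArray(old);
    cudaFreeArray(edge);
}

int main() {
    int buf[N + 2][M + 2];
    buildBordered(buf);
    for (int i = 0; i < N + 2; i++) {
        for (int j = 0; j < M + 2; j++) printf("%d ", buf[i][j]);
        printf("\n");
    }
    return 0;
}
```

Because the copy is described as a rectangle (width in bytes times height in rows) instead of a single byte count, each source row should land at the same x offset in the wider destination array.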

Haha, you beat me to it… I had just remembered that in a previous post about doing something similar I had worked out that 2DArrayToArray does what I want.

Yeay! :-)

Thanks again to all