Static 2D array problem

Hi all,

Simply put, I am trying to do this:

  • declare a static 2D float array on host.

  • transfer it to the device

  • access the array on the device just as I would on the host (using [i][j] indexing)

  • process the array and return the array to host and print.

I obviously haven't been able to do it, so I am here. My question is whether this is possible at all with CUDA.

If so, what is the problem with my code below:

float st_temp[4][4];

	float **dev_temp;

	for(int i=0;i<4;i++)

		for( int j=0;j<4;j++)

			 st_temp[i][j] = (float)rand()/RAND_MAX;

	for(int i=0;i<4;i++)

		for(int j=0;j<4;j++)

			printf("\n THE STATIC ARRAY ELEMENT : %.1f ",st_temp[i][j]);

	

	size_t dev_pitch,st_pitch;

	cudaMallocPitch((void**) &dev_temp, &dev_pitch, 4*sizeof(float), 4);

 	

	cudaMemcpy2D((void *)dev_temp,dev_pitch,(const void *)st_temp,dev_pitch, 4*sizeof(float *),4, cudaMemcpyHostToDevice);

...........

// gpu kernel

__global__ void kernel(float** a, float** b, float **c)

{ 

	

	

	for(int i=0;i<4;i++)

		for(int j=0;j<4;j++)

			printf("\n THE STATIC ARRAY ELEMENT : %.1f ",c[i][j]);

}

Thanks in advance!!

The compiler uses a row-major mapping, so st_temp[i][j] is the element at offset i*4 + j from the start of st_temp. To compute that offset the compiler must know the dimensions of st_temp, which is why you declare

float st_temp[4][4];

However in your kernel,

[codebox]__global__ void kernel(float** a, float** b, float **c)

{

for(int i=0;i<4;i++)

    for(int j=0;j<4;j++)

        printf("\n THE STATIC ARRAY ELEMENT : %.1f ",c[i][j]);

}[/codebox]

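The compiler cannot apply that mapping to c[i][j], because with float** no row length of c is known at compile time. Instead it treats c[i] as a pointer to be loaded from memory, which is not what a contiguous buffer of floats contains.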

Question: why not use a 1-D array and do the index transformation manually?
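A minimal sketch of that 1-D approach, assuming a 4x4 matrix and a single thread block. The names scale_flat and scale_on_device are made up for illustration and are not from the original post. Because a static 2D array is contiguous on the host, one plain cudaMemcpy is enough to move it.

```cuda
#include <cuda_runtime.h>

#define ROWS 4
#define COLS 4

// Each thread doubles one element; the 2D position (row, col) is mapped
// to the flat offset row * cols + col by hand.
__global__ void scale_flat(float *data, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        data[row * cols + col] *= 2.0f;
}

// Hypothetical helper: copies a static 2D host array to the device,
// runs the kernel, and copies the result back in place.
cudaError_t scale_on_device(float host[ROWS][COLS])
{
    float *dev = nullptr;
    size_t bytes = ROWS * COLS * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    // The host array is contiguous, so a single linear copy works.
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(COLS, ROWS);
        scale_flat<<<1, block>>>(dev, ROWS, COLS);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}
```

Passing rows and cols as arguments keeps the kernel independent of the matrix size, at the cost of writing the index arithmetic yourself.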

Here is what I did to solve it, in 3 simple steps.

Declare the 2D arrays in main:

float a[N][N],c[N-2][N-2];

Pass array to kernel:

kernel<<<dimGrid,dimBlock>>>((float(*)[N])a,(float(*)[N-2])c);

Kernel Signature:

__global__ void kernel(float a[N][N], float b[N-2][N-2])

Access array in kernel as a regular 2D array syntax.
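The steps above could look roughly like the sketch below. It assumes N is a compile-time constant and that the data is first copied into device memory, since a kernel cannot normally dereference an ordinary host array. The names add_one and add_one_on_device are hypothetical.

```cuda
#include <cuda_runtime.h>

#define N 4

// The parameter type float a[N][N] decays to float (*)[N], so the compiler
// knows the row length and can resolve a[i][j] to offset i * N + j.
__global__ void add_one(float a[N][N])
{
    int j = threadIdx.x;
    int i = threadIdx.y;
    if (i < N && j < N)
        a[i][j] += 1.0f;
}

// Hypothetical helper: the kernel needs a device pointer, so the host
// array is copied into a cudaMalloc'ed buffer typed as float (*)[N].
cudaError_t add_one_on_device(float host[N][N])
{
    float (*dev)[N] = nullptr;
    cudaError_t err = cudaMalloc((void **)&dev, sizeof(float[N][N]));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, sizeof(float[N][N]), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        add_one<<<1, dim3(N, N)>>>(dev);   // one thread per element
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, sizeof(float[N][N]), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}
```

Typing the device pointer as float (*)[N] means no cast is needed at the kernel launch, but the kernel only works for that one value of N.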

Note that I was more concerned about the syntax than about the internal implementation and its meaning.

Hope it helps those who are concerned about syntax like me. :)

Thanks for your interest.

-peace

I tried a simple case, a data copy, and it works:

static __global__ void test_copy2D( doublereal odata[N][N], doublereal idata[N][N])

{	

	// compute the global column and row index handled by this thread

	unsigned int xIndex = blockIdx.x * BLOCK_DIM + threadIdx.x;

	unsigned int yIndex = blockIdx.y * BLOCK_DIM + threadIdx.y;

	if((xIndex < N) && (yIndex < N))

	{

		odata[yIndex][xIndex] = idata[yIndex][xIndex];

	}

}

Could you post your whole code?

Besides, hard-coding the dimension in the kernel signature is not a good idea.

Hi,

2D arrays are not double pointers (pointers to pointers, arrays of pointers): they are indexed in the same way, with [i][j], but they are different things. Since you are speaking of static arrays, it is enough to replace their declaration in the kernel signature:

__global__ void kernel(float a[][N], float b[][N], float c[][N])

Now the compiler will know how to access the elements, but note that you will not be able to work with “pitched” arrays: if N gives a row size that does not match the alignment the hardware prefers, you may run into performance trouble. See the CUDA Programming Guide on cudaMallocPitch() and the related example for reference.
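For comparison, here is a minimal sketch of the pitched route, assuming a 4x4 float matrix. The names negate_pitched and negate_on_device are hypothetical. Two details are easy to get wrong in cudaMemcpy2D: the source pitch is the host row size in bytes (not the device pitch), and the width is given in bytes of float data, not sizeof(float *).

```cuda
#include <cuda_runtime.h>

#define ROWS 4
#define COLS 4

// Rows of a pitched allocation are `pitch` bytes apart, so the row pointer
// is computed with byte arithmetic before indexing the column.
__global__ void negate_pitched(float *data, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        float *row_ptr = (float *)((char *)data + row * pitch);
        row_ptr[col] = -row_ptr[col];
    }
}

// Hypothetical helper: round-trips a static 2D host array through a
// pitched device allocation.
cudaError_t negate_on_device(float host[ROWS][COLS])
{
    float *dev = nullptr;
    size_t dev_pitch = 0;
    size_t host_pitch = COLS * sizeof(float);  // host rows are tightly packed
    size_t width = COLS * sizeof(float);       // bytes copied per row

    cudaError_t err = cudaMallocPitch((void **)&dev, &dev_pitch, width, ROWS);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy2D(dev, dev_pitch, host, host_pitch,
                       width, ROWS, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        negate_pitched<<<1, dim3(COLS, ROWS)>>>(dev, dev_pitch, ROWS, COLS);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host, host_pitch, dev, dev_pitch,
                           width, ROWS, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}
```

The device pointer here is a plain float *, not float **, because cudaMallocPitch returns one contiguous buffer rather than an array of row pointers.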

I am glad I found one more monkey admirer in this forum!