CUDA kernel seems not to be executed

Hello,

I’ve copied the first code sample from the CUDA Programming Guide, which adds two matrices, and completed it into a working example: I added some headers and initialized the matrices with random entries. But when I run the program it does not behave as expected, because the resulting matrix is all zeros. Could you please point out why this is happening?

[codebox]#include <stdio.h>
#include <stdlib.h>

#define N 256

float A[N][N], B[N][N], C[N][N];

int i, j;

// Kernel definition
__global__ void MatAdd(float A[N][N], float B[N][N],
                       float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Initialization
    srand(20);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
        {
            A[i][j] = (int)(10*(rand()/((double)RAND_MAX+1)));
            B[i][j] = (int)(10*(rand()/((double)RAND_MAX+1)));
        }

    // Kernel invocation
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    MatAdd<<<dimGrid, dimBlock>>>(A, B, C);

    for (i = 0; i < 10; i++)
    {
        printf("\n");
        for (j = 0; j < 10; j++)
            printf("%d ", (int)A[i][j]);
    }
    printf("\n");

    for (i = 0; i < 10; i++)
    {
        printf("\n");
        for (j = 0; j < 10; j++)
            printf("%d ", (int)B[i][j]);
    }

    printf("\nResult:");
    for (i = 0; i < 10; i++)
    {
        printf("\n");
        for (j = 0; j < 10; j++)
            printf("%d ", (int)C[i][j]);
    }
    printf("\n");
}[/codebox]

I expect the program to print the sum of the two matrices, but every element of C is zero.

Thank you.

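You cannot pass host pointers to CUDA kernels. A, B and C live in host memory, so the kernel cannot use them, and C, being a global array, keeps its zero initialization. No error appears because a kernel launch does not report failures unless you check for them, for example with cudaGetLastError(). To make your kernel work, use cudaMalloc() to allocate storage on the GPU, then cudaMemcpy() to copy the initialized arrays from host memory into GPU memory, and then launch the kernel with those device pointers. A further cudaMemcpy() is then required to copy the result back from GPU memory to host memory.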
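A minimal sketch of those four steps applied to the program from the question, assuming a CUDA-capable device and compilation with nvcc. It keeps the original kernel and the global 2D arrays; `CHECK` and `runMatAdd` are hypothetical helpers introduced here for error reporting and self-checking, not part of the CUDA API.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 256

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Host matrices, kept as globals as in the question.
float A[N][N], B[N][N], C[N][N];

// Unchanged kernel; each parameter decays to float (*)[N].
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

// Hypothetical helper: runs the addition on the GPU and returns the
// number of elements of C that differ from A + B computed on the host.
int runMatAdd(void)
{
    const size_t bytes = sizeof(float) * N * N;
    float (*dA)[N], (*dB)[N], (*dC)[N];   // device pointers, same shape as A, B, C

    // 1. Allocate storage in GPU memory.
    CHECK(cudaMalloc((void **)&dA, bytes));
    CHECK(cudaMalloc((void **)&dB, bytes));
    CHECK(cudaMalloc((void **)&dC, bytes));

    // 2. Copy the initialized inputs from host to device.
    CHECK(cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice));

    // 3. Launch the kernel with device pointers only.
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    MatAdd<<<dimGrid, dimBlock>>>(dA, dB, dC);
    CHECK(cudaGetLastError());            // reports launch errors

    // 4. Copy the result back; this cudaMemcpy waits for the kernel to finish.
    CHECK(cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));

    int mismatches = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C[i][j] != A[i][j] + B[i][j])
                mismatches++;
    return mismatches;
}

int main()
{
    srand(20);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (int)(10 * (rand() / ((double)RAND_MAX + 1)));
            B[i][j] = (int)(10 * (rand() / ((double)RAND_MAX + 1)));
        }

    int mismatches = runMatAdd();
    printf("C[0][0..4]:");
    for (int j = 0; j < 5; j++)
        printf(" %d", (int)C[0][j]);
    printf("\nmismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Because cudaMemcpy() on the default stream waits for previously launched work to complete, no explicit cudaDeviceSynchronize() is needed before reading C on the host.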

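You might also want to reconsider your choice of C two-dimensional arrays for your data, because they add complexity you probably don’t want: the array type fixes the matrix size at compile time, and a dynamically allocated array of row pointers (float **) cannot be copied to the GPU with a single cudaMemcpy(). A flat one-dimensional array of N*N elements, indexed as i*N + j, avoids both problems.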
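A sketch of the flat-array layout, assuming row-major storage and a matrix size chosen at run time; `matAddFlat` and `addOnGpu` are hypothetical names for this example. Element (row, col) is stored at `row * n + col`, so the kernel takes plain `float *` pointers and one cudaMemcpy() moves a whole matrix.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// C = A + B for n x n row-major matrices stored in flat arrays:
// element (row, col) lives at index row * n + col.
__global__ void matAddFlat(const float *A, const float *B, float *C, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x walks along a row, so adjacent
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // threads touch adjacent floats
    if (row < n && col < n) {
        int idx = row * n + col;
        C[idx] = A[idx] + B[idx];
    }
}

// Hypothetical host helper: copies hA and hB to the GPU, adds them, and
// copies the sum into hC. Returns 0 on success, -1 on any CUDA error.
int addOnGpu(const float *hA, const float *hB, float *hC, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;

    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        matAddFlat<<<grid, block>>>(dA, dB, dC, n);
        err = cudaGetLastError();          // reports launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree(NULL) is a no-op, so this is safe after a partial failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}

int main(void)
{
    const int n = 256;                     // a run-time size; no fixed array type needed
    size_t bytes = (size_t)n * n * sizeof(float);
    float *A = (float *)malloc(bytes);
    float *B = (float *)malloc(bytes);
    float *C = (float *)malloc(bytes);
    if (!A || !B || !C) return 1;

    srand(20);
    for (int k = 0; k < n * n; k++) {
        A[k] = (float)(rand() % 10);
        B[k] = (float)(rand() % 10);
    }

    int rc = addOnGpu(A, B, C, n);
    if (rc == 0) {
        // Print the top-left 3 x 3 corner of C, indexed as row * n + col.
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++)
                printf("%d ", (int)C[row * n + col]);
            printf("\n");
        }
    }

    free(A);
    free(B);
    free(C);
    return rc == 0 ? 0 : 1;
}
```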