Problems transferring memory: sizes not consistent?

I’m just starting to play with CUDA. I wrote a semi-complicated program, but it wasn’t working, so I decided to jump way back to baby steps. I’m just testing copying memory to and from the device, editing some values, etc.

Here’s my code (the relevant parts, anyway):

#define ROWS 16
#define COLS 16
#define At(a, r, c) *((a) + ((r) * ROWS + (c)) * sizeof(int))

int main(){
	size_t size = ROWS * COLS * sizeof(int);
	int* a = (int *)malloc(size);
	//int* b = (int *)malloc(size);
	int i,j;

	for(i = 0 ; i < ROWS; i++){
		for(j = 0; j < COLS; j++){
			At(a, i,j) = 1;
		}
	}

	print2D(a, ROWS, COLS);
	printf("\n");

	int *d_a;
	cudaMalloc(&d_a, size);
	cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);

	for(i = 0 ; i < ROWS; i++){
		for(j = 0; j < COLS; j++){
			At(a, i,j) = 2;
		}
	}

	dim3 dimBlock(1,1);
	dim3 dimGrid(1,1);
	t<<<dimGrid, dimBlock>>>(d_a, ROWS, COLS);

	cudaMemcpy(a, d_a, size, cudaMemcpyDeviceToHost);
	print2D(a, ROWS, COLS);

	//free(a);
	//free(b);
	//cudaFree(d_a);
}

__global__ void t(int* m, int rows, int cols){
	int i, j;
	for(i = 0; i < rows; i++){
		for(j = 0; j < rows; j++){
			At(m,i,j) = 4;
		}
	}
}

__host__ void print2D(int* mat, int rows, int cols){
	int i;
	int j;
	for(i = 0; i < rows; i++){
		for(j = 0; j < cols; j++){
			printf("%5d", At(mat,i,j));
		}
		printf("\n");
	}
}

What I intended to happen:

  1. Allocate the array, initialize every entry to 1

  2. Move the array to the device

  3. Change every entry in the array on the host to 2

  4. Change every entry in the array on the device to 4

  5. Copy the array from the device back to the host

  6. The second print statement should be all 4s

But what actually happens is that when I print a the second time, the first 3 rows are 4s and the rest of the entries are 2s. If I change size to a bigger number, all of the entries are 2.

Also, I noticed that after my malloc of a, I can’t do anything related to memory management. Allocating a new array, freeing a, or cudaFreeing d_a all cause a segfault. This might be a bit of C that has fallen out of my brain, but any ideas why that’s happening?

I’ve looked at sample code and I don’t see what I’m doing differently. I know it has to be something simple, so what am I missing?

Thanks.

This looks wrong:
#define At(a, r, c) *((a) + ((r) * ROWS + (c)) * sizeof(int))

The pointers here are already int pointers, not char or byte pointers, so the offset should not be multiplied by sizeof(int).
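For illustration, here is a minimal sketch of what the indexing might look like without the extra sizeof(int). It assumes row-major storage with COLS elements per row; the names AT, fill, and max_index are made up for this example and are not from the posted code.

```c
#include <stddef.h>

#define ROWS 16
#define COLS 16

/* Hypothetical corrected macro: arithmetic on an int* already advances in
   units of sizeof(int), so only the element index is added to the pointer.
   The row stride is COLS (elements per row), assuming row-major layout. */
#define AT(a, r, c) (*((a) + ((size_t)(r) * COLS + (c))))

/* Illustrative helper: set every element of a ROWS x COLS array to value. */
void fill(int *a, int value) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            AT(a, i, j) = value;
}

/* Largest element index the macro touches; it stays below ROWS * COLS. */
size_t max_index(void) {
    return (size_t)(ROWS - 1) * COLS + (COLS - 1);
}
```

Calling fill(a, 1) on a buffer of ROWS * COLS ints writes only elements 0 through 255, so nothing outside the allocation is touched. The same macro should work unchanged inside a kernel, since it is plain pointer arithmetic.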

I’m not sure I understand what you’re saying or why this would be a problem. I added the sizeof(int) because the macro is meant to dereference the pointer into the array automatically. That is, I’m using an array of ints, so int is in the macro (not the most extensible code, but whatever).

int* a;

how do you access element number 1?
*(a+sizeof(int)) or *(a+1)?
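To make the question concrete, here is a small sketch in plain C. The helper names byte_offset and original_index are hypothetical, written only to show the arithmetic: the first measures how many bytes a + n actually moves, the second computes which element the posted At macro ends up addressing.

```c
#include <stddef.h>

/* Byte distance between a and a + n for an int pointer.
   a + n must stay inside (or one past the end of) the array a points into. */
size_t byte_offset(const int *a, size_t n) {
    return (size_t)((const char *)(a + n) - (const char *)a);
}

/* Element index that the posted macro
   *((a) + ((r) * ROWS + (c)) * sizeof(int))
   addresses, with the ROWS value passed in as rows. */
size_t original_index(size_t r, size_t c, size_t rows) {
    return (r * rows + c) * sizeof(int);
}
```

byte_offset(a, 1) comes out as sizeof(int), so *(a+1) is already element number 1. With the posted macro, original_index(15, 15, 16) is 255 * sizeof(int); assuming a 4-byte int, that is element 1020 of a 256-element array. Writes that far past the end of the malloc'd buffer would likely corrupt the heap, which is a plausible cause of the segfaults on later malloc and free calls.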