Problem with 2D block

Hello,

I’m trying to store each block’s x and y in a 2D array and return them to the host. My code is:

#include <cstdio>
#include <iostream>

#define N 95

__global__ void add(int *ch) {
	*(ch + blockIdx.x * 2) = blockIdx.x;
	*(ch + (blockIdx.x * 2) + 1) = blockIdx.y;
}
int main(void) {

	int h_ch[N][2];
	int *dev_ch;

	cudaMalloc((void**)&dev_ch, sizeof(int[N][2]));

	dim3 numBlocks(N,N);

	add<<<numBlocks, 1>>>(dev_ch);

	cudaMemcpy(h_ch, dev_ch, sizeof(int[N][2]), cudaMemcpyDeviceToHost);

	for (int i = 0; i < N; i++) {
		printf("%d-%d\n", h_ch[i][0], h_ch[i][1]);
	}

	cudaFree(dev_ch);

	return 0;
}

The result of this code is:

cuda_test.exe
0-94
1-94
2-94
3-94
4-94
5-94
6-94
7-94
8-94
9-94
10-94
11-94
.
.
.
92-94
93-94
94-94

As can be seen, the x value changes as expected, but y is 94 in every row. What is wrong? Is there a bug in the code, or have I misunderstood the concept of grids/blocks/threads?

Thanks in advance.

So you have an NxN grid of thread blocks that try to store into an Nx2 array of results? Something tells me that’s not going to work well …

Yes, exactly. What is the problem?

There are NxN = 95*95 = 9025 thread blocks. You want to store each block’s [x,y] coordinates in an array. How many elements does that array need? I would claim 9025 elements, because that’s how many blocks there are. How large an array does the code provide?

So I should allocate sizeof(int[N*N][2]) bytes? And how should I calculate the index to store into?

Hint: Much can be learned by (1) thinking and (2) experimenting. I am absolutely certain you can figure this out.

https://en.wikipedia.org/wiki/Think_(IBM)

I think the correct index calculation is:

int idx = (blockIdx.x * N) + (blockIdx.y * 2);

*(ch + idx) = blockIdx.x;
*(ch + ++idx) = blockIdx.y;

I also allocated sizeof(int[N*N][2]) bytes. However, it still didn’t work, so I changed my code to see how x and y vary: x runs from 0 to 94 and each value is repeated 94 times, but y only runs from 51 to 94, and each of its values is also repeated 94 times.

What is wrong with y? Shouldn’t it run from 0 to 94?