Updating values in a 2D array using CUDA

Hello!
Just getting started with CUDA in C++ and I’ve hit a snag.
One of the tasks I need to do is to loop through a 2D array and update or populate the elements inside.
Running on the CPU, my code would look something like this:

int myArr[20][20480] = {0};

for(int row = 0; row < 20; row++)
{
	for(int col = 0; col < 20480; col++)
	{
		myArr[row][col] = row * 2; // Just an example to populate array.
	}
}

So, my questions:
What would be the best data type to pass to the kernel to replicate this functionality? Secondly, would I be able to set up the blocks/threads in a way that I can use blockIdx.x as the row and threadIdx.x as the col element? Or am I totally misunderstanding how the block and thread IDs work? Either way, I would appreciate any suggestions on the best way to approach the above task on the GPU.

Some assumptions you can make about the data: the row count will always be low, well under 100, but the col count could be anywhere up to 100,000 or so. Both the row and col counts will be divisible by 32.
Any code examples or links to examples would be appreciated.
If you need more information please let me know.

Thanks in advance.

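The simple way would be as follows. You need to use a flat (1D) array, so element [row][col] lives at index row*num_cols+col. A block is limited to 1024 threads, so threadIdx.x alone cannot cover 100,000 columns; combine it with blockIdx.x for the column and use blockIdx.y for the row: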

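#define THREADS 256

// One thread per element: blockIdx.y selects the row, and the x-dimension
// of the grid covers the columns.
__global__ void update_GPU(int *MyArr, const int col_range){
	const int col=threadIdx.x+blockIdx.x*blockDim.x;
	const int row=blockIdx.y;
	if(col<col_range){ // the last block in a row may extend past the final column
		MyArr[row*col_range+col]=row*2; // flat row-major index
	}
}
//....
// Launch from host. MyArr must be a device pointer (e.g. from cudaMalloc)
// to num_rows*num_cols ints.

dim3 grid((num_cols+THREADS-1)/THREADS,num_rows,1); // round up so every column is covered
update_GPU<<<grid,THREADS>>>(MyArr,num_cols);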

Disclaimer: this is not the fastest method and I did not test it; it is just a general example. If the number of columns will be very large, have each thread handle more than one element.
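
One way to let each thread handle several elements is a stride loop along the row. The following is only a sketch of that idea: update_GPU_strided, launch_strided and the default of 8 blocks per row are names and numbers made up for illustration, and the right block count would need tuning on your hardware.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant of update_GPU: each thread walks along its row in
// steps of "stride", so a fixed number of blocks per row covers any width.
__global__ void update_GPU_strided(int *MyArr, const int col_range)
{
    const int row = blockIdx.y;                 // one grid row per array row
    const int stride = blockDim.x * gridDim.x;  // threads spanning one row
    for (int col = threadIdx.x + blockIdx.x * blockDim.x;
         col < col_range;
         col += stride)
    {
        MyArr[row * col_range + col] = row * 2; // flat row-major index
    }
}

// Host-side launch. d_arr must point to num_rows * num_cols ints on the
// device. blocks_per_row is a tuning knob (assumed value, not a recommendation).
void launch_strided(int *d_arr, int num_rows, int num_cols,
                    int threads = 256, int blocks_per_row = 8)
{
    dim3 grid(blocks_per_row, num_rows, 1);
    update_GPU_strided<<<grid, threads>>>(d_arr, num_cols);
}
```

With 256 threads and 8 blocks per row the stride is 2048, so for 20480 columns each thread writes 10 elements. The loop condition doubles as the bounds check.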

Keep in mind that in the kernel, row (blockIdx.y) will range from 0 to num_rows-1, so it does not need a bounds check like the one for col.
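
For completeness, here is a sketch of the host side that the launch lines assume: allocate the flat array on the device, launch, and copy the result back. fill_on_gpu is a hypothetical helper written for this post, not part of any library, and the error handling is deliberately minimal.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define THREADS 256

// Same one-thread-per-element kernel as in the answer.
__global__ void update_GPU(int *MyArr, const int col_range)
{
    const int col = threadIdx.x + blockIdx.x * blockDim.x;
    const int row = blockIdx.y;
    if (col < col_range) {
        MyArr[row * col_range + col] = row * 2;
    }
}

// Hypothetical helper: fills a num_rows x num_cols array on the GPU and
// returns it as a flat, row-major host vector (empty on any CUDA error).
std::vector<int> fill_on_gpu(int num_rows, int num_cols)
{
    const size_t count = static_cast<size_t>(num_rows) * num_cols;
    int *d_arr = nullptr;
    if (cudaMalloc(&d_arr, count * sizeof(int)) != cudaSuccess) return {};

    dim3 grid((num_cols + THREADS - 1) / THREADS, num_rows, 1);
    update_GPU<<<grid, THREADS>>>(d_arr, num_cols);
    cudaError_t err = cudaGetLastError();  // catches a bad launch configuration

    std::vector<int> host(count);
    if (err == cudaSuccess) {
        // A blocking copy: it waits for the kernel to finish first.
        err = cudaMemcpy(host.data(), d_arr, count * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_arr);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return {};
    }
    return host;
}
```

Calling fill_on_gpu(20, 20480) would mirror the CPU loop from the question; element [row][col] is then at host[row * num_cols + col].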

Thank you! It’s a good place to start. I will give this a test.