Hello!
Just getting started with CUDA in C++ and I’ve hit a snag.
One of the tasks I need to do is loop through a 2D array and populate or update its elements.
Running on the CPU, my code would look something like this:
int myArr[20][20480] = {0};
for (int row = 0; row < 20; row++)
{
    for (int col = 0; col < 20480; col++)
    {
        myArr[row][col] = row * 2; // Just an example to populate array.
    }
}
So, my questions:
What would be the best data type to pass to the kernel to replicate this functionality? Secondly, would I be able to set up the blocks and threads so that I can use blockIdx.x as the row and threadIdx.x as the col element? Or am I totally misunderstanding how the block and thread IDs work? Either way, I would appreciate any suggestions on the best way to approach the above task on the GPU.
Some assumptions you can make about the data: the “row” count will always be low, well under 100, but the col count could be anywhere up to 100,000 or so. Both the row and col counts will be divisible by 32.
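To make the question concrete, here is a minimal sketch of the kind of mapping I have in mind, not a definitive implementation. It assumes the array is passed as a flat, row-major int buffer (element [row][col] lives at row * cols + col). The kernel name fillRows and the choice of 256 threads per block are placeholders of mine. Since a block can hold at most 1024 threads, threadIdx.x alone cannot cover 20480 columns, so the sketch uses blockIdx.y for the row and builds the column from blockIdx.x and threadIdx.x.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per element of a flat, row-major buffer.
// blockIdx.y selects the row; the column is derived from blockIdx.x and
// threadIdx.x because a single block is limited to 1024 threads.
__global__ void fillRows(int *data, int rows, int cols)
{
    int row = blockIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < cols) {
        data[row * cols + col] = row * 2; // same placeholder value as the CPU loop
    }
}

int main()
{
    const int rows = 20;
    const int cols = 20480;
    const size_t count = static_cast<size_t>(rows) * cols;
    const size_t bytes = count * sizeof(int);

    // Allocate the flat buffer on the device.
    int *dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Assumed launch shape: 256 threads per block, enough blocks in x to
    // cover every column (rounded up), and one grid row per array row.
    const int threads = 256;
    dim3 block(threads);
    dim3 grid((cols + threads - 1) / threads, rows);
    fillRows<<<grid, block>>>(dData, rows, cols);
    if (cudaGetLastError() != cudaSuccess) {
        std::fprintf(stderr, "kernel launch failed\n");
        cudaFree(dData);
        return 1;
    }

    // Copy back; cudaMemcpy waits for the kernel to finish first.
    std::vector<int> host(count);
    cudaError_t err = cudaMemcpy(host.data(), dData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed\n");
        return 1;
    }

    // Compare against what the CPU loop would have produced.
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (host[static_cast<size_t>(r) * cols + c] != r * 2) {
                std::fprintf(stderr, "mismatch at [%d][%d]\n", r, c);
                return 1;
            }
        }
    }
    std::printf("all elements match\n");
    return 0;
}
```

The bounds check in the kernel is not strictly needed when the col count is a multiple of the block size, but it keeps the sketch safe if the sizes change.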
Any code examples or links to examples would be appreciated.
If you need more information please let me know.
Thanks in advance.