“For each element in a create a thread, in each thread determine”
→It was expressed as follows.
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int jdx = blockIdx.y * blockDim.y + threadIdx.y;
“which b to load and write it in.”
→I’m not getting a clear picture in my mind. I can’t imagine it.
How would you write source code specifically?
You only need one dimension for the elements of a. So just int idx = blockIdx.x * blockDim.x + threadIdx.x;
Find out how many elements a has (what is the maximum index).
Then find out, which intervals of indices run which code, some array elements are written to more than once, find out, which is written to last.