I have an array, d_zero, in global memory. It has length n and contains only 0s and 1s.
I also have two other arrays, A and B, of the same length as d_zero, also in global memory.
I want to do the following:
if (d_zero[i] == 0)
    A[i] = A[i] + B[i];
Should I use shared memory? I don't think there is much to share between threads. If not, how can I make this fast? Or is it already fast using global memory only?
- This math does not require neighboring data, so you do not need shared memory.
- This operation is very simple, so you are likely to be memory-bound.
- You can get rid of the “if” statement by rearranging your math to use multiplications and additions. How about something like this:
A[i] += B[i] * (1 - d_zero[i]);
As long as your reads and writes to device memory are coalesced (consecutive threads access consecutive elements), this is very fast.
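
To make the suggestion concrete, here is a minimal sketch of what such a kernel might look like. The original post does not give element types, so this assumes A and B are float and d_zero is int; the names maskedAddKernel and maskedAdd, and the block size of 256, are hypothetical choices for illustration. Each thread handles one element, and thread i touches index i, so neighboring threads access neighboring addresses and the loads and stores can coalesce.

```cuda
#include <cuda_runtime.h>

// Assumption: A and B are float, d_zero is int and holds only 0 or 1.
// One thread per element; thread i reads and writes index i only, so
// consecutive threads access consecutive addresses (coalesced access).
__global__ void maskedAddKernel(float *A, const float *B,
                                const int *d_zero, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // bounds guard only, not a data-dependent branch
        A[i] += B[i] * (1 - d_zero[i]);
    }
}

// Hypothetical host helper: copies the inputs to the device, runs the
// kernel, and copies A back. Returns the first CUDA error encountered.
cudaError_t maskedAdd(float *hA, const float *hB, const int *hZero, int n)
{
    if (n <= 0) return cudaSuccess;  // nothing to do

    float *dA = nullptr, *dB = nullptr;
    int *dZero = nullptr;
    size_t fBytes = static_cast<size_t>(n) * sizeof(float);
    size_t iBytes = static_cast<size_t>(n) * sizeof(int);

    cudaError_t err = cudaMalloc(&dA, fBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, fBytes);
    if (err == cudaSuccess) err = cudaMalloc(&dZero, iBytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dA, hA, fBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dB, hB, fBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dZero, hZero, iBytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // round up to cover n
        maskedAddKernel<<<blocks, threads>>>(dA, dB, dZero, n);
        err = cudaGetLastError();                   // launch errors
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(hA, dA, fBytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);   // cudaFree on a null pointer is a no-op
    cudaFree(dB);
    cudaFree(dZero);
    return err;
}
```

Calling maskedAdd from host code with A = {1, 2, 3, 4}, B = {10, 20, 30, 40} and d_zero = {0, 1, 0, 1} should leave A as {11, 2, 33, 4}. One caveat with the multiply form: where d_zero[i] is 1 and B[i] is NaN or infinity, B[i] * 0 is NaN, so A[i] changes, whereas the original "if" would leave it untouched. In a real program the arrays would normally stay on the device between kernels, since for an operation this cheap the host-device copies in this sketch would dominate the run time.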