Conditional operations -- when to noop and when not to

I have some code that must be executed by all threads except a few on a boundary. Will wrapping it in an if statement to exclude those threads have smaller or larger overhead than letting them perform their (harmless) operations on bogus data? I assume that the threads performing noops will never make up an entire warp.

In particular, this is the operation:

for(int i=0; i<n; i++) {
  tmp[threadIdx.x] += shared_array[threadIdx.x*n+i]*reg_array[i];
}
output[threadIdx.x+blockDim.x*blockIdx.x] = tmp[threadIdx.x];

Here, both shared_array and tmp are in shared memory, and reg_array is private to each thread. I have the option to either

a) put an if statement around the whole for loop, or
b) put an if statement around the loop body, or
c) put an if statement around only the last line, which writes the result out to global memory.
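
The three placements can be sketched as follows. This is only an illustration under assumptions, not the actual kernel: the predicate is_interior(), the sizes N, THREADS and BLOCKS, the dummy data, and the host helper count_mismatches() are all hypothetical names and values made up for the example.

```cuda
#include <cuda_runtime.h>

constexpr int N = 4;         // hypothetical loop length (n in the question)
constexpr int THREADS = 64;  // hypothetical block size
constexpr int BLOCKS = 2;    // hypothetical grid size

// Hypothetical boundary test: the first and last thread of each block are skipped.
__device__ bool is_interior() {
    return threadIdx.x > 0 && threadIdx.x < blockDim.x - 1;
}

// VARIANT 0 is option a), 1 is option b), 2 is option c).
template <int VARIANT>
__global__ void guarded_kernel(float* output) {
    __shared__ float shared_array[THREADS * N];
    __shared__ float tmp[THREADS];
    float reg_array[N];

    // Dummy data so the sketch is self-contained; each thread touches only its own slots.
    for (int i = 0; i < N; i++) {
        shared_array[threadIdx.x * N + i] = float(i + 1);
        reg_array[i] = 1.0f;
    }
    tmp[threadIdx.x] = 0.0f;

    const bool active = is_interior();
    const int out = threadIdx.x + blockDim.x * blockIdx.x;

    if (VARIANT == 0) {           // a) one test around the whole loop
        if (active) {
            for (int i = 0; i < N; i++)
                tmp[threadIdx.x] += shared_array[threadIdx.x * N + i] * reg_array[i];
        }
        output[out] = tmp[threadIdx.x];
    } else if (VARIANT == 1) {    // b) the test is repeated in every iteration
        for (int i = 0; i < N; i++) {
            if (active)
                tmp[threadIdx.x] += shared_array[threadIdx.x * N + i] * reg_array[i];
        }
        output[out] = tmp[threadIdx.x];
    } else {                      // c) every thread computes, only the store is guarded
        for (int i = 0; i < N; i++)
            tmp[threadIdx.x] += shared_array[threadIdx.x * N + i] * reg_array[i];
        if (active)
            output[out] = tmp[threadIdx.x];
    }
}

// Runs one variant and returns how many interior outputs differ from the
// expected sum, or -1 if a CUDA call fails (for example, no device present).
template <int VARIANT>
int count_mismatches() {
    const int total = THREADS * BLOCKS;
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, total * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d_out, 0, total * sizeof(float));
    guarded_kernel<VARIANT><<<BLOCKS, THREADS>>>(d_out);
    float h_out[THREADS * BLOCKS];
    cudaError_t err = cudaMemcpy(h_out, d_out, total * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    const float expected = N * (N + 1) / 2.0f;  // 1 + 2 + ... + N, reg_array is all ones
    int bad = 0;
    for (int t = 0; t < total; t++) {
        const int lane = t % THREADS;
        const bool interior = lane > 0 && lane < THREADS - 1;
        if (interior && h_out[t] != expected) bad++;
    }
    return bad;
}
```

Calling count_mismatches<0>(), count_mismatches<1>() and count_mismatches<2>() from a host main should show that all three variants agree on the interior threads. The sketch only pins down where the test goes; it says nothing about which placement is faster, which would have to be measured on the real kernel.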

Is any of these preferable, and why?

My intuition is that an if statement effectively adds instructions, so if a noop takes the same time as an arithmetic operation, the guarded version will actually be slower. On the other hand, if the skipped operation has significant latency, such as a memory read, avoiding that latency could still be worth the extra instruction.