global memory write/read

Hello all. Could you please help me with the following?
For example, I have a kernel that computes the element-wise product of two vectors:

output[index] = input1[index] * input2[index]
where index is unique to each thread. The question is: can I use such a kernel when input1 is the same buffer as output (input1 == output)? Without a cache it is obvious that I can, but what about with a cache?
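A minimal sketch of the kernel described above, assuming CUDA (the post does not name the API). The names `mulKernel`, `multiplyInPlace` and `n` are hypothetical. Each thread reads element i of both inputs and then writes element i of the output, and no thread touches another thread's element. That per-thread ownership is what the in-place question relies on. The pointers are deliberately not marked `__restrict__`, because that qualifier promises the compiler that the buffers do not alias, and the in-place call breaks that promise.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Element-wise product: out[i] = in1[i] * in2[i].
// Pointers are NOT __restrict__ because out may alias in1.
__global__ void mulKernel(float* out, const float* in1, const float* in2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // This thread reads in1[i] and in2[i], then writes out[i].
        // No other thread reads or writes element i.
        out[i] = in1[i] * in2[i];
    }
}

// Hypothetical host helper: computes a[i] *= b[i] on the GPU by
// passing the same device pointer as both out and in1.
// Returns false on a size mismatch or any CUDA error.
bool multiplyInPlace(std::vector<float>& a, const std::vector<float>& b)
{
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;  // a zero-block launch would be an error

    int n = static_cast<int>(a.size());
    size_t bytes = a.size() * sizeof(float);
    float* dA = nullptr;
    float* dB = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }

    bool ok = cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        mulKernel<<<blocks, threads>>>(dA, dA, dB, n);  // output == input1
        ok = cudaGetLastError() == cudaSuccess;
    }
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    ok = ok && cudaMemcpy(a.data(), dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dA);
    cudaFree(dB);
    return ok;
}
```

For example, calling `multiplyInPlace(a, b)` with `a = {1, 2, 3}` and `b = {4, 5, 6}` should leave `a` holding `{4, 10, 18}`.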
