Please tell me which one is faster

Consider an example: I want to add two arrays, A and B, and store the result in a separate array C. Which approach will be faster, the first or the second?



  1. i = threadIdx.x

It’s a completely different operation. threadIdx.x/y/z are independent indexes, so I’m not really sure what you will achieve with the first example. I believe you would get a sum of some random element in a with some other random element in b (well, not completely random, as threads are initially scheduled in order, but not necessarily deterministic either).

The second example would be coalesced on both read and write, assuming that the read/write size is 4 bytes (int/float), so it would be faster.
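
For reference, here is a minimal sketch of what a coalesced element-wise add can look like. It is not the code from the question. It assumes a 1D launch where each thread computes a global index from blockIdx.x, blockDim.x and threadIdx.x, and the names addArrays and addOnGpu are made up for illustration. Neighbouring threads in a warp then touch neighbouring 4-byte floats, which is the access pattern that coalesces.

```cuda
#include <cuda_runtime.h>

// One thread per element. Thread i reads a[i] and b[i] and writes c[i],
// so consecutive threads access consecutive 4-byte addresses.
__global__ void addArrays(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // assumed 1D global index
    if (i < n) {                                    // guard the partial last block
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and copies the result back. Returns the first CUDA error seen.
cudaError_t addOnGpu(const float* hA, const float* hB, float* hC, int n)
{
    if (n <= 0) {
        return cudaSuccess;  // nothing to do; a zero-block launch is invalid
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;                         // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up
        addArrays<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();                        // catch launch errors
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe on early failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling addOnGpu(a, b, c, n) with three host arrays of length n should leave c[i] equal to a[i] + b[i]. The block size of 256 is only a common starting point, not a tuned value.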

Thank you :)