I have a kernel launched with 81 thread blocks and 256 threads per block.
The first 201 threads in each block read the same locations in global memory.
Example:
thread 0 in block 0 reads index 0 of the global memory array
thread 1 in block 0 reads index 1 of the global memory array
…
thread 0 in block 1 reads index 0 of the global memory array
thread 1 in block 1 reads index 1 of the global memory array
…and so on
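The access pattern described above could be sketched roughly as follows. The kernel name `readSharedPattern`, the arrays `data` and `out`, the `float` element type, and the host helper `runPattern` are all assumptions for illustration, not the actual code. Only the launch shape (81 blocks, 256 threads, 201 reading threads) comes from the description.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlocks = 81;            // thread blocks in the grid (from the description)
constexpr int kThreadsPerBlock = 256;  // threads per block (from the description)
constexpr int kSharedReads = 201;      // threads per block that read the common data

// Hypothetical kernel: thread t (t < 201) of every block reads data[t],
// so each element is loaded once per block, 81 times across the grid.
__global__ void readSharedPattern(const float* data, float* out)
{
    int tid = threadIdx.x;
    if (tid < kSharedReads) {
        // Same global address for the same tid in every block.
        out[blockIdx.x * kSharedReads + tid] = data[tid];
    }
}

// Hypothetical host helper: launches the kernel and checks that every
// block read the same 201 values. Returns false on any CUDA error.
bool runPattern()
{
    std::vector<float> hData(kSharedReads);
    for (int i = 0; i < kSharedReads; ++i) hData[i] = static_cast<float>(i);

    const size_t dataBytes = kSharedReads * sizeof(float);
    const size_t outBytes = kBlocks * dataBytes;

    float* dData = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dData, dataBytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, outBytes) != cudaSuccess) { cudaFree(dData); return false; }

    cudaMemcpy(dData, hData.data(), dataBytes, cudaMemcpyHostToDevice);
    readSharedPattern<<<kBlocks, kThreadsPerBlock>>>(dData, dOut);

    // cudaMemcpy synchronizes with the kernel before copying back.
    std::vector<float> hOut(kBlocks * kSharedReads);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < kBlocks; ++b)
        for (int t = 0; t < kSharedReads; ++t)
            if (hOut[b * kSharedReads + t] != hData[t]) return false;
    return true;
}
```

Calling `runPattern()` from a `main` function is enough to exercise the sketch. Within a warp the reads are to consecutive addresses, so the repetition is across blocks rather than within one.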
So I have a lot of reads from the same global memory addresses.
Is there a better way to handle this? I don't think shared memory is a solution.
Would texture memory or something similar help?