Hi, I have a kernel that has an array of some struct Foo in shared memory, like so:
extern __shared__ Foo foo_array[];  // unsized, so it must be extern (dynamic shared memory sized at launch)
do_something(foo_array[threadIdx.x].member_1);
do_something_else(foo_array[threadIdx.x].member_2);
// ...
so that each thread works more or less exclusively on one element.
Is it reasonable to assume that, under the SIMT model, a given member of Foo
is accessed simultaneously by all the threads in a warp?
If alignof(Foo) == 16, is the best one can hope for that the addresses of consecutive elements are 16 bytes (i.e. 4 banks) apart? Would accessing any member then incur at least a 4-way bank conflict?
Thanks in advance for any answers.