I have a structure of arrays (SoA) in global memory:
ttt,…,ttt; aaa,…,aaa; fff,…,fff;
I can sort the threads so that they access the arrays in a coalesced way.
In some cases the cost of sorting grows too large, so I don't sort the threads. Each thread then accesses the arrays with an essentially random index.
Now I'm wondering whether an array of structures (AoS: taf,taf,…,taf) would beat the structure of arrays in that case, because each thread reads t, a and f for the same index one after another.
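To make the two layouts concrete, here is a minimal sketch of where t, a and f for index i end up in each one. It assumes 4-byte float fields and n elements per array; the names soa_offset, aos_offset and Field are made up for illustration and are not from my real code.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: t, a and f are 4-byte floats (the real types may differ).
constexpr std::size_t kElemBytes = sizeof(float);

// Hypothetical field tags: 0 = t, 1 = a, 2 = f.
enum Field : std::size_t { kT = 0, kA = 1, kF = 2 };

// SoA layout: [t0 .. t(n-1)] [a0 .. a(n-1)] [f0 .. f(n-1)]
// Fields of the same index are n elements apart.
inline std::size_t soa_offset(Field field, std::size_t i, std::size_t n) {
    assert(i < n);
    return (field * n + i) * kElemBytes;
}

// AoS layout: [t0 a0 f0] [t1 a1 f1] ...
// Fields of the same index are adjacent (12 bytes per element).
inline std::size_t aos_offset(Field field, std::size_t i) {
    return (i * 3 + field) * kElemBytes;
}
```

For index 12 with n = 100, SoA puts t, a, f at byte offsets 48, 448 and 848, while AoS puts them at 144, 148 and 152.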
With sorted threads, is SoA access coalesced because the memory transaction issued for one thread also brings in the adjacent elements that the neighboring threads need?
With random access, say thread 0 reads t12, a12, f12 and thread 1 reads t39, a39, f39, and so on. When thread 0 reads t12, does the hardware also fetch a12 and f12? Or does it only fetch neighbors such as t11 and t13, which are then wasted?
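Here is a rough host-side model of what I think is being asked. It only counts the distinct memory sectors a group of threads touches when every thread reads t, a and f for its own index. The assumptions are mine: 4-byte floats, 32-byte sectors, arrays starting on a sector boundary. The function names are hypothetical. It ignores how the loads are split into instructions and whether a sector is still cached by the time the next load needs it, so it is a footprint estimate, not a performance prediction.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Assumptions (not taken from real hardware queries):
constexpr std::size_t kElemBytes   = 4;   // each of t, a, f is a 4-byte float
constexpr std::size_t kSectorBytes = 32;  // granularity of a global-memory fetch
constexpr std::size_t kFields      = 3;   // t, a, f

// Distinct sectors touched when each thread in `ids` reads t, a, f from
// an SoA layout holding n elements per array.
std::size_t sectors_soa(const std::vector<std::size_t>& ids, std::size_t n) {
    std::set<std::size_t> sectors;
    for (std::size_t field = 0; field < kFields; ++field) {
        for (std::size_t i : ids) {
            assert(i < n);
            sectors.insert((field * n + i) * kElemBytes / kSectorBytes);
        }
    }
    return sectors.size();
}

// Same count for an AoS layout with 12-byte {t, a, f} elements.
std::size_t sectors_aos(const std::vector<std::size_t>& ids) {
    std::set<std::size_t> sectors;
    for (std::size_t i : ids) {
        for (std::size_t field = 0; field < kFields; ++field) {
            sectors.insert((i * kFields + field) * kElemBytes / kSectorBytes);
        }
    }
    return sectors.size();
}
```

With the two indices from my example, sectors_soa({12, 39}, 100) gives 6 and sectors_aos({12, 39}) gives 2: in SoA the sector holding t12 contains only other t values, while in AoS the sector holding t12 also contains a12 and f12. For sorted indices 0..7 with n = 64 both layouts touch 3 sectors, so under this model AoS only helps in the unsorted case.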
Thanks for your reply. Your observation makes sense on a CPU.
SoA is usually preferred on a GPU because the data fetched for one thread is also the data that neighboring threads need.
AoS is better on a CPU when I use all members of the structure, because loading one member brings the others into cache for the following steps.
What I'm confused about is whether the latter case, namely caching for later use by the same thread, also holds on a GPU, or whether the cache is too small and gets evicted at every step by the many other threads.