and in the section ‘Custom Structures’ it says that
‘SoA (structure of arrays) is the preferable approach for many cases for data-parallel computations because it groups related data into a contiguous array.’
Can this be true? In SoA every member of the struct gets its own array, so the members of one struct shouldn’t be grouped contiguously. Is this a typo, and should it say AoS???
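To make the two layouts concrete, here is a minimal C++ sketch of where the `x` member of element `i` ends up in each case. `Vec3` and the helper functions are hypothetical names made up for this illustration, and the numbers assume the usual 4-byte `float` with no padding in the struct. What ends up contiguous in SoA is the same member of neighbouring elements, not the members of one element.

```cpp
#include <cstddef>

// Hypothetical 3-float record, used only for illustration.
struct Vec3 { float x, y, z; };

// AoS: one array whose elements are whole structs.
// Byte offset of member x of element i from the start of that array.
constexpr std::size_t aos_offset_x(std::size_t i) {
    return i * sizeof(Vec3) + offsetof(Vec3, x);
}

// SoA: one array per member (x[], y[], z[]).
// Byte offset of x[i] from the start of the x array.
constexpr std::size_t soa_offset_x(std::size_t i) {
    return i * sizeof(float);
}

// Distance in bytes between the x members of two neighbouring elements.
constexpr std::size_t aos_stride_x() { return aos_offset_x(1) - aos_offset_x(0); }
constexpr std::size_t soa_stride_x() { return soa_offset_x(1) - soa_offset_x(0); }
```

If thread `i` reads `x` of element `i`, neighbouring threads are `sizeof(Vec3)` bytes apart in AoS but only `sizeof(float)` bytes apart in SoA, which is presumably the "grouping" the quoted sentence is talking about.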
Arrays of structures can be efficient, as long as the structure is 32, 64, or 128 bits (4, 8, or 16 bytes) in size, because structures of this size can be read or written in a single instruction and so can be coalesced. For anything larger, use SoA or you will cry (the performance penalty for non-coalesced accesses is a factor of 10-20).
You have 3 separate memory writes spanning the contiguous bytes of the struct so none of them are coalesced.
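A rough way to see this on paper: take a 12-byte struct and work out which bytes a half-warp of 16 threads touches in one of those three writes. The sketch below is plain C++ address arithmetic, not device code. `Particle`, `kHalfWarp` and the helper functions are hypothetical names, and the "span equals 16 words" check is a simplified stand-in for the real coalescing rules (it ignores alignment, for instance).

```cpp
#include <cstddef>

// Hypothetical 12-byte record (assuming 4-byte float, no padding).
struct Particle { float x, y, z; };

constexpr std::size_t kHalfWarp = 16;  // threads assumed to be serviced together

// AoS: byte address written by thread t for member number f (0 = x, 1 = y, 2 = z).
constexpr std::size_t aos_addr(std::size_t t, std::size_t f) {
    return t * sizeof(Particle) + f * sizeof(float);
}

// SoA: the three member arrays, each of n floats, laid out back to back.
constexpr std::size_t soa_addr(std::size_t t, std::size_t f, std::size_t n) {
    return (f * n + t) * sizeof(float);
}

// Number of bytes between the first and last byte touched by one
// half-warp while every thread writes the same member f.
constexpr std::size_t aos_span(std::size_t f) {
    return aos_addr(kHalfWarp - 1, f) - aos_addr(0, f) + sizeof(float);
}
constexpr std::size_t soa_span(std::size_t f, std::size_t n) {
    return soa_addr(kHalfWarp - 1, f, n) - soa_addr(0, f, n) + sizeof(float);
}

// Size of one fully packed run of 16 floats.
constexpr std::size_t kPackedSpan = kHalfWarp * sizeof(float);
```

Under these assumptions each AoS write is smeared over 184 bytes while only 64 of them are actually written, whereas the SoA write covers exactly one packed 64-byte run. That gap is what the three separate writes run into.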
Look, you don’t have to just believe me, OK? Write a microbenchmark and measure the bandwidth. Run it through the profiler (assuming you are on a machine that supports the profiler counters) and see what it has to say about incoherent loads/stores.
Or just look it up in the programming guide; it only takes 10 seconds.
I assumed that the compiler would do some magic to make these reads coalesced. But I guess the magic has to be done by the developer (say, via shared memory) since, as a quick test revealed, the compiler is unable to make these reads coalesced: the bandwidth is just 16 GB/s, compared to a maximum of 80 GB/s with coalesced reads.