I'm new to this forum, so hello to everybody :)
I have a (maybe stupid) question. In my kernel I would like to work with point data (float3 with x, y, z components).
My question is about reading data from global memory into shared memory. From the CUDA Best Practices Guide I know that the GPU fetches a 128-byte segment per warp memory transaction. That means a warp (32 threads) reading consecutive, aligned 4-byte values can be served in a coalesced way by a single transaction.
There are many examples on the internet with this type of data, and I get it. It's easy :)
But what happens if each thread reads a float3 or float4 instead?
Will the SM issue more 128-byte transactions to fetch all of the data?
float3 (3 x 4 bytes = 12 bytes)
32 threads x 12 bytes = 384 bytes
One transaction is 128 bytes, so will the SM issue 3 transactions for the warp?
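To make my arithmetic concrete, here is a small host-side C++ sketch (no GPU needed) of how I am counting. The helper name `segmentsTouched`, the 128-byte segment size, and the assumption that the array starts on a 128-byte boundary are all mine, just for illustration; real hardware may split requests differently, so this is only a model of the counting, not of what the SM actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumed transaction granularity, taken from my question above.
constexpr std::size_t kSegmentBytes = 128;
constexpr std::size_t kWarpSize = 32;

// Hypothetical helper: thread t of a warp reads accessBytes bytes starting at
// byte offset t * strideBytes (array assumed to start on a 128-byte boundary).
// Returns how many distinct 128-byte segments the whole warp touches.
std::size_t segmentsTouched(std::size_t strideBytes, std::size_t accessBytes) {
    assert(accessBytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t first = t * strideBytes;
        const std::size_t last = first + accessBytes - 1;
        // Record every segment this thread's bytes fall into.
        for (std::size_t s = first / kSegmentBytes; s <= last / kSegmentBytes; ++s) {
            segments.insert(s);
        }
    }
    return segments.size();
}
```

With this model, `segmentsTouched(4, 4)` (float) gives 1, `segmentsTouched(12, 12)` (float3) gives 3, and `segmentsTouched(16, 16)` (float4) gives 4, which matches the 384 / 128 = 3 count above.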
Would it be better to store my point data in SoA (structure of arrays) style instead of AoS (array of structures)?
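Here is a rough host-side C++ sketch of why I suspect the layout matters. It assumes (and I am not sure about this) that a float3 is read as three separate 4-byte loads per thread, because float3 is only 4-byte aligned, so in AoS each of those loads is strided by 12 bytes across the warp. The helper names and the 128-byte alignment of every array are hypothetical, and the model ignores caching, which may absorb repeated requests to the same segment.

```cpp
#include <cstddef>
#include <set>

constexpr std::size_t kSegmentBytes = 128;  // assumed transaction size
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kFloatBytes = 4;

// Hypothetical helper: distinct 128-byte segments touched by ONE 4-byte load
// instruction, where thread t reads at offsetBytes + t * strideBytes.
std::size_t segmentsPerLoad(std::size_t strideBytes, std::size_t offsetBytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t first = offsetBytes + t * strideBytes;
        const std::size_t last = first + kFloatBytes - 1;
        segments.insert(first / kSegmentBytes);
        segments.insert(last / kSegmentBytes);
    }
    return segments.size();
}

// AoS: x, y, z interleaved, so the x, y and z loads each have a 12-byte
// stride and start at byte offsets 0, 4 and 8.
std::size_t aosSegmentRequests() {
    std::size_t total = 0;
    for (std::size_t c = 0; c < 3; ++c) {
        total += segmentsPerLoad(3 * kFloatBytes, c * kFloatBytes);
    }
    return total;
}

// SoA: separate x[], y[], z[] arrays (each assumed 128-byte aligned), so
// every load reads consecutive floats with a 4-byte stride.
std::size_t soaSegmentRequests() {
    std::size_t total = 0;
    for (std::size_t c = 0; c < 3; ++c) {
        total += segmentsPerLoad(kFloatBytes, 0);
    }
    return total;
}
```

In this model `aosSegmentRequests()` returns 9 (each of the three loads spans the same 3 segments) while `soaSegmentRequests()` returns 3. The total bytes moved are the same 384 in both layouts, so if the model is right, the difference is in how many segment requests each load instruction generates, not in the amount of data.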
If my question is stupid, sorry :)