Has anyone tested the performance of a GPU application using a logical 3D data array Arr(A,B,C) versus mapping the data to a 1D array Arr(A*B*C)?
I’m curious whether the performance differs enough that we should consider using a 1D array, since a 3D array is more intuitive from the programming side.
My main concern is that in the 1D representation we can easily pad the data at the front to ensure data alignment, i.e. allocate N elements plus padding of size ‘padding’,
then pass A_dev(padding:) to the kernel.
However, in 3D representation, we cannot do such padding, right?
While I haven’t studied it specifically, since the compiler will transform a 3D array to a 1D array on the device, my assumption is that there would be little difference in performance.
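To make the 3D-to-1D point concrete, here is a minimal sketch of the usual column-major mapping. It is written in Python purely for illustration, and it assumes 1-based indices and no padding between dimensions. The helper name flat_index is hypothetical, not something from the compiler.

```python
def flat_index(i, j, k, A, B):
    """1-based column-major offset of Arr(i,j,k) within Arr(A,B,C).

    Assumes Fortran-style storage: the first index varies fastest
    and there is no padding between columns or planes.
    """
    return i + (j - 1) * A + (k - 1) * A * B


A, B, C = 4, 3, 2

# Walk the 3D indices with the first index fastest and record the
# 1D position each element maps to.
order = [flat_index(i, j, k, A, B)
         for k in range(1, C + 1)
         for j in range(1, B + 1)
         for i in range(1, A + 1)]

# The 3D walk touches the 1D positions 1..A*B*C in order, so both
# views address the same contiguous block of memory.
print(order == list(range(1, A * B * C + 1)))
```

Under these assumptions the 3D form only adds a little index arithmetic on top of the same contiguous storage, which is why I would not expect a large difference.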
You are correct in that padding can make a small difference in kernel performance due to caches. However, in my experience this is very minor. Of course, your code may benefit from it so you should experiment.
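If you do experiment with front padding, a rough sketch of the arithmetic might look like the following (Python for illustration only). The 128-byte alignment target, the 4-byte element size and the example addresses are all assumptions, and front_padding is a hypothetical helper, so substitute the values that apply to your device and data type.

```python
def front_padding(base_addr, elem_size, alignment):
    """Number of elements to skip so the first used element is aligned.

    base_addr : byte address where the allocation starts (assumed value)
    elem_size : size of one element in bytes
    alignment : desired byte alignment, a multiple of elem_size
    """
    if alignment % elem_size != 0 or base_addr % elem_size != 0:
        raise ValueError("alignment and base_addr must be multiples of elem_size")
    pad_bytes = (-base_addr) % alignment  # bytes up to the next boundary
    return pad_bytes // elem_size


# Hypothetical example: 4-byte elements, 128-byte alignment target.
pad = front_padding(base_addr=1000, elem_size=4, alignment=128)
print(pad)                         # elements to skip at the front
print((1000 + pad * 4) % 128)      # 0 when the first used element is aligned
```

The allocation would then need N plus that many elements, with the kernel receiving the section that starts after the skipped elements.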