I would like to find a reference with a definitive statement on the “correct” way to lay out the elements of a 2D array when using CUDA. Also, how are the blocks and threads arranged?
I know that C uses row-major placement, while Fortran and MATLAB use column-major placement.
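To be concrete about what I mean by the two placements, here is a minimal sketch in plain C (host-side only, nothing CUDA-specific). The function names `row_major_offset` and `col_major_offset` are my own, made up purely for illustration, and I am assuming zero-based indices into a flat, contiguous buffer.

```c
#include <stddef.h>

/* Row-major (C convention): elements of the same row are adjacent
   in memory, so the column index varies fastest.
   `cols` is the number of columns in the array. */
size_t row_major_offset(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Column-major (Fortran/MATLAB convention): elements of the same
   column are adjacent in memory, so the row index varies fastest.
   `rows` is the number of rows in the array. */
size_t col_major_offset(size_t row, size_t col, size_t rows)
{
    return col * rows + row;
}
```

For a non-square array with 2 rows and 3 columns, the element at (row 0, column 1) sits at offset 1 in row-major placement but at offset 2 in column-major placement, while the element at (row 1, column 0) sits at offset 3 and offset 1 respectively.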
I have seen “hints” in these forums, though, that the GPU blocks and threads are arranged in row-major order and that data should be loaded the same way.
I can’t find a clear statement in the CUDA documentation I’ve read. It could be there, but I haven’t seen it.
Also, the new book by Kirk & Hwu is unclear on this concept (IMHO).
If you can answer this question and want to use small 2D arrays as examples, please use non-square arrays so that it is extremely clear which dimension is which.