I have been working on a Fortran code, and I have to use C for the CUDA parts because I don’t have access to CUDA Fortran, which PGI does not offer for free.
However, I’ve run into problems and questions with the Fortran + C combination.
I decided to use 1D arrays in C and CUDA to avoid arrays of pointers, which generally lead to less efficient and less safe code.
However, most of my data structures are declared on the Fortran side, and I access them from the C code with extern. So I think the data is still laid out the Fortran way (column-major order), but my question is mainly about the cudaMemcpy function.
When I call cudaMemcpy, is the data remapped because the CUDA C compiler prefers row-major order? If so, using 1D for everything doesn’t really help, since the data will not be contiguous. Should I be using 2D, 3D and 4D arrays instead, to keep at least the same structure? (I have 4D arrays.)
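To make the layout I have in mind concrete, here is a minimal sketch of how I would index a Fortran 4D array through a flat 1D pointer on the C side. It assumes the array is declared a(n1,n2,n3,n4) in Fortran, is stored contiguously in column-major order, and that the C side uses 0-based indices. The names idx4 and fill_with_offsets are made up for illustration and are not part of my actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of the Fortran element a(i+1,j+1,k+1,l+1) of an array
   declared a(n1,n2,n3,n4). Assumes column-major storage, so the first
   index varies fastest in memory. Indices here are 0-based.
   (idx4 is a hypothetical helper name.) */
static size_t idx4(size_t i, size_t j, size_t k, size_t l,
                   size_t n1, size_t n2, size_t n3)
{
    return i + n1 * (j + n2 * (k + n3 * l));
}

/* Hypothetical check helper: store in every element its own flat
   offset, visiting elements in the order Fortran stores them. */
static void fill_with_offsets(double *a, size_t n1, size_t n2,
                              size_t n3, size_t n4)
{
    assert(a != NULL);
    for (size_t l = 0; l < n4; ++l)
        for (size_t k = 0; k < n3; ++k)
            for (size_t j = 0; j < n2; ++j)
                for (size_t i = 0; i < n1; ++i) {
                    size_t off = idx4(i, j, k, l, n1, n2, n3);
                    a[off] = (double)off;
                }
}
```

With the innermost loop over i, the sketch walks the buffer in consecutive memory order, which is the access pattern I would like to keep on the device as well.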
Does anyone know how efficient CUDA is with higher-dimensional arrays?
And what should I use to transfer 4D arrays to global memory while keeping the same data structure?