Column-major vs. row-major storage between CUDA C and CUDA Fortran

This seems like a trivial question, but I'm not sure of the answer and hope someone can confirm it for me.

Fortran is column-major and C is row-major. In CUDA, it's critical for performance that the data in device memory (or shared memory) accessed by neighboring threads be contiguous. Is it correct that CUDA Fortran calls CUDA C as the back end to run the kernel? If so, does the compiler automatically convert the data to row-major order for better performance?

Now, suppose I have an array declared in device memory using CUDA Fortran, and I call a C function (via C-Fortran interoperability) to process the data. I'm not sure whether this data in device memory is row-major or column-major. If it is column-major, is it automatically converted to row-major before being passed to the C function, or is that something the programmer needs to handle?

Thanks,
Tuan

Hi Tuan,

Is it correct that CUDA Fortran calls CUDA C as the back end to run the kernel?

While we do use some of the CUDA C back-end tools to generate device code, the compiler makes no attempt to reorganize your data so that it's row-major. You need to write your kernels so that the threads in a block access memory by column, i.e. consecutive threads touch consecutive elements of a column.

- Mat
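
To make the layout difference concrete, here is a minimal sketch in plain C (not CUDA) of the offset arithmetic for the two storage orders. The function names `colmajor_offset` and `rowmajor_offset` are made up for illustration and are not part of any compiler or library.

```c
#include <stddef.h>

/* Hypothetical helper: linear offset of Fortran element a(i, j) (1-based)
   in an array declared a(rows, cols).
   Column-major: the first index varies fastest in memory. */
size_t colmajor_offset(size_t i, size_t j, size_t rows)
{
    return (i - 1) + (j - 1) * rows;
}

/* Hypothetical helper: linear offset of C element a[i][j] (0-based)
   in an array declared a[rows][cols].
   Row-major: the last index varies fastest in memory. */
size_t rowmajor_offset(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}
```

For a Fortran array, stepping the first index by one moves the offset by one element, so neighboring threads should take neighboring values of the first index if they are to read adjacent memory locations.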

Hi Mat,
May I ask one more question to make sure I understand?

I take your answer to mean that Fortran stores the array in device memory in column-major order.

If so, when I call a compiled C kernel from the Fortran code to process this data, do I need to write the C kernel so that it accesses the data in column-major order?

Thanks,
Tuan

If so, when I call a compiled C kernel from the Fortran code to process this data, do I need to write the C kernel so that it accesses the data in column-major order?

Correct. No different than normal Fortran calling normal C.

- Mat
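
As a rough illustration of what "no different than normal Fortran calling normal C" implies, here is a sketch of C code that works on a Fortran array in place, without any conversion. It is ordinary host C, not a CUDA kernel, and the names `scale_colmajor` and `get_fortran_element` are hypothetical. It assumes the Fortran side passes a contiguous `a(rows, cols)` array of double-precision values together with its dimensions.

```c
#include <stddef.h>

/* Hypothetical helper: multiply every element of a Fortran array
   a(rows, cols) by factor. The data is used as-is, in column-major order. */
void scale_colmajor(double *a, size_t rows, size_t cols, double factor)
{
    for (size_t j = 0; j < cols; ++j) {      /* column index: stride = rows */
        for (size_t i = 0; i < rows; ++i) {  /* row index: stride = 1 */
            a[i + j * rows] *= factor;
        }
    }
}

/* Hypothetical helper: read Fortran element a(i, j) (1-based) from C. */
double get_fortran_element(const double *a, size_t rows, size_t i, size_t j)
{
    return a[(i - 1) + (j - 1) * rows];
}
```

The inner loop runs over the first (row) index because that is the contiguous direction in Fortran storage. In a device kernel the same `i + j * rows` arithmetic would apply, with the thread index typically taking the place of `i`.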

Thanks, Mat.

Tuan