Fortran + C + CUDA CUDA Remapping Fortran arrays to C fashion?

kriztho · January 8, 2010, 2:02am

Hello,

I have been working on a fortran code and I was forced to use c in order to use cuda programming for I don’t have access to CUDA Fortran which is not for free by PGI whatsoever.

However, I’ve been having problems and questions for the fortran + C bundle.

I decided to use 1D arrays on C and CUDA for avoiding pointers arrays which generally goes to a less efficient, less secure programming.

However, most of my data structures are declared on the Fortran side and I’m using extern on the C code. Thus I think the data would still be in fortran fashion (column-major order), but my question is mainly on the cudaMemcpy function.

When I do this, is the data remapped because CUDA C compiler would rather do row-major order? If this is the case, it doesn’t really make a difference that I’m using 1D for everything since the data will not be contiguous. Should I be using 2D, 3D and 4D instead for keeping at least the same structure? (I have 4D arrays).

Does anyone know how efficient is CUDA with higher order arrays?

And, what should be using for transferring 4D arrays to Global Memory and keeping the same structure to the data?

Cristhopper

avidday · January 8, 2010, 8:51am

You can use column major order storage (FORTRAN ordered) easily in CUDA. In some ways it is a little more natural than row major ordering because the natural CUDA block and grid ordering scheme is actually column major ordered, so there is a direct affinity between block and grid indices and the equivalent indices in column major ordered arrays. The only difference is zero versus one starting indices, but that is only a semantic difference, it has no real effect on anything. cudaMemcpy() just copies blocks of one-dimensional linear memory, so it works irrespective of which order the data uses. There is no “remapping” or “transformation” of data at any stage.

CUBLAS, the BLAS for CUDA, also uses column major ordered storage and is effectively a complete working example of what you are trying to achieve - CUDA device code, written in C, and working internally using column major ordered storage.

kriztho · January 11, 2010, 6:25pm

You can use column major order storage (FORTRAN ordered) easily in CUDA. In some ways it is a little more natural than row major ordering because the natural CUDA block and grid ordering scheme is actually column major ordered, so there is a direct affinity between block and grid indices and the equivalent indices in column major ordered arrays. The only difference is zero versus one starting indices, but that is only a semantic difference, it has no real effect on anything. cudaMemcpy() just copies blocks of one-dimensional linear memory, so it works irrespective of which order the data uses. There is no “remapping” or “transformation” of data at any stage.

CUBLAS, the BLAS for CUDA, also uses column major ordered storage and is effectively a complete working example of what you are trying to achieve - CUDA device code, written in C, and working internally using column major ordered storage.

Hello avidday, thanks for your answer.

Doublechecking, so you’re saying that cudaMemcpy would only copy bytes as such, without caring anything about the data structure; therefore, it would do sequential copying (1D) regardless of how the memory has been set, in this case, since the array was declared on Fortran and the memory has a column-major ordering, this would be the way itll be done, right?

Another question is: what should I be using for higher order arrays like 4D? can I do 4D arrays in shared memory? is texture memory strictly limited for 3D?

Best,

Cristhopper

avidday · January 11, 2010, 7:52pm

Yes.

I would recommend using appropriately pitched linear memory in every case. Integer indexing calculations in higher dimensions are simple and computationally inexpensive relative to the alternatives (like multiple levels of pointer indirection, for example). As for textures, the texture/filtering hardware only does up to 3D AFAIK (certainly the API only exposes 1D, 2D and 3D), so you are limited in what you can do there. It should be possible in some cases to use 1D textures for higher dimensional lookups if you design the layout carefully, but that can be limited in its usefulness. The GPU memory controller handles float2 and float4 reads quite well, so that offers another possibility for tensor quantities (although there are not so many native float2 and float4 arithmetic primitives as for the scalar floating point types).

Does that help make things clearer?

Sarnath · January 12, 2010, 6:55am

cudaMemcpy does NOT change the ordering… It it is 1D, the data will be available as the same 1D array for your kernel.

Note that FORTRAN indexing starts from 1, C indexing starts from 0. All your FoRTRAN variables that are “array indices” need to be subtracted with 1 before using them to access…

We are using it in our work. We have absolutely no problems.

nitin.life · January 12, 2010, 8:10am

Same here… it works smooth

@sarnath are you using ISO_Cbinding feature in new Fortran 2003 ? if yes… are you using pinned memory and streams ?

Sarnath · January 12, 2010, 8:48am

We do use ISO C Binding. We don’t use pinned memory or streams. But we consider using pinned memory in a short while.

Do you think, there is an issue out there with pinned memory? I plan to use it only in C-CUDA portion — no idea of interfacing it with Fortran.

My interfaces are like this: FORTRAN calls a C interface – which in turn calls a CUDA interface… FORTRAN does NOT see CUDA - just an info if that helps…

Topic		Replies	Views
CUDA + FORTRAN ? CUDA Programming and Performance	7	27699	February 1, 2009
How to pass variables to different kernal functions via global variables? CUDA Programming and Performance	8	3015	June 9, 2010
Global arrays? CUDA Programming and Performance	24	10614	August 18, 2010
Using C-pointer for 3D Fortran array Legacy PGI Compilers	4	5387	February 24, 2014
Avoiding a device write using textures and arrays. CUDA Programming and Performance	3	2794	August 7, 2008
An Easy Introduction to CUDA Fortran Technical Blog	7	566	June 21, 2024
Fortran and cuFFT CUDA Programming and Performance	8	17827	September 19, 2009
Calling CUDA C from fortran CUDA Programming and Performance	4	877	December 4, 2021
CUDA Fortran and CUDA API 3D Arrays Legacy PGI Compilers	4	7688	October 5, 2011
Host to device memory zero copying CUDA pointer couldn't see host memory defined in Fortran CUDA Programming and Performance	9	10018	July 20, 2010

Fortran + C + CUDA CUDA Remapping Fortran arrays to C fashion?

Related topics