CUDA C and CUDA Fortran device memory compatibility

Hello,
I hope you can give me information on the following if you would be so kind:
I have CUDA C code that calls F90 subroutines and also Fortran-produced executables; however, I always transfer data between the codes through external text files (although I am aware of ISO_C_BINDING). I hope to achieve better compatibility between my codes, especially when there is data computed by the GPU for both languages. Is it possible for CUDA Fortran and CUDA C to interact with the same device memory together in the same instance (sequentially I mean)?
Are there any means of keeping device memory on the GPU that is produced by CUDA C code and then making use of it from CUDA Fortran device kernels? I hope to minimize data transfers as much as possible, without rewriting my code in one language and without transferring data back to the host.
Please tell me if there are workarounds to such an attempt.
Thank you for your time.
Ahmed

Is it possible for CUDA Fortran and CUDA C to interact with the same device memory together in the same instance (sequentially I mean)? Are there any means of keeping device memory on the GPU that is produced by CUDA C code and then making use of it from CUDA Fortran device kernels?

Sure, though I’m not sure what you mean by “sequentially”. I’ll assume the CUDA C and CUDA Fortran code are in the same binary, and that you want to pass the CUDA C device pointer to a CUDA Fortran host routine to be used later in a global device kernel.

Just pass the CUDA C device pointer as an argument to the Fortran subroutine, declared as an F77-style array with the device attribute. The normal C-to-Fortran calling issues (such as name mangling and pass-by-reference) apply.

Something like CUDA C side:

double *ptr;
int args;
cudaMalloc((void**)&ptr, size);
// ... set ptr memory on the device ...
fortran_routine_(ptr, &args);   // trailing underscore for Fortran name mangling; args passed by reference

CUDA Fortran side:

subroutine fortran_routine(A,args)
    real(8), device :: A(*)
    integer :: args
....
    call cuf_kernel<<<grid,blocks>>>(A)
...
end subroutine
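
Since you mentioned ISO_C_BINDING: as an alternative sketch (routine and variable names are illustrative), you can give the Fortran routine C linkage with bind(C), which avoids relying on compiler-specific name mangling:

```fortran
! CUDA Fortran side with ISO_C_BINDING; no trailing underscore is
! needed on the C side because bind(C) fixes the external name.
subroutine fortran_routine(A, args) bind(C, name="fortran_routine")
    use iso_c_binding
    real(c_double), device :: A(*)
    integer(c_int), value :: args   ! VALUE lets C pass the scalar directly
    ...
    call cuf_kernel<<<grid,blocks>>>(A)
end subroutine
```

The C side would then call fortran_routine(ptr, args); with no underscore and no address-of on the scalar argument.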

-Mat

Hello Mat,
Thank you dearly for your reply and advice.
Can I just ask you one more thing about this matter?
The code is currently compiled with Intel C++ Compiler. Is it possible for me to compile CUDA Fortran subroutines with PGICE Compiler and call them from the Intel Compiler?
Would I pass “.obj” files to the Intel Compiler or “.dll/.lib” files? I hope you can tell me how I can make this happen, please?
Also, instead of compiling with Intel, could I attempt to compile my whole code, which contains CUDA C and Fortran routines, with the PGICE 19.4 compiler from the PGI command prompt? Do you know if I would run into trouble with how the code is compiled between the two compilers? I would prefer it if everything compiled with PGI, but I am afraid of intrinsics that might be local to Intel…
If there are any sample codes that you can offer for the explanation, I would really appreciate it!
Thank you for always being supportive!
Ahmed

The code is currently compiled with Intel C++ Compiler. Is it possible for me to compile CUDA Fortran subroutines with PGICE Compiler and call them from the Intel Compiler?

Sure.

Though, if you aren’t linking with PGI, you may need to compile with “-Mcuda=nordc”. Relocatable Device Code (RDC) requires a link step for the device objects, which won’t be available if you’re linking with another compiler. The caveat is that RDC is required to access module device data from external modules or to call device routines found in external modules. If you use either of these features, you’ll need to link with the PGI driver.
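
As a rough sketch of what that build might look like on Windows (command and file names are illustrative, and which runtime libraries to add depends on your installation):

```
rem compile the CUDA Fortran code without relocatable device code
pgfortran -Mcuda=nordc -c cuf_kernels.cuf

rem compile and link the C++ side with the Intel compiler,
rem adding the PGI runtime libraries to the link
icl main.cpp cuf_kernels.obj <PGI runtime libraries>
```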

Would I pass “.obj” files to the Intel Compiler or “.dll/.lib” files? I hope you can tell me how I can make this happen, please?

It would be easier to link directly with the object files, but you can link against a dynamic or static library as well; it’s up to you how you want to package things. DLLs can be a bit more involved, since you need to export your symbols and include a DllMain to initialize the PGI runtime, but they should work fine.

Full details on creating Static or Dynamic Libraries on Windows can be found at: https://www.pgroup.com/resources/docs/19.5/x86/pgi-user-guide/index.htm#lib-pgi-runtime-win

Also, instead of compiling with Intel, could I attempt to compile my whole code, which contains CUDA C and Fortran routines, with the PGICE 19.4 compiler from the PGI command prompt?

Unfortunately not on Windows, since we don’t support C++ there and nvcc needs a C++ compiler as its host compiler. On Linux you could use pgc++ as the host compiler for nvcc, just not on Windows.

Do you know if I would run into trouble with how the code is compiled between the two compilers?

Possibly, but most likely you’ll be able to work through the issues.

If there are any sample codes that you can offer for the explanation, I would really appreciate it!

I don’t have an example off-hand for this exact scenario. But if you go the object route (the easiest), then you should only need to worry about the traditional C++-to-Fortran calling issues, compiling the CUDA Fortran code with “-Mcuda=nordc”, and adding the PGI runtime libraries to the link (or the Intel runtime libraries if you link with PGI).
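
On the C++ side, the traditional calling issues typically come down to declaring the Fortran routine with C linkage and passing scalars by address (the names here match the earlier sketch and are illustrative):

```cpp
// Prototype for the Fortran subroutine: extern "C" disables C++ name
// mangling, the trailing underscore matches a common Fortran mangling
// scheme, and the scalar is passed by pointer since Fortran defaults
// to pass-by-reference.
extern "C" void fortran_routine_(double *A, int *args);
```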

-Mat