Hello,
I saw in another thread a few months ago that the new LLVM-based nvfortran is being used to test compiling CUDA Fortran code, and I was wondering about the timeline for it to be usable publicly (in the sense of officially shipping in the HPC SDK). Also, once the swap is done, is there a plan to include the CUDA C/C++ features that aren't currently in nvfortran? Everything from more native tensor-core WMMA support to the asynchronous prefetch/copy operations, etc.
In particular, I am hoping that in the near future I can use coarrays + CUDA Fortran for my significant projects (noting that the only current coarray implementation whose performance rivals MPI is Intel's, and it is also significantly easier to install and compile against than its competitors). In 2025, a paper came out illustrating that one can combine Intel coarray Fortran + CUDA (link), but it is an agonizing setup and requires bind(C) glue to communicate between the two compilers. That is ugly and obfuscates the code, and good HPC code should be as clear as possible.
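To illustrate what I mean by the bind(C) glue: every device operation has to go through an interface block like the sketch below, with the kernel launcher compiled separately by nvcc (names like `launch_stencil` are just illustrative, not from the paper):

```fortran
! Hypothetical glue module: Intel Fortran side of a mixed-compiler build.
! The C side would declare: extern "C" void launch_stencil(double *field, int n);
module cuda_glue
  use, intrinsic :: iso_c_binding, only: c_ptr, c_int
  implicit none
  interface
    subroutine launch_stencil(field, n) bind(C, name="launch_stencil")
      import :: c_ptr, c_int
      type(c_ptr), value :: field   ! device pointer allocated on the C side
      integer(c_int), value :: n    ! problem size
    end subroutine
  end interface
end module
```

With a single compiler supporting both coarrays and CUDA Fortran, the kernel would live in the same source, with none of this indirection.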
PS: Is there a plan to implement coarrays over both MPI and shared memory (OpenMP-style), or just MPI? If both are implemented, ideally it could be done such that teams can specify which coarrays communicate intra-node and which communicate between nodes. Perhaps this is too much to ask for, but it is certainly in the spirit of the CUDA design.
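For what it's worth, the standard Fortran 2018 teams feature already gives a natural place to express this split; the sketch below forms one team per node (assuming a fixed 8 images per node, which is my own illustrative assumption) so that communication inside the `change team` block could, in principle, be routed over shared memory:

```fortran
! Hypothetical sketch: splitting images into per-node teams so intra-node
! coarray traffic is distinguishable from inter-node traffic. The team
! syntax is standard Fortran 2018; mapping teams onto shared memory is
! the implementation behavior being requested, not something guaranteed.
program team_split
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: node_team
  integer :: node_id
  real :: halo(1024)[*]                  ! coarray visible to all images

  node_id = (this_image() - 1) / 8 + 1   ! assumed: 8 images per node
  form team (node_id, node_team)

  change team (node_team)
    ! communication here stays within one node; image numbers are
    ! relative to the team, so [1] is the first image on this node
    if (this_image() > 1) halo(:) = halo(:)[1]
  end team

  sync all
end program
```

The compiler/runtime would only need to recognize that a team's images share a node to pick the cheaper transport, which is exactly the kind of hierarchy CUDA already exposes.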