Roadmap for CUDA Fortran?

I know this is a vague question, but I’m happy with whatever details can be shared. I have found CUDA Fortran to be my favorite way of using CUDA, and there’s no AMD equivalent which makes it unique in the space of HPC on GPUs.

Is there any roadmap at all, or is it just bug fixes and CUDA library support for the foreseeable future? [if so, no need to answer anything below]

  1. Is there a roadmap for Fortran 2008+ Standard compatibility?
    — Specifically, I am thinking of coarrays+CUDA. This could further simplify multiprocessor computing with CUDA, and I think would boost nvfortran popularity amongst the Modern Fortran community.
    — I have seen some posts on here in the past of developers commenting they are waiting on Flang or other such LLVM Fortran compilers to be more complete before tackling further Standard compatibility. Is that still the status?
  2. I notice that nvfortran sometimes lags behind nvcc in CUDA feature support. Are there CUDA features that aren’t planned to be implemented in CUDA Fortran?
  3. Anything else relevant to the question?

Hi adench2,

Sorry for the late reply. I accidentally missed your post.

Basically, we add CUDA features to Fortran as they are added to CUDA C/C++, provided they aren’t C++ specific. Yes, there is a bit of a lag, but many items are eventually added, depending on their relevance to Fortran and usefulness to HPC applications. That said, I don’t have any specific items being worked on right now.

Is there a particular new feature in CUDA that you’re looking for?

The use of coarrays and CUDA is an interesting topic, but it’s still too early to know how they will fit together. First, we need to add coarray support in the host compiler, which is being done in conjunction with our Flang community effort. Once that’s complete, we’ll revisit what coarrays mean in a CUDA context.



Hi Mat,

Thanks a lot for the reply! It was a vague question, I didn’t expect a fast answer either way.

I was interested in the CUDA 11.0 cuda::memcpy_async function for asynchronous global->shared transfers and its associated synchronization functions, and I don’t see its usage mentioned anywhere in the CUDA Fortran programming guide online here. It appears to be part of libcu++; I am not sure what that means for this functionality in CUDA Fortran.
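For context, here is a minimal sketch of the C++ pattern I mean, following the cuda::barrier completion mechanism from the CUDA Programming Guide (kernel and variable names are just illustrative):

```cuda
#include <cuda/barrier>

// Each thread asynchronously copies one element of its tile from
// global to shared memory, then waits on a block-scoped barrier.
__global__ void staged_copy(const float* __restrict__ src, float* dst, size_t n)
{
    extern __shared__ float tile[];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;

    if (threadIdx.x == 0) {
        init(&bar, blockDim.x);   // one expected arrival per thread
    }
    __syncthreads();

    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Asynchronous global->shared copy; completion is tracked by `bar`.
        cuda::memcpy_async(&tile[threadIdx.x], &src[i], sizeof(float), bar);
    }
    bar.arrive_and_wait();        // wait until all pending copies complete

    if (i < n) {
        dst[i] = tile[threadIdx.x] * 2.0f;  // consume the staged data
    }
}
```

Having an equivalent for this global->shared staging pattern in CUDA Fortran is what I’m after.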

I’ll eagerly anticipate the future of this then.


“cuda::memcpy_async” is a C++ template so it couldn’t be used directly, but I’ll pass this request on to see if there’s a way to mimic the behavior in Fortran.

Thanks! It seems like it would be quite useful for large-scale HPC applications.

In the nvfortran wmma module, we have some support for the CUDA 11 pipelineMemcpyAsync(), pipelineCommit(), and pipelineWaitPrior() functions. But, unfortunately, it kind of languished without a good use case. I’m not sure we even documented it. With upcoming CUDA 12 features, we’ll revisit that, and I think you can expect at least support for thread block clusters (or arrays of thread blocks) and the means to program them, using async transfers, from CUDA Fortran. It might require a revamp of our cooperative groups module.


Please, please, please could you guys and gals consider bringing coarrays into the nvfortran framework. This is the one piece of compatibility that is stopping me from using nvfortran exclusively. Coarrays are so powerful as a system and as a concept. Even something similar would help: an array structure distributed across memory that could be accessed by all the parallel structures. I mean, think about it. Imagine a system of processes that interact with each other, in Fortran! We’ve had that for years, and it’s called coarrays. At least for me, this functionality seems obvious, and I shouldn’t need to list all the world-changing benefits that dynamically collaborative parallel processes bring in other languages.
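For anyone reading along who hasn’t used them, here’s a minimal standard Fortran 2008 coarray sketch of the kind of cross-process data sharing I mean (illustrative only; per this thread, nvfortran doesn’t yet compile coarrays):

```fortran
program coarray_demo
    implicit none
    integer :: counts[*]     ! coarray: one copy per image (process)
    integer :: i, total

    counts = this_image()    ! each image writes its own copy
    sync all                 ! barrier across all images

    if (this_image() == 1) then
        total = 0
        do i = 1, num_images()
            total = total + counts[i]   ! remote read from image i
        end do
        print *, 'sum over images:', total
    end if
end program coarray_demo
```

Every image holds its own copy of the data, and any image can read another’s copy with plain array syntax; that’s the kind of interactivity I’d love to see combined with GPU kernels.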

Look, the nvfortran compiler is brilliant, robust, and so easy to use. I know my voice is just one, and many voices and hands would be needed to turn the dial on bringing in this feature, but I feel really passionate about both nvfortran and coarrays, and if they could be combined, I know there is so much that could be done on a CPU/GPU system.


Coarray support is expected to be added as part of the Flang F18 project, which we’ve partnered with the LLVM community to develop: llvm-project/flang at main · llvm/llvm-project · GitHub

Once F18 is stable enough, we’ll likely switch nvfortran to use this front-end, though there’s no timeline for when that will occur.
