Stdpar, OpenACC, and Unified Memory

I have a code that has both “DO CONCURRENT” loops and OpenACC data directives and reductions. I compile the code with the flags -stdpar=gpu and -acc=gpu. When I use OpenACC data directives, are the data movement directives actually used, or does the unified memory turned on by -stdpar override all data movement directives?

Correct, “-stdpar” will enable CUDA Unified Memory (UM), so the OpenACC data directives will essentially be ignored for any allocated data.

However, you can disable UM via the “-gpu=nomanaged” flag, and then the OpenACC data directives will be used.

Hope this helps,

I was under the impression that UM does not work for static arrays (only allocatable).

Is this still true?

If so, are the ACC data directives for static arrays still ignored with the implicit use of UM with stdpar?

Since UM is not part of the OpenACC spec, but rather an nvfortran-specific feature, it would seem to me that the “default” behavior, when -acc=gpu is specified, should be to use ACC data directives if they are there, and use UM only if they are not?
Or is this back to the unresolved issue of being able to use UM for some arrays and explicit data movement for others? Is UM still all-or-nothing?

  • Ron

Correct, and that is why I said ‘any allocated data’. Static data still needs to be copied, either implicitly by the runtime or explicitly via data directives.

If so, are the ACC data directives for static arrays still ignored with the implicit use of UM with stdpar?

This is why I said “essentially”. The data directives are not really ignored; rather, UM-managed data is treated as ‘present’, so no device allocation or copy is performed.
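To make the distinction concrete, here is a minimal sketch, assuming a build line along the lines of `nvfortran -stdpar=gpu -acc=gpu um_sketch.f90`. The program and variable names (`um_sketch`, `a_static`, `a_alloc`) are hypothetical and only for illustration. The same data directive covers both arrays, but under UM only the static one should trigger a device allocation and copy.

```fortran
! Sketch only: all names here are hypothetical, not from the original code.
! Assumed build line: nvfortran -stdpar=gpu -acc=gpu um_sketch.f90
program um_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a_static(n)              ! static array: not placed in unified memory
  real, allocatable :: a_alloc(:)  ! allocatable: managed memory when UM is on
  real :: total
  integer :: i

  allocate(a_alloc(n))
  a_static = 1.0
  a_alloc  = 2.0

  ! With UM on, a_static is allocated on the device and copied in, while
  ! a_alloc is already considered present, so nothing is allocated or copied.
  ! With -gpu=nomanaged, both arrays are allocated and copied here.
  !$acc enter data copyin(a_static, a_alloc)

  ! Offloaded via -stdpar=gpu
  do concurrent (i = 1:n)
    a_alloc(i) = a_alloc(i) + a_static(i)
  end do

  ! OpenACC reduction over the same data
  total = 0.0
  !$acc parallel loop reduction(+:total) present(a_static, a_alloc)
  do i = 1, n
    total = total + a_alloc(i)
  end do

  !$acc exit data delete(a_static, a_alloc)

  print *, 'sum = ', total
  deallocate(a_alloc)
end program um_sketch
```

Building the same source with and without `-gpu=nomanaged` and comparing the data movement reported by a profiler is one way to check which directives actually move data.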

Is UM still all-or-nothing?

Basically, the compiler replaces the underlying memory allocation call with cudaMallocManaged. So you could split the allocations into separate files and compile one with and one without -gpu=managed so that only some arrays are managed, but this is a bit cumbersome. Instead, it’s better to use the CUDA Fortran “managed” attribute if you need more precise control over which arrays use UM.
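As a rough sketch of the per-array approach, assuming the file is built with CUDA Fortran enabled and without global UM (for example `nvfortran -acc=gpu -cuda managed_sketch.f90`; the build line and all names below are assumptions for illustration), one array carries the “managed” attribute while the other is moved with explicit data directives:

```fortran
! Sketch only: all names here are hypothetical.
! Assumed build line: nvfortran -acc=gpu -cuda managed_sketch.f90
program managed_sketch
  implicit none
  integer, parameter :: n = 1000
  real, allocatable, managed :: a_um(:)  ! CUDA Fortran attribute: this array uses UM
  real, allocatable :: b_explicit(:)     ! ordinary allocatable: moved explicitly
  integer :: i

  allocate(a_um(n), b_explicit(n))
  a_um = 1.0
  b_explicit = 2.0

  ! Only b_explicit needs an explicit device allocation and copy.
  !$acc enter data copyin(b_explicit)

  ! a_um is accessible on the device through UM, so it has no data clause.
  !$acc parallel loop present(b_explicit)
  do i = 1, n
    a_um(i) = a_um(i) + b_explicit(i)
  end do

  !$acc exit data delete(b_explicit)

  ! a_um can be read on the host without an update directive.
  print *, a_um(1), a_um(n)
  deallocate(a_um, b_explicit)
end program managed_sketch
```

This keeps the choice next to each declaration instead of tying it to how a whole file is compiled.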



So if I understand correctly:

If I use an “acc enter/exit data” on a static array, then that data directive will be used whether UM is on or off, but if I have an “acc enter/exit data” on an allocatable array, then those directives are essentially a “no-op” when UM is on.

If this is correct, does this interfere with using an “if_present” clause on an acc directive? Without UM, the data would not be present until it is manually put on the device, but with UM it is always seen as present. I suppose it would just do “the right thing”, with “if_present” always evaluating to true?

  • Ron