Fortran MINVAL/MAXVAL with stdpar

Hello,

I was wondering if I could get some clarification about something.

I have the following in my Fortran code:

min_field_diff_local=MINVAL(field_ratio,mask)

where mask is a logical array.

When using OpenACC, in order to parallelize this on the GPU I use:

!$acc kernels default(present)
      min_field_diff_local=MINVAL(field_ratio,mask)
!$acc end kernels

And this works, with the compiler (22.7) saying:

  41402, Generating default present(mask(:,:,:),field_ratio(:,:,:))
  41403, Loop is parallelizable
         Generating NVIDIA GPU code
      41403,   ! blockidx%x threadidx%x auto-collapsed
             !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
             Generating implicit reduction(min:field_ratio$r)

I am in the process of converting the code to use standard parallelism as much as possible, so I have started by removing all the kernels regions that I can.
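
For context, a typical conversion in this code goes from a kernels loop nest to DO CONCURRENT, which -stdpar=gpu does offload. A minimal sketch (the array names a and b and the bounds nx, ny, nz are illustrative, not from my actual code):

! Before: OpenACC kernels
!$acc kernels default(present)
      do k = 1, nz
        do j = 1, ny
          do i = 1, nx
            field_ratio(i,j,k) = a(i,j,k) / b(i,j,k)
          end do
        end do
      end do
!$acc end kernels

! After: standard parallelism, offloaded when compiling with -stdpar=gpu
      do concurrent (k=1:nz, j=1:ny, i=1:nx)
        field_ratio(i,j,k) = a(i,j,k) / b(i,j,k)
      end do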

My question: if I compile with -stdpar=gpu, will the MINVAL automatically be compiled for the GPU?

If yes, and I use -gpu=nomanaged (manual data movement with OpenACC), would I need to do something like:

!$acc host_data use_device(field_ratio,mask)
      min_field_diff_local=MINVAL(field_ratio,mask)
!$acc end host_data

?

A last question: what happens if I have a MINVAL call that I want to run on the CPU (during initialization) on data that may or may not have a copy on the GPU? Would the compiler transfer the data to and from the GPU with managed memory? What if I am using -gpu=nomanaged? Would it run on the CPU if there is no GPU copy of the arrays, or would it try to run on the GPU?

It would seem to me that with managed memory it should compute on the GPU even if that means slow transfers, but with -gpu=nomanaged it should only run on the GPU when contained in a host_data region. Or is that mixing apples and oranges (stdpar vs. OpenACC)?

– Ron

It will not run on the GPU without acc kernels around it. We do not automatically offload any F90-style array intrinsics, though we are considering changes to that in the future.
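
One way to get this offloaded under stdpar today is to write the reduction as an explicit DO CONCURRENT loop, since -stdpar=gpu does offload DO CONCURRENT. A minimal sketch, assuming support for the F202X reduce clause (available in recent nvfortran releases) and using illustrative bounds nx, ny, nz for the 3-D arrays:

      ! Equivalent of min_field_diff_local = MINVAL(field_ratio, mask),
      ! written as a loop that -stdpar=gpu can offload.
      min_field_diff_local = huge(min_field_diff_local)
      do concurrent (k=1:nz, j=1:ny, i=1:nx, mask(i,j,k)) &
            reduce(min:min_field_diff_local)
        min_field_diff_local = min(min_field_diff_local, field_ratio(i,j,k))
      end do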

Right now, acc kernels works because the code for the minval gets inlined early in our compiler, and then the normal parallelization flow just works.
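
Conceptually, after inlining, the kernels region above is treated like an explicit loop nest with a min reduction, roughly this sketch (bounds nx, ny, nz are illustrative, matching the collapse(3) feedback shown earlier):

      min_field_diff_local = huge(min_field_diff_local)
!$acc parallel loop collapse(3) default(present) &
!$acc reduction(min:min_field_diff_local)
      do k = 1, nz
        do j = 1, ny
          do i = 1, nx
            if (mask(i,j,k)) min_field_diff_local = &
                min(min_field_diff_local, field_ratio(i,j,k))
          end do
        end do
      end do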

Also, in CUDA Fortran we have device functions (actual overloaded function calls) for many intrinsics, including minval. But that requires the compiler to recognize that the arrays have either the CUDA Fortran managed or device attribute, and it requires a “use cudafor” in the program unit.
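
A minimal sketch of that CUDA Fortran route (array shapes are illustrative, and the masked form is assumed to match the intrinsic’s interface):

      program cuf_minval_sketch
        use cudafor
        implicit none
        real, managed, allocatable :: field_ratio(:,:,:)
        logical, managed, allocatable :: mask(:,:,:)
        real :: min_field_diff_local

        allocate(field_ratio(64,64,64), mask(64,64,64))
        call random_number(field_ratio)
        mask = field_ratio > 0.25

        ! With managed arrays and "use cudafor", minval dispatches to
        ! the device overload.
        min_field_diff_local = minval(field_ratio, mask)
        print *, min_field_diff_local
      end program cuf_minval_sketch

(Built with nvfortran -cuda.)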

I’ve been working on many more functions, similar to what we did for matmul, transpose, and reshape, described in a blog I wrote a couple of years ago. Probably the next step is to always enable those and “do the right thing” based on the hardware and whether the data can be accessed from the GPU or not. We still need a way for the programmer to override the compiler’s default decision, which addresses your last two paragraphs.