I was wondering if I could get some clarification about something.
I have the following in my Fortran code, where `mask` is a logical array:

```fortran
min_field_diff_local = MINVAL(field_ratio, mask)
```
When using OpenACC, in order to parallelize this on the GPU I use:
```fortran
!$acc kernels default(present)
min_field_diff_local = MINVAL(field_ratio, mask)
!$acc end kernels
```
And this works, with the compiler (22.7) saying:
```
41402, Generating default present(mask(:,:,:),field_ratio(:,:,:))
41403, Loop is parallelizable
       Generating NVIDIA GPU code
41403,   ! blockidx%x threadidx%x auto-collapsed
       !$acc loop gang, vector(128) collapse(3) ! blockidx%x threadidx%x
       Generating implicit reduction(min:field_ratio$r)
```
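For context, my understanding is that this feedback corresponds roughly to writing the reduction loop out by hand, something like the sketch below (the extents `nx`, `ny`, `nz` are my assumption, not anything the compiler reported):

```fortran
! Hand-written equivalent of the implicit MINVAL reduction (sketch).
! nx, ny, nz are assumed array extents.
min_field_diff_local = huge(min_field_diff_local)
!$acc parallel loop gang vector collapse(3) default(present) &
!$acc reduction(min:min_field_diff_local)
do k = 1, nz
   do j = 1, ny
      do i = 1, nx
         if (mask(i,j,k)) &
            min_field_diff_local = min(min_field_diff_local, field_ratio(i,j,k))
      end do
   end do
end do
```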
I am in the process of converting the code to use standard parallelism as much as possible, so I have first been removing all the `kernels` regions that I can. My question: if I compile with `-stdpar=gpu`, will the `MINVAL` automatically be offloaded to the GPU?
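For comparison, the explicit standard-parallelism form of the same reduction that I would otherwise write is a `do concurrent` loop with a `reduce` clause (Fortran 2023 syntax; a sketch with assumed extents `nx`, `ny`, `nz`):

```fortran
! Sketch of the directive-free equivalent under -stdpar=gpu.
min_field_diff_local = huge(min_field_diff_local)
do concurrent (k = 1:nz, j = 1:ny, i = 1:nx) reduce(min: min_field_diff_local)
   if (mask(i,j,k)) &
      min_field_diff_local = min(min_field_diff_local, field_ratio(i,j,k))
end do
```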
If yes, and I use `-nomanaged` (manual data movement with OpenACC), would I need to do something like:

```fortran
!$acc host_data use_device(field_ratio, mask)
min_field_diff_local = MINVAL(field_ratio, mask)
!$acc end host_data
```
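For completeness, the full manual data-movement pattern I have in mind would look something like this sketch (the placement of the `enter data`/`exit data` directives is my assumption about how the surrounding code would be structured):

```fortran
! Sketch of manual data movement with -nomanaged (assumed structure).
!$acc enter data copyin(field_ratio, mask)

! ... other GPU work on field_ratio and mask ...

! Expose the device copies to the stdpar-compiled MINVAL.
!$acc host_data use_device(field_ratio, mask)
min_field_diff_local = MINVAL(field_ratio, mask)
!$acc end host_data

!$acc exit data delete(field_ratio, mask)
```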
A last question: what happens if I have a `MINVAL` call that I want to run on the CPU (during initialization), on data that may or may not have a copy on the GPU? Would the compiler transfer the data to and from the GPU with managed memory? And what if I am using `-nomanaged`? Would the call run on the CPU if there is no GPU copy of the arrays, or would it try to run on the GPU?
It would seem to me that with managed memory it should compute on the GPU, even if that means slow transfers, but that with `-nomanaged` it should only run on the GPU if the call is contained in a `host_data` region. Or am I mixing apples and oranges (stdpar vs. OpenACC)?