Accelerator Information Questions

Hi,

I am new to accelerator programming and am attempting to add accelerator directives around some do loops in Fortran code. When I compile with the -Minfo option I get the following output and am not sure what to do to fix it.
Statements 255 and 390 are not in the accelerated section.

255, Invariant assignments hoisted out of loop
283, Generating copyin(radiance(:,1,1,1:64))
Generating copyout(model_radiance(1:nwl,1:64))
Generating compute capability 1.0 binary
Generating compute capability 1.3 binary
284, Accelerator restriction: scalar variable live-out from loop: wl
Sequential loop scheduled on host
285, Loop is parallelizable
Accelerator kernel generated
285, !$acc do parallel, vector(32)
Non-stride-1 accesses for array ‘model_radiance’
CC 1.0 : 4 registers; 24 shared, 72 constant, 0 local memory bytes; 33 occupancy
CC 1.3 : 4 registers; 24 shared, 72 constant, 0 local memory bytes; 25 occupancy
290, Accelerator restriction: scalar variable live-out from loop: wl
Sequential loop scheduled on host
294, Loop is parallelizable
Accelerator kernel generated
294, !$acc do parallel, vector(32)
Non-stride-1 accesses for array ‘model_radiance’
Non-stride-1 accesses for array ‘radiance’
CC 1.0 : 6 registers; 28 shared, 72 constant, 0 local memory bytes; 33 occupancy
CC 1.3 : 6 registers; 28 shared, 72 constant, 0 local memory bytes; 25 occupancy
299, Loop is parallelizable
Accelerator kernel generated
299, !$acc do parallel, vector(32)
Non-stride-1 accesses for array ‘model_radiance’
CC 1.0 : 11 registers; 24 shared, 112 constant, 0 local memory bytes; 33 occupancy
CC 1.3 : 11 registers; 24 shared, 112 constant, 0 local memory bytes; 25 occupancy
390, Invariant assignments hoisted out of loop

Hi Pebbles,

I’m assuming that you mean the following lines are the problem. The rest of the output looks fine.

284, Accelerator restriction: scalar variable live-out from loop: wl
290, Accelerator restriction: scalar variable live-out from loop: wl

These mean that the scalar variable wl is set within an accelerator region and is then used later on the host. The problem is that if each GPU thread has its own copy of wl, which thread’s wl should the host use? To fix this, either remove the use of ‘wl’ on the host, use the ‘private’ clause to have the compiler ignore this live-out, or set wl to a value (like zero) after the accelerator region.
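
As a rough illustration of the last two options, here is a minimal sketch. The loop nest, array sizes, and the way wl is assigned are all assumptions made for the example, since the original source was not posted. It uses the PGI Accelerator directive style that appears in the -Minfo output; under OpenACC the loop directive would be spelled ‘!$acc loop’ instead of ‘!$acc do’. Without accelerator flags the directives are plain comments, so the program also builds and runs on the host.

```fortran
program liveout_sketch
  implicit none
  ! Hypothetical sizes and arrays; only the names echo the -Minfo output.
  integer, parameter :: nwl = 8, npix = 64
  real :: radiance(nwl, npix), model_radiance(nwl, npix)
  integer :: i, j, wl

  radiance = 1.0

  !$acc region
  ! Option 1: declare wl private so each iteration gets its own copy
  ! and the compiler no longer treats it as live-out.
  !$acc do private(wl)
  do j = 1, npix
     do i = 1, nwl
        wl = i   ! scalar assigned inside the accelerator region
        model_radiance(wl, j) = 2.0 * radiance(wl, j)
     end do
  end do
  !$acc end region

  ! Option 2: give wl a defined value after the region, before any
  ! host code reads it. Either option alone should be enough.
  wl = 0

  print *, sum(model_radiance), wl
end program liveout_sketch
```

Either change tells the compiler that the value wl had inside the region is not needed afterwards, which is what the "live-out" restriction is complaining about.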

Non-stride-1 accesses for array ‘model_radiance’
Non-stride-1 accesses for array ‘radiance’

These mean that your data is not being accessed contiguously (with stride 1), so the threads within a warp touch scattered memory locations and their accesses cannot be coalesced. This will cause your performance to suffer.

To fix, you may need to restructure your data so that the loop being run as the vector indexes the contiguous dimension of the array: the first (leftmost) index in Fortran, or the last (rightmost) index in C.
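
A minimal sketch of that restructuring follows. It assumes, based only on the copyout(model_radiance(1:nwl,1:64)) message, that the vectorized loop runs over the second dimension of the array; the array names a and a_t, the sizes, and the loop body are hypothetical. The idea is to store the data with the vectorized index first, so that consecutive threads touch consecutive memory locations.

```fortran
program stride_sketch
  implicit none
  ! Hypothetical sizes; a stands in for an array like model_radiance.
  integer, parameter :: nwl = 8, npix = 64
  real :: a(nwl, npix)     ! original layout: vectorized index is second
  real :: a_t(npix, nwl)   ! restructured layout: vectorized index is first
  integer :: ipix, wl

  a = 1.0
  a_t = transpose(a)       ! one-time reordering done on the host

  !$acc region
  do wl = 1, nwl
     ! ipix is the candidate vector loop. It now indexes the leftmost
     ! dimension, so neighbouring iterations are stride-1 in memory.
     do ipix = 1, npix
        a_t(ipix, wl) = 2.0 * a_t(ipix, wl)
     end do
  end do
  !$acc end region

  ! Check that the restructured result matches the original layout.
  print *, all(transpose(a_t) == 2.0 * a)
end program stride_sketch
```

Whether the transpose pays off depends on how often the arrays are reused; if the rest of the code can work with the new layout directly, the reordering cost disappears.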

  • Mat