generating accelerator kernel

I have a question regarding the decision of the compiler to generate an accelerator kernel.
Can it be that identical code

!KERNEL FOR TEST PURPOSE
      n = 100000
      allocate(a(n), r(n), e(n))
      !$acc kernels loop
        do i = 1,n
            r(i) = a(i) * 2.0 
        enddo

        do i = 1,n
            e(i) = a(i) * 2.0
        enddo

leads in some cases to an accelerator kernel and in some cases not?

remorg:
85, Loop unrolled 16 times
Generated 2 prefetches in scalar loop
89, Loop unrolled 16 times
Generated 2 prefetches in scalar loop
134, Loop not vectorized/parallelized: contains call
lwtt:
110, Loop unrolled 2 times
123, Generating copyout(r(1:100000))
Generating copyin(a(1:100000))
124, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
124, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
128, Loop unrolled 16 times
Generated 2 prefetches in scalar loop
145, Outer loop unrolled 5 times (completely unrolled)
147, Loop unrolled 2 times

Hi hendrun,

Can it be that identical code leads in some cases to an accelerator kernel and in some cases not?

For identical code built with the same tool chain, the compiler will generate the same binary.

Note that the compiler feedback messages you show cover both the accelerator device and the host code. By default, we generate two versions of the code: one for the device and one for the host. At run time, if an accelerator is available, the device version runs; if not, the host version runs.
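For reference, here is a self-contained, compilable version of your snippet (the array type and the initialization of `a` are my assumptions). One thing worth noting: the combined `!$acc kernels loop` directive applies only to the loop that immediately follows it, so the second loop stays on the host unless it gets its own directive — which matches your feedback showing a kernel generated for only one loop.

```fortran
! Minimal self-contained version of the test snippet.
! Assumptions: single-precision arrays, a initialized to 1.0.
program test_kernel
  implicit none
  integer :: i, n
  real, allocatable :: a(:), r(:), e(:)

  n = 100000
  allocate(a(n), r(n), e(n))
  a = 1.0

  ! The combined directive applies only to the loop immediately following it.
  !$acc kernels loop
  do i = 1, n
    r(i) = a(i) * 2.0
  enddo

  ! This loop is outside the kernels region and runs on the host
  ! unless it is given its own directive.
  do i = 1, n
    e(i) = a(i) * 2.0
  enddo

  print *, r(1), e(1)
end program test_kernel
```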

If you only want to view the compiler feedback for OpenACC, use “-Minfo=accel”.

If you want to generate a device-only version (i.e. no host version), use the “-ta” flag to specify which device to target. For example: “-ta=tesla” to only target a Tesla GPU. (FYI, the default is “-ta=host,tesla”).
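Putting the two flags together, the command lines would look something like this (the source file name `test.f90` is just a placeholder; this assumes the PGI compiler is on your path):

```shell
# Show only the OpenACC feedback messages
pgfortran -acc -Minfo=accel test.f90

# Target only a Tesla GPU (no host fallback version is generated)
pgfortran -acc -ta=tesla -Minfo=accel test.f90
```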

Did this help clarify things?

  • Mat