I have a question about how the compiler decides whether to generate an accelerator kernel.
Can it be that identical code
    !KERNEL FOR TEST PURPOSE
    n = 100000
    allocate(a(n), r(n), e(n))
    !$acc kernels loop
    do i = 1,n
       r(i) = a(i) * 2.0
    enddo
    do i = 1,n
       e(i) = a(i) * 2.0
    enddo
leads to an accelerator kernel in some cases and not in others?
Compiler feedback for the two routines:
remorg:
     85, Loop unrolled 16 times
         Generated 2 prefetches in scalar loop
     89, Loop unrolled 16 times
         Generated 2 prefetches in scalar loop
    134, Loop not vectorized/parallelized: contains call
lwtt:
    110, Loop unrolled 2 times
    123, Generating copyout(r(1:100000))
         Generating copyin(a(1:100000))
    124, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        124, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
    128, Loop unrolled 16 times
         Generated 2 prefetches in scalar loop
    145, Outer loop unrolled 5 times (completely unrolled)
    147, Loop unrolled 2 times
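
To isolate the behaviour, the test loops could be placed in a small standalone program, with the same code in two routines, so that the feedback for each can be compared directly. The following is only a minimal sketch: the names acc_repro, test_a and test_b are hypothetical stand-ins (not the real remorg and lwtt), and it assumes a free-form source file compiled with OpenACC enabled (for the PGI compiler, something like -acc -Minfo=accel). Whether it reproduces the difference shown above is not known.

```fortran
! Minimal sketch: identical test loops in two routines.
! test_a and test_b are hypothetical names, not the original routines.
program acc_repro
  implicit none
  integer, parameter :: n = 100000
  real, allocatable :: a(:), r(:), e(:)

  allocate(a(n), r(n), e(n))
  a = 1.0

  call test_a(a, r, e, n)
  print *, 'test_a: r(1) =', r(1), ' e(n) =', e(n)

  call test_b(a, r, e, n)
  print *, 'test_b: r(1) =', r(1), ' e(n) =', e(n)

  deallocate(a, r, e)

contains

  subroutine test_a(a, r, e, n)
    integer, intent(in)  :: n
    real,    intent(in)  :: a(n)
    real,    intent(out) :: r(n), e(n)
    integer :: i

    ! The combined directive applies to the next loop only.
    !$acc kernels loop
    do i = 1, n
       r(i) = a(i) * 2.0
    enddo
    ! This loop is outside the accelerator region and runs on the host.
    do i = 1, n
       e(i) = a(i) * 2.0
    enddo
  end subroutine test_a

  subroutine test_b(a, r, e, n)
    integer, intent(in)  :: n
    real,    intent(in)  :: a(n)
    real,    intent(out) :: r(n), e(n)
    integer :: i

    ! Same code as in test_a, to compare the compiler feedback.
    !$acc kernels loop
    do i = 1, n
       r(i) = a(i) * 2.0
    enddo
    do i = 1, n
       e(i) = a(i) * 2.0
    enddo
  end subroutine test_b

end program acc_repro
```

If both routines report "Accelerator kernel generated" here, the difference in the real code presumably comes from the surrounding context (for example how each source file is compiled or how the directive line is read) rather than from the loops themselves.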