Hi WENYANG LIU,
Scalars are privatized, so the ‘final’ used in the kernel is a different copy from the host’s ‘final’. The host copy is never updated by the kernel, which is why you’re printing seemingly random values (you’re really printing uninitialized memory).
The work-around is to make ‘final’ a single-element array, since arrays are not privatized and get copied back to the host:
xps730:/tmp/qa% cat seq.f90
integer :: i, final(1)
!$acc do seq
!$acc end region
print *, final(1)
% pgf90 -ta=nvidia -Minfo seq.f90
5, Generating copyout(final(1))
Generating compute capability 1.0 binary
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
7, Parallelization would require privatization of array 'final(1)'
Accelerator kernel generated
7, !$acc do seq
CC 1.0 : 2 registers; 0 shared, 8 constant, 0 local memory bytes; 33% occupancy
CC 1.3 : 2 registers; 0 shared, 8 constant, 0 local memory bytes; 25% occupancy
CC 2.0 : 4 registers; 0 shared, 40 constant, 0 local memory bytes; 16% occupancy
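For reference, a complete version of the work-around might look like the sketch below. The listing above shows only part of seq.f90, so the loop bound `n`, the loop body, and the initialization are assumptions made for illustration; only the directives and the `final(1)` declaration come from the post. Line numbers in the -Minfo output will not necessarily match this sketch.

```fortran
program seq
  implicit none
  integer, parameter :: n = 10   ! hypothetical trip count, not from the original post
  integer :: i, final(1)

  final(1) = 0                   ! give the host copy a defined value

!$acc region
!$acc do seq
  do i = 1, n
     ! final is an array, so it is not privatized; the compiler
     ! copies it back to the host at the end of the region.
     final(1) = i
  end do
!$acc end region

  print *, final(1)              ! value of the last iteration (n)
end program seq
```

Built without accelerator support, the directives are treated as ordinary comments and the program simply runs on the host, which is a quick way to check the expected result.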
While this ‘works’, you generally want to use sequential loops only inside an otherwise parallel region. Running purely sequential code on a GPU will be quite slow.
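As a rough illustration of that last point, here is a minimal sketch in which the outer loop is parallel and only the short inner loop is sequential. The program name, array `a`, and the sizes `n` and `m` are all hypothetical and not taken from your code.

```fortran
program nested
  implicit none
  integer, parameter :: n = 1000, m = 8   ! hypothetical sizes
  integer :: i, j
  real :: a(n)

!$acc region
!$acc do parallel
  do i = 1, n                  ! iterations are independent, so they can run in parallel
     a(i) = 0.0
!$acc do seq
     do j = 1, m               ! short inner loop runs sequentially within each iteration
        a(i) = a(i) + real(j)
     end do
  end do
!$acc end region

  print *, a(1), a(n)
end program nested
```

Here the GPU still has n independent iterations to spread across its threads, so the sequential inner loop costs little compared with running the whole region sequentially.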