NVHPC 23.9 vectorization issue

Hello,

I realize this is not the most recent version of the NVHPC Toolkit, but it is the latest version supported by the NERSC Perlmutter cluster.

I have this simple program that does a selection sort over 4 elements:

program main
    implicit none

    real(8)  :: e(4), etmp
    integer :: i, j, k

    e(1) = 2.000000000000000 
    e(2) = 3.000000000000000
    e(3) = 1.000000000000000
    e(4) = 2.000000000000000

    ! Initial array is unsorted.
    write(*,'(1x,a,2x,4(f16.13,1x))') 'e_before_sort=',e(1),e(2),e(3),e(4)

    ! Sort the 4-element array e.
    do i=1, 3
        k = i
        do j=i+1, 4
            if(e(k) > e(j)) then
                k = j
            end if
        end do
        if (i /= k) then
            !switch e(i) and e(k)
            etmp = e(i)
            e(i) = e(k)
            e(k) = etmp
        end if
    end do

    ! Are they sorted?
    write(*,'(1x,a,2x,4(f16.13,1x))') 'e_after_sort=',e(1),e(2),e(3),e(4)

    do i=2,4
        if (e(i-1) > e(i)) then
            write(*,*) 'FAIL:  out of order at index ', i
        end if
    end do
end program main

If I compile the program like this, it behaves correctly (note: vectorization disabled):

$ nvfortran main.f90 -o main -fast -Mnovect
$ ./main
e_before_sort=   2.0000000000000  3.0000000000000  1.0000000000000  2.0000000000000
e_after_sort=   1.0000000000000  2.0000000000000  2.0000000000000  3.0000000000000

However, if I compile the program like this (vectorization enabled), it fails:

$ nvfortran main.f90 -o main -fast
$ ./main
e_before_sort=   2.0000000000000  3.0000000000000  1.0000000000000  2.0000000000000
e_after_sort=   2.0000000000000  1.0000000000000  2.0000000000000  3.0000000000000
FAIL:  out of order at index             2

I am mainly reporting this as a potential bug, but I would also be grateful for any suggestions.

Thanks for the report and reproducing example! I’ve filed a problem report, TPR #35668, and sent it to engineering for investigation.

Is there some directive I can put in the code to prevent vectorization of just that loop?

I have been looking high and low for documentation of what directives nvfortran supports, and have found almost no information beyond OpenMP/OpenACC directives.

Thanks,
-Donnie

Yes, these directives stopped being documented when we rebranded PGI to NVHPC, though you can still find them in the PGI Compiler Reference Manual, Version 20.4, for x86 and NVIDIA Processors.

Note that we’re in the process of rewriting our Fortran front-end as part of the LLVM community’s Flang F18 project, so these directives will go away eventually. Also, I did test your example with our development build of flang and it worked correctly.

Is there some directive I can put in the code to prevent vectorization of just that loop?

Use “novect”, for example:

    ! Sort the 4-element array e.
    do i=1, 3
        k = i
!pgi$l novect
        do j=i+1, 4
            if(e(k) > e(j)) then
                k = j
            end if
        end do
        if (i /= k) then
            !switch e(i) and e(k)
            etmp = e(i)
            e(i) = e(k)
            e(k) = etmp
        end if
    end do

% nvfortran -fast test.F90 -Minfo; a.out
main:
     16, Outer loop unrolled 3 times (completely unrolled)
     19, Loop unrolled 1 times (completely unrolled)
         Loop unrolled 2 times (completely unrolled)
         Loop unrolled 3 times (completely unrolled)
     35, Loop not vectorized/parallelized: contains call
 e_before_sort=   2.0000000000000  3.0000000000000  1.0000000000000  2.0000000000000
 e_after_sort=   1.0000000000000  2.0000000000000  2.0000000000000  3.0000000000000
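
For anyone who wants to try the directive in isolation, here is a minimal self-contained sketch. It assumes the same selection sort as the original post, moved into an internal subroutine; the names novect_demo and sort4 are made up for illustration. The "l" after "!pgi$" limits the directive to the loop that follows, and compilers that do not recognize the sentinel should treat the line as an ordinary comment.

```fortran
program novect_demo
    implicit none

    real(8) :: e(4)

    e = [2.0d0, 3.0d0, 1.0d0, 2.0d0]
    call sort4(e)
    write(*,'(1x,a,2x,4(f16.13,1x))') 'e_after_sort=', e

contains

    ! Selection sort over 4 elements (same algorithm as the original post).
    ! sort4 is a hypothetical name used only for this sketch.
    subroutine sort4(e)
        real(8), intent(inout) :: e(4)
        real(8) :: etmp
        integer :: i, j, k

        do i = 1, 3
            k = i
            ! Loop-scoped directive: applies only to the inner loop below.
!pgi$l novect
            do j = i+1, 4
                if (e(k) > e(j)) then
                    k = j
                end if
            end do
            if (i /= k) then
                ! Swap e(i) and e(k).
                etmp = e(i)
                e(i) = e(k)
                e(k) = etmp
            end if
        end do
    end subroutine sort4

end program novect_demo
```

Compiling with -Minfo, as in the transcript above, is the way to check whether the directive took effect on a given compiler version.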

Hi Donnie,

FYI, TPR #35668 has been fixed in the 24.7 release.

% nvfortran -V24.7 -fast test.f90 ; a.out
 e_before_sort=   2.0000000000000  3.0000000000000  1.0000000000000  2.0000000000000
 e_after_sort=   1.0000000000000  2.0000000000000  2.0000000000000  3.0000000000000
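
To check a particular installation from a script, a variant of the reproducer that reports the result through its exit status may be handy. This is only a sketch: the program name check_sort is made up, and it relies on the Fortran 2008 "error stop" statement to signal an error termination when the array comes back out of order.

```fortran
program check_sort
    implicit none

    real(8) :: e(4), etmp
    integer :: i, j, k

    e = [2.0d0, 3.0d0, 1.0d0, 2.0d0]

    ! Selection sort over 4 elements (same loop as the original reproducer).
    do i = 1, 3
        k = i
        do j = i+1, 4
            if (e(k) > e(j)) then
                k = j
            end if
        end do
        if (i /= k) then
            ! Swap e(i) and e(k).
            etmp = e(i)
            e(i) = e(k)
            e(k) = etmp
        end if
    end do

    ! Terminate with an error status if any adjacent pair is out of order.
    do i = 2, 4
        if (e(i-1) > e(i)) then
            write(*,*) 'FAIL:  out of order at index ', i
            error stop 1
        end if
    end do

    write(*,*) 'PASS'
end program check_sort
```

Building it with -fast and running it would then print PASS on a compiler that handles the loop correctly, and stop with an error status on one that does not.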

-Mat