Loop not vectorized: mixed data types

I’m optimizing a Fortran 90 code and trying to squeeze the last drops of performance out of it :-).

When I compile the code with
pgf90 -fastsse -Minline=levels:2 -Minfo=all …
the log contains the following messages:

924, Loop not vectorized: mixed data types
926, greenfun inlined, size=3, file mf__jan15_bilayer.f (2318)
2323, greenfun_2 inlined, size=23, file mf__jan15_bilayer.f (2332)

The line numbers point to the following part of the code (I restored the line numbers by hand):

924      do j=1,pm ; vova = nm_clmn(j)
925         sv = ksite(vova); tv = ktau(vova)
926	       m_v2(j,2) = GREENFUN(sv,tv,site,tnew)
927      enddo

Here nm_clmn(:) and ksite(:) are (rather large) integer arrays, and ktau(:) and m_v2(:,1:2) are real*8 arrays. GREENFUN is a routine which gets inlined, as far as I understand; it’s basically a spline interpolation of a pre-tabulated array.

The question, then: what exactly prevents the compiler from vectorizing the loop, and is there a way to fix it?

Any suggestions would be greatly appreciated.

Zhenya

Hi Zhenya,

924, Loop not vectorized: mixed data types

This means that the data types on the left- and right-hand sides of the assignment differ, which prevents vectorization.

What data type does GREENFUN return? I’m assuming real*4, in which case either GREENFUN needs to be changed to real*8, or m_v2 needs to be real*4.
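
One quick way to check is the `kind` intrinsic, which reports the kind parameter of any expression. The following is only a minimal sketch: `g_single`, `g_double` and `target_val` are hypothetical stand-ins, not names from your code.

```fortran
program check_result_kind
  implicit none
  integer, parameter :: sp = kind(1.0)     ! default real (real*4 on most compilers)
  integer, parameter :: dp = kind(1.0d0)   ! double precision (real*8)
  real(dp) :: target_val(4)                ! hypothetical stand-in for m_v2(:,2)
  integer  :: j

  ! Mixed kinds: a real(sp) result is converted to real(dp) on every store.
  do j = 1, size(target_val)
     target_val(j) = g_single(real(j, sp))
  end do

  ! kind() works on any expression, so both sides of an assignment
  ! can be compared directly.
  print *, 'kind of target     :', kind(target_val)
  print *, 'kind of g_single() :', kind(g_single(1.0_sp))
  print *, 'kind of g_double() :', kind(g_double(1.0_dp))

  ! Self-check: the double-precision routine matches the target.
  if (kind(g_double(1.0_dp)) /= kind(target_val)) error stop 'kind mismatch'

contains

  function g_single(x) result(g)   ! hypothetical single-precision routine
    real(sp), intent(in) :: x
    real(sp) :: g
    g = 2.0_sp * x
  end function g_single

  function g_double(x) result(g)   ! hypothetical double-precision routine
    real(dp), intent(in) :: x
    real(dp) :: g
    g = 2.0_dp * x
  end function g_double

end program check_result_kind
```

If the kind printed for the function result differs from the kind of the array it is stored into, that assignment mixes data types.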

Hope this helps,
Mat

Hi Mat,

As a rule of thumb, I try to avoid mixing real*4 and real*8 by just using real*8 everywhere.

I double-checked the code once again, and no, there are only integers (no kind specified) and real*8 variables. All the assignments in this code block are either integer-to-integer or real*8-to-real*8.

That’s a little puzzling, at least for me.

Zhenya

Hi Zhenya,

I’m not sure then. Can you post a reproducing example and/or the source of GREENFUN?

Thanks,
Mat

Hi Mat,

Here’s the code which shows this glitch. It is a copy-paste of the relevant parts of the real code, and this snippet is not supposed to run: the allocatable arrays are not allocated, variables are declared but not initialized, etc.

I’m compiling it with

pgf90 -Mfree -fastsse -Minfo=all -Minline=level:2 loopnotectorized.f -lacml

and the compilation log says:

MAIN:
     31, Loop not vectorized: mixed data types
     33, greenfun inlined, size=3, file loopnotectorized.f (42)
          47, greenfun_2 inlined, size=23, file loopnotectorized.f (56)
greenfun:
     47, greenfun_2 inlined, size=23, file loopnotectorized.f (56)

The loop in question is marked with a comment !@#$%

      implicit none
! globals
	real*8, allocatable :: m_v2(:,:)
     	integer             :: pm,lda      ! actual size & leading dimension 
      real*8, allocatable :: GR_DAT_2(:,:,:), GRD_DAT_2(:,:,:)   ! cf ine TABU_2
	real*8 :: beta       ! inverse temperature
	integer :: ntab                   ! Actual Number of sites per dimension for tabulation
	integer :: mtau
	real*8 :: bmt, bmt1     ! a shorthand for beta/mtau, and its inverse

! from the lattice module
     integer  :: Nsite, Ncell          ! # of sites, # of unit cells
	integer, allocatable   :: ksite(:)      ! ksite(name) => site of a kink 'name'
	real*8, allocatable    :: ktau(:)       ! ktau(name) => tau of a kink 'name'
	integer, allocatable   :: nm_row(:),nm_clmn(:)  ! nm_row(row) => name of the kink associated with the row

! from add_2_same
	integer :: site,j,nk,vova,sv 
	real*8  :: tnew, tv, tnew2

!----------------
	lda=128; 
	allocate(m_v2(lda,2)) 

      
 !@#$% -----------
      do j=1,pm ; vova = nm_clmn(j)
         sv = ksite(vova); tv = ktau(vova)
	   m_v2(j,1) = GREENFUN(sv,tv,site,tnew2)
      enddo


      contains


!----------------------------------------------------
! Green Function, selector
!----------------------------------------------------
      real*8 function GREENFUN(site1,tau1,site2,tau2)
      implicit none
      integer, intent(in) :: site1, site2
      real*8, intent(in)  :: tau1, tau2

      GREENFUN = GREENFUN_2(site1,tau1,site2,tau2)
!      GREENFUN = GREENFUN_1(site1,tau1,site2,tau2)      

      end function GREENFUN


!----------------------------------------------
!---  Green Function, spline interpolation of GR_DAT_2 
!----------------------------------------------
      real*8 function GREENFUN_2(site1,tau1,site2,tau2)
      implicit none
      integer :: site1,site2,j, sgn
      real*8 :: tau, tau1, tau2, dt, gre

      integer :: nx, ny, nz, nta  !, ntb
      real*8 :: tta,ttb,ga,gb,c, gra,grb   !,p

! prepare \tau
      tau=tau1-tau2
      dt=tau; sgn=1

	if(tau < 1.d-14)then; dt=beta+tau; sgn=-1; endif
! Explanation: G(t=0) must be understood as G(t-> -0) = -G(t=\beta)
! A long way to accomplish this is below, commented out. A short way is above :).
!----------------------------------------

!----------------------------------- spline
	nta=dt*bmt1 !*p

      tta=dt-nta*bmt 
	ttb=tta - bmt     !dt-ntb*(beta/mtau) 
!cccccccccccccccccccccccccccccccccccccc
      
	ga=GR_DAT_2(nta,site1,site2)
	gb=GR_DAT_2(nta+1,site1,site2)

	gra=GRD_DAT_2(nta,site1,site2)
	grb=GRD_DAT_2(nta+1,site1,site2)

      c=(ga-gb)*bmt1

      gre=(c+gra)*ttb + (c+grb)*tta
      gre=gre*tta*ttb*bmt1 + gb*tta-ga*ttb
      gre=gre*bmt1


	GREENFUN_2 = gre*sgn


      end function GREENFUN_2




!-------------------------------------
!     Tabulates Green function and its time derivate at positive tau : ALL-NUMERICAL, 
!        taken from ../disord/fermi-hubbard-disord.f
!-------------------------------------
      subroutine TABU_2
      implicit none
      real*8, allocatable :: ham(:,:)
      integer :: site,site1,j
	real*8 :: factor, ww,ttt,term, gamma, expet(0:mtau)
	integer :: nt

	! lapack stuff
	character*1 :: jobz,uplo
	integer     :: ldh, lwork,info
	real*8, allocatable  :: work(:), eps(:)

	integer :: site2, i_x1(3), i_x2(3), n1,n2

	print*,' TABU_2'


      if(allocated(GR_DAT_2)) deallocate(GR_DAT_2)
      if(allocated(GRD_DAT_2)) deallocate(GRD_DAT_2)

	allocate( GR_DAT_2(0:mtau+1,1:Nsite,1:Nsite), GRD_DAT_2(0:mtau+1,1:Nsite,1:Nsite) )


! build the hamiltonian
	allocate(ham(1:Nsite,1:Nsite)) ; 
	ham=0.d0
!------------------------------- commented out for the loopnotectorized ONLY
!	do site=1,Nsite
!	   do j=1,coord_nbr(site); site1=neighb(j,site); 
!	      ham(site,site1)=ham(site,site1)-hop_int(j,site)  
!	   enddo
!         if(site_layer(site)==1)then; ham(site,site) = -Vlayer
!         else;                        ham(site,site) =  Vlayer
!         endif
!	enddo;	

!  compute eigenvalues; for LAPACK parameters and arguments, see
!  http://www.netlib.org/lapack/double/dsyev.f
!  SUBROUTINE DSYEV( JOBZ, UPLO, N, A, LDA, W, WORK, LWORK, INFO )

	jobz='V'  ! compute eigenvectorz
	uplo='U'  ! use upper diag of ham(:,:) --- doesn't matter really
	ldh=Nsite
	lwork=12*Nsite
	allocate( work(lwork), eps(Nsite) )

! query the optimal workspace size
	call dsyev(jobz,uplo,Nsite,ham,ldh,eps,work,-1,info)
	lwork=work(1)
	deallocate(work); allocate(work(lwork))

! diagonalize
	call dsyev(jobz,uplo,Nsite,ham,ldh,eps,work,lwork,info)

	if(info/=0)then; print*,'*** dsyev returns info = ', info
			 print*,'*** check the TABULATE routine'
			 call mystop
	endif

!------------- have the spectrum, proceed to the GFs
	GR_DAT_2=0.d0; GRD_DAT_2=0.d0	

	do j=1,Nsite

	  gamma=-eps(j)*beta
	  gamma=exp(gamma)+1.d0
	  gamma=-1.d0/gamma

	  ww = exp(-eps(j)*bmt) ! bmt=beta/mtau
	  do nt=0,mtau; expet(nt)=ww**nt
	  enddo

          do site=1,Nsite ; do site1=1,Nsite
	    factor = ham(site,j)*ham(site1,j)
            do nt=0,mtau
               term = factor*expet(nt)*gamma    !/Nsite
               GR_DAT_2(nt,site,site1) = GR_DAT_2(nt,site,site1) + term
               GRD_DAT_2(nt,site,site1) = GRD_DAT_2(nt,site,site1) -eps(j)*term
            enddo

          enddo; enddo ! site, site1

	enddo   ! j: eigenvalues
!------------------------


! fill in fictitious nt=mtau+1, see GREENFUN for explanation
	GR_DAT_2(mtau+1,:,:)=0.d0; GRD_DAT_2(mtau+1,:,:)=0.d0

      end subroutine TABU_2
!++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

	subroutine mystop
!!! free the memory etc
	stop
	end subroutine mystop

      end

Do you spot anything suspicious?

Thanks,
Zhenya

Hi Zhenya,

Sorry, I should have caught this earlier. The data sizes of your integers and reals must be the same as well. In other words, you need to promote integer to integer*8. (Note that you can use the command-line flag “-i8” to set the default integer kind to 8.)

- Mat

% pgf90 -fast -Minfo -c test2.f90 -Minline -V11.1 
MAIN:
     28, Loop not vectorized: mixed data types
     30, greenfun inlined, size=3, file test2.f90 (40)
          45, greenfun_2 inlined, size=21, file test2.f90 (54)
greenfun:
     45, greenfun_2 inlined, size=21, file test2.f90 (54)
tabu_2:
    132, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
    163, mystop inlined, size=2, file test2.f90 (199)
    167, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
    169, Loop not vectorized/parallelized: too deeply nested
    176, Loop not vectorized: mixed data types
    181, Generated 3 alternate versions of the loop
         Generated vector sse code for the loop
         Generated 3 prefetch instructions for the loop
    194, Loop unrolled 8 times

% pgf90 -fast -Minfo -c test2.f90 -Minline -V11.1 -i8 -Mvect=levels:5
MAIN:
     30, greenfun inlined, size=3, file test2.f90 (40)
          45, greenfun_2 inlined, size=22, file test2.f90 (54)
greenfun:
     45, greenfun_2 inlined, size=22, file test2.f90 (54)
tabu_2:
    132, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
    163, mystop inlined, size=2, file test2.f90 (199)
    167, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
    181, Generated 3 alternate versions of the loop
         Generated vector sse code for the loop
         Generated 3 prefetch instructions for the loop
    194, Loop unrolled 8 times
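
If you would rather not change the default integer kind for the whole program, the same promotion can be done in the source for just the variables the loop touches. The sketch below assumes the kinds `int64` and `real64` from the intrinsic module `iso_fortran_env`; the arrays `idx`, `site` and `tau` are hypothetical stand-ins for nm_clmn, ksite and ktau, and the product `tv * sv` is only a placeholder for the GREENFUN call. Whether a given compiler version then vectorizes the loop still has to be confirmed with -Minfo.

```fortran
program same_width_kinds
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 8
  integer(int64) :: idx(n), site(n)   ! stand-ins for nm_clmn(:), ksite(:)
  real(real64)   :: tau(n), res(n)    ! stand-ins for ktau(:), m_v2(:,1)
  integer(int64) :: j, v, sv          ! 8-byte loop index and scalars
  real(real64)   :: tv

  ! Fill the stand-in tables with simple values.
  do j = 1, n
     idx(j)  = n + 1 - j              ! reversed permutation
     site(j) = j
     tau(j)  = 0.5_real64 * j
  end do

  ! Same shape as the loop in question: indirect loads, then one store.
  do j = 1, n
     v  = idx(j)
     sv = site(v)
     tv = tau(v)
     res(j) = tv * sv                 ! placeholder for GREENFUN(sv,tv,...)
  end do

  ! storage_size() returns the size in bits; both must agree.
  if (storage_size(sv) /= storage_size(tv)) error stop 'element widths differ'
  print *, 'bits per integer / real:', storage_size(sv), storage_size(tv)
  print *, res
end program same_width_kinds
```

One reason to prefer explicit kinds: -i8 also changes the integers passed to external libraries such as the LAPACK routine dsyev, so the library you link against has to expect 8-byte integers as well.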

Wow, I would never have guessed that. Thanks a whole lot, Mat!