-O3 leads to error

hello!
I compile some f90 files (using LAPACK) with PGI Community Edition version 17.4 on Windows 10. With check:uninit, there are no errors or warnings. But with -O3 optimization, the results differ from those without -O3. Without any optimization the results are correct (the same as IVF’s results), but the program is much slower than the IVF build (about 4 times).
Could you please tell me why this happens, or what I should do?
Thank you!

Hi bnliang,

If you can, please send a reproducing example to PGI Customer Service (trs@pgroup.com) and we’ll take a look.

Questions for you:

  1. How wrong are the answers? Very different or just slightly off?
  2. Do you get correct answers with -O2?

-Mat

hi mkcolg,
I have sent a message to the mailbox.

hi mkcolg,
I have found where it goes wrong. When I call DGEMM (matrix-matrix multiplication, linked from the LAPACK/BLAS libraries), the results are all zero with O3 or O2, while they are correct without optimization.
Could you please tell me what is happening?
Thank you!

Hi bnliang,

PGI Customer Service hasn’t seen any messages from you. Can you try sending it again?

Thanks,
Mat

hi mkcolg,
I sent the message again from lhwllw@stu.xjtu.edu.cn. Please remember to check!
Thanks
bnliang

We got your email, but the pictures you sent didn’t come through. Also, what we need is a reproducing example (source, data sets, etc.) so we can investigate the error.

-Mat

hi mkcolg,
The source is on a local area network, so I’m sorry, I can’t send it to you. The following code is where I found the error.

DO K=1,NumNeededBlocks
   M = RankSurf(K)
   N = RankVol(K)
   IF (M*N .EQ. 0) CYCLE
   CALL DGEMM('N','N',N,M,N,1.0D0,&
              ASub%Matrix(K)%SubM,N,DSub%Matrix(K)%SubM,N,0.0D0,MSub%Matrix(K)%SubM,N)
END DO

With O2 or O3, all elements in MSub%Matrix(K)%SubM are zero, while they are normal without O2 or O3. The following are the commands used to build the program.

pgf90 -c -O2  E:/SRC/PRORAM_NodalMethods.f90     
pgf90 -c -O2  E:/SRC/MODULE_NECPconstants.f90    
pgf90 -c -O2  E:/SRC/MOUDLE_RM_Store.f90         
pgf90 -c -O2  E:/SRC/MODULE_Array.f90            
pgf90 -c -O2  E:/SRC/MODULE_MatrixCalculation.f90
pgf90 -c -O2  E:/SRC/MODULE_Control.f90          
pgf90 -c -O2  E:/SRC/MODULE_NoneZero.f90         
pgf90 -c -O2  E:/SRC/MODULE_Geometry_Nodal.f90   
pgf90 -c -O2  E:/SRC/MODULE_BasisFS1D.f90        
pgf90 -c -O2  E:/SRC/MODULE_Material.f90         
pgf90 -c -O2  E:/SRC/MODULE_Configuration.f90    
pgf90 -c -O2  E:/SRC/MODULE_IterationOptimize.f90
pgf90 -c -O2  E:/SRC/MODULE_CMFD.f90             
pgf90 -c -O2  E:/SRC/MODULE_BasisFSMD.f90        
pgf90 -c -O2  E:/SRC/MODULE_VNMrm.f90            
pgf90 -c -O2  E:/SRC/MODULE_VNM.f90              
pgf90 -c -O2  E:/SRC/MODULE_Export.f90           
pgf90 -c -O2  E:/SRC/MODULE_INPUT.f90            
pgf90 -c -O2  E:/SRC/SUBROUTINE_VNM_Driver.f90   
pgf90 -o violet *.obj  -llapack -lblas

hi,mkcolg
I’m sorry to disturb you again. Recently, I removed all DGEMM calls from my code, but the results are still wrong. I then found a more complex problem when O2 or O3 is used. The following code is where I found the problem.

With this code, the file “1234” contains all zeros:

      User_VNMrm%Dmatrix = 0.0D0
      IEnd = 0
      DO I=1,NumSurfaces
         IStart = IEnd+1
         IEnd = IEnd+NumMoments_Surface
         User_VNMrm%Dmatrix(:,IStart:IEnd) = User_VNMrm%Dmatrix(:,IStart:IEnd)+Dmatrix(:,:,I)
      END DO
      WRITE(1234,*)"SDSDFSDFSDFSDF"
      WRITE(1234,*)User_VNMrm%Dmatrix
      STOP

With this code, the file “1234” is not all zeros, and it matches the file produced without O2 or O3:

      User_VNMrm%Dmatrix = 0.0D0
      IEnd = 0
      DO I=1,NumSurfaces
         IStart = IEnd+1
         IEnd = IEnd+NumMoments_Surface
         User_VNMrm%Dmatrix(:,IStart:IEnd) = User_VNMrm%Dmatrix(:,IStart:IEnd)+Dmatrix(:,:,I)
         WRITE(1234,*)IStart,IEnd
         WRITE(1234,*)User_VNMrm%Dmatrix(:,IStart:IEnd)
      END DO
      WRITE(1234,*)"SDSDFSDFSDFSDF"
      WRITE(1234,*)User_VNMrm%Dmatrix
      STOP

Nothing about the computation of User_VNMrm%Dmatrix changes between the two versions, yet the result changes completely just by printing each block inside the loop.

Hi bnliang,

It could be a problem with register allocation of the “IStart” and “IEnd” variables or a problem with the array syntax code generation. The construct you’re using is fairly common, though, so the problem is likely something specific to your code.

Unfortunately, we really do need a reproducing example to determine what’s going on. Can you extract a small example from the larger code?

Sans an example, you can try rewriting the array syntax as explicit loops to see if that works around the problem. If it still fails, then the problem is more likely the IStart and IEnd variables.
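
For illustration, the explicit-loop rewrite could look roughly like the sketch below. It is a standalone toy program, not the poster's code: the sizes (NumRows, NumMoments_Surface, NumSurfaces), the fill values, and the arrays Full and Ref are assumptions made up for the example, with Full standing in for User_VNMrm%Dmatrix, whose real declaration is not shown in the thread.

```fortran
! Sketch only: made-up sizes and data. Full stands in for
! User_VNMrm%Dmatrix; Ref is a hypothetical reference copy for checking.
PROGRAM explicit_loops
   IMPLICIT NONE
   INTEGER, PARAMETER :: NumRows = 3             ! assumed size
   INTEGER, PARAMETER :: NumMoments_Surface = 2  ! assumed size
   INTEGER, PARAMETER :: NumSurfaces = 4         ! assumed size
   DOUBLE PRECISION :: Dmatrix(NumRows,NumMoments_Surface,NumSurfaces)
   DOUBLE PRECISION :: Full(NumRows,NumMoments_Surface*NumSurfaces)
   DOUBLE PRECISION :: Ref(NumRows,NumMoments_Surface*NumSurfaces)
   INTEGER :: I, J, R, IStart, IEnd

   ! Fill the source blocks with recognizable nonzero values.
   DO I = 1, NumSurfaces
      DO J = 1, NumMoments_Surface
         DO R = 1, NumRows
            Dmatrix(R,J,I) = 100.0D0*I + 10.0D0*J + R
         END DO
      END DO
   END DO

   ! Explicit-loop version of
   !   Full(:,IStart:IEnd) = Full(:,IStart:IEnd) + Dmatrix(:,:,I)
   Full = 0.0D0
   IEnd = 0
   DO I = 1, NumSurfaces
      IStart = IEnd + 1
      IEnd = IEnd + NumMoments_Surface
      DO J = 1, NumMoments_Surface
         DO R = 1, NumRows
            Full(R,IStart+J-1) = Full(R,IStart+J-1) + Dmatrix(R,J,I)
         END DO
      END DO
   END DO

   ! Reference: the same blocks laid side by side (column-major order).
   Ref = RESHAPE(Dmatrix, SHAPE(Ref))
   IF (ANY(Full /= Ref)) THEN
      PRINT *, 'MISMATCH'
   ELSE
      PRINT *, 'OK'
   END IF
END PROGRAM explicit_loops
```

Building a small program like this both with and without -O2 and comparing the output is one way to check whether the explicit loops behave the same at each optimization level; if it reproduces the failure, it could also serve as the small example requested above.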

-Mat

Thank you. After abandoning array syntax like A(i:j,k:l)=B(:,:), both O2 and LAPACK work normally.
But it still seems very strange to me. As in the code I quoted last: when each block is printed, the full matrix prints correctly, while without printing each block, and with no other difference, the full matrix is all zeros.