I am compiling some .f90 files (which use LAPACK) with PGI Community Edition 17.4 on Windows 10. With check:uninit there are no errors or warnings. But with -O3 optimization the results differ from those obtained without -O3. The results without optimization are correct (they match IVF's results), but the unoptimized build is much slower than IVF (about 4 times slower).
Could you please tell me why this happens, or what I should do?
If you can, please send a reproducing example to PGI Customer Service (firstname.lastname@example.org) and we’ll take a look.
Questions for you:
- How wrong are the answers? Very different or just slightly off?
- Do you get correct answers with -O2?
I have sent a message to the mailbox.
I have found where the error occurs. When I call DGEMM (the matrix-matrix multiply routine from the LAPACK library), the results are all zero with O3 or O2, while they are correct without optimization.
Could you please tell me what is happening?
PGI Customer Service hasn’t seen any messages from you. Can you try sending it again?
I sent the message again from email@example.com. Please check!
We got your email, but the pictures you sent didn’t come through. Also, what we need is a reproducing example (source, data sets, etc.) so we can investigate the error.
Because the source is on a local area network, I’m sorry that I can’t send it to you. The following code is where I found the error.
    DO K=1,NumNeededBlocks
        M = RankSurf(K)
        N = RankVol(K)
        IF (M*N .EQ. 0) CYCLE
        CALL DGEMM('N','N',N,M,N,1.0D0,&
                   ASub%Matrix(K)%SubM,N,DSub%Matrix(K)%SubM,N,0.0D0,MSub%Matrix(K)%SubM,N)
    END DO
With O2 or O3, all elements of MSub%Matrix(K)%SubM are zero, while they are normal without O2 or O3. The following are the commands used to build the program.
    pgf90 -c -O2 E:/SRC/PRORAM_NodalMethods.f90
    pgf90 -c -O2 E:/SRC/MODULE_NECPconstants.f90
    pgf90 -c -O2 E:/SRC/MOUDLE_RM_Store.f90
    pgf90 -c -O2 E:/SRC/MODULE_Array.f90
    pgf90 -c -O2 E:/SRC/MODULE_MatrixCalculation.f90
    pgf90 -c -O2 E:/SRC/MODULE_Control.f90
    pgf90 -c -O2 E:/SRC/MODULE_NoneZero.f90
    pgf90 -c -O2 E:/SRC/MODULE_Geometry_Nodal.f90
    pgf90 -c -O2 E:/SRC/MODULE_BasisFS1D.f90
    pgf90 -c -O2 E:/SRC/MODULE_Material.f90
    pgf90 -c -O2 E:/SRC/MODULE_Configuration.f90
    pgf90 -c -O2 E:/SRC/MODULE_IterationOptimize.f90
    pgf90 -c -O2 E:/SRC/MODULE_CMFD.f90
    pgf90 -c -O2 E:/SRC/MODULE_BasisFSMD.f90
    pgf90 -c -O2 E:/SRC/MODULE_VNMrm.f90
    pgf90 -c -O2 E:/SRC/MODULE_VNM.f90
    pgf90 -c -O2 E:/SRC/MODULE_Export.f90
    pgf90 -c -O2 E:/SRC/MODULE_INPUT.f90
    pgf90 -c -O2 E:/SRC/SUBROUTINE_VNM_Driver.f90
    pgf90 -o violet *.obj -llapack -lblas
I’m sorry to disturb you again. Recently I removed all DGEMM calls from my code, but the results are still wrong. I then found a more complex problem when O2 or O3 is used. The following code is where the problem occurs.
With this code, the file “1234” is all zeros:
    User_VNMrm%Dmatrix = 0.0D0
    IEnd = 0
    DO I=1,NumSurfaces
        IStart = IEnd+1
        IEnd = IEnd+NumMoments_Surface
        User_VNMrm%Dmatrix(:,IStart:IEnd) = User_VNMrm%Dmatrix(:,IStart:IEnd)+Dmatrix(:,:,I)
    END DO
    WRITE(1234,*)"SDSDFSDFSDFSDF"
    WRITE(1234,*)User_VNMrm%Dmatrix
    STOP
With this code, the file “1234” is not all zeros and is the same as the file produced without O2 or O3:
    User_VNMrm%Dmatrix = 0.0D0
    IEnd = 0
    DO I=1,NumSurfaces
        IStart = IEnd+1
        IEnd = IEnd+NumMoments_Surface
        User_VNMrm%Dmatrix(:,IStart:IEnd) = User_VNMrm%Dmatrix(:,IStart:IEnd)+Dmatrix(:,:,I)
        WRITE(1234,*)IStart,IEnd
        WRITE(1234,*)User_VNMrm%Dmatrix(:,IStart:IEnd)
    END DO
    WRITE(1234,*)"SDSDFSDFSDFSDF"
    WRITE(1234,*)User_VNMrm%Dmatrix
    STOP
Without the extra WRITE statements, User_VNMrm%Dmatrix is never changed (it stays zero); with them, it is updated as expected.
It could be a problem with register allocation of the “IStart” and “IEnd” variables, or a problem with the array syntax code generation. However, the constructs you’re using are fairly common, so the problem is likely triggered by something specific to your code.
Unfortunately, we really do need a reproducing example to determine what’s going on. Can you extract a small example from the larger code?
Without an example, you can try rewriting the array syntax as explicit loops to see if that works around the problem. If it still fails, then the problem is more likely the IStart and IEnd variables.
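As a rough sketch of that rewrite, the module below puts the array-section form next to an explicit-loop form of the same block accumulation. The module name, routine names, and argument lists are hypothetical and only mirror the shape of the posted loop (a 3-D source array whose slices are added into consecutive column blocks of a 2-D destination); they are not taken from your code.

```fortran
! Sketch only: all names here are hypothetical stand-ins for the posted code.
module block_accumulate_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Array-section form, mirroring the loop posted earlier in the thread.
  ! dst is n_rows x (n_moments*n_surfaces); src is n_rows x n_moments x n_surfaces.
  subroutine accumulate_sections(dst, src)
    real(dp), intent(inout) :: dst(:,:)
    real(dp), intent(in)    :: src(:,:,:)
    integer :: i, istart, iend
    iend = 0
    do i = 1, size(src, 3)
      istart = iend + 1
      iend   = iend + size(src, 2)
      dst(:, istart:iend) = dst(:, istart:iend) + src(:, :, i)
    end do
  end subroutine accumulate_sections

  ! Same update with explicit loops. The running istart/iend counters are
  ! kept on purpose, so that only the array syntax differs between the two.
  subroutine accumulate_loops(dst, src)
    real(dp), intent(inout) :: dst(:,:)
    real(dp), intent(in)    :: src(:,:,:)
    integer :: i, j, r, istart, iend
    iend = 0
    do i = 1, size(src, 3)
      istart = iend + 1
      iend   = iend + size(src, 2)
      do j = istart, iend
        do r = 1, size(src, 1)
          dst(r, j) = dst(r, j) + src(r, j - istart + 1, i)
        end do
      end do
    end do
  end subroutine accumulate_loops

end module block_accumulate_sketch
```

If the explicit-loop routine gives correct results at -O2 while the array-section routine does not, that points at the array syntax code generation. Calling both routines from a short driver with small synthetic arrays and comparing the outputs might also serve as the compact reproducing example we need.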
Thank you. After replacing array syntax such as A(i:j,k:l)=B(:,:) with explicit loops, both the O2 build and the LAPACK calls work normally.
But it still seems strange to me. As in the code I quoted last, when each block is printed, the full matrix prints normally. Without printing each block, and with no other difference, the full matrix is all zeros.