Memory errors associated with MAXVAL (Fortran)

Hi,

I’m trying to compile an f90 code that defines the dimensions of arrays using MAXVAL. I cannot copy the code itself as it is copyrighted. I tried to replicate the problematic part of the original code as shown below; however, the replicated code works. I know the problem is related to MAXVAL, since the original code works fine if I replace the MAXVAL function with its integer value. The original code works fine with the Intel Fortran Compiler, but not with the HPC compiler (nvfortran). When I run the executable with valgrind, I get these errors at the line where the counterpart of the replicated TRY subroutine is called in the original code:
==15593== Use of uninitialised value of size 8
==15593== Invalid read of size 4
==15593== Process terminating with default action of signal 11 (SIGSEGV)
==15593== Access not within mapped region at address 0x26FB8038

PROGRAM MAIN
IMPLICIT NONE

INTEGER A(3)
INTEGER II(0:0)
INTEGER, ALLOCATABLE :: B(:,:)

A=(/0,0,2/)
II(1)=1

ALLOCATE(B(1,0:MAXVAL(A)))

WRITE(*,*) SIZE(B) 
WRITE(*,*) B
CALL TRY(B,A)

END 


SUBROUTINE TRY(B,A)
IMPLICIT NONE

INTEGER A(3)
INTEGER B(1,0:MAXVAL(A))
INTEGER, ALLOCATABLE :: C(:,:)

B(1,0)=5
ALLOCATE(C(1,0:MAXVAL(B)))

WRITE(*,*) B

END

I’m sure there is a difference between my replicated code and the original code that I am missing, but can anyone guess what the problem might be?

Thanks in advance.

Hi kkb,

I’m not able to recreate the Valgrind error, but you are writing to II beyond its bounds, so that’s one possible reason.

My guess is that since you can’t recreate it with this simple code, the problem isn’t with MAXVAL itself but with something else in the original code. While it could be a problem with nvfortran, in which case we would need a reproducing example to investigate, it’s also possible that the problem is in the source itself and it’s just luck that it works with Intel. For example, seg faults occur when writing off the edge of a page, so if the code is writing off the end of an array, it may or may not seg fault depending on the data layout, which can differ between compilers.

-Mat

There are a couple of things in your code:

  1. You are accessing II(1), which is out of bounds (II is declared as II(0:0)).
  2. You allocate B but never initialize it before calling TRY. Inside TRY, you take MAXVAL of B after setting only one element.

If you initialize B to zero before calling TRY and fix the II access, the valgrind report will be clean.
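
As a minimal sketch of those two fixes applied to the first example: widening II to II(0:1) is only an assumption about what was intended (the index could equally have been meant to be 0), and B=0 is just one way to give B defined values before it is read.

```fortran
PROGRAM MAIN
IMPLICIT NONE

INTEGER A(3)
INTEGER II(0:1)          ! assumed fix: widened so that II(1) is in bounds
INTEGER, ALLOCATABLE :: B(:,:)

A=(/0,0,2/)
II(1)=1

ALLOCATE(B(1,0:MAXVAL(A)))
B=0                      ! give B defined values before it is printed or passed on

WRITE(*,*) SIZE(B)
WRITE(*,*) B
CALL TRY(B,A)

END


SUBROUTINE TRY(B,A)
IMPLICIT NONE

INTEGER A(3)
INTEGER B(1,0:MAXVAL(A))
INTEGER, ALLOCATABLE :: C(:,:)

B(1,0)=5
! MAXVAL(B) now reads only defined values (5 and the zeros set in MAIN)
ALLOCATE(C(1,0:MAXVAL(B)))

WRITE(*,*) B

END
```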

Thanks for your answers. I have managed to put together a reproducing example.

PROGRAM MAIN
IMPLICIT NONE

INTEGER, PARAMETER :: D1 = 1
INTEGER, PARAMETER :: D2 = 2
INTEGER, PARAMETER :: D3 = 3
INTEGER DIM1, DIM2

DOUBLE PRECISION, ALLOCATABLE :: PC(:,:,:)
DOUBLE PRECISION, ALLOCATABLE :: PK(:,:,:,:)

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

PDK(:,:) = 5
PDC(:,:,:) = 2

ALLOCATE( PC(D1,D2,   0:MAXVAL(PDC(:,:,1)) ))
ALLOCATE( PK(D1,D2,D3,0:MAXVAL(PDK(:,:))) )

DIM1=MAXVAL(PDC(:,:,1))
DIM2=MAXVAL(PDK(:,:))

WRITE(*,*) 'HELLO THERE_1'

PC(:,:,:) = 1.0D0
PK(:,:,:,:) = 1.0D0

WRITE(*,*) PK

CALL MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
END

SUBROUTINE MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
IMPLICIT NONE

INTEGER D1
INTEGER D2
INTEGER D3
INTEGER DIM1, DIM2

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)


DOUBLE PRECISION PC(D1,D2,   0:MAXVAL(PDC(:,:,1)))
DOUBLE PRECISION PK(D1,D2,D3,0:MAXVAL(PDK(:,:)))

WRITE(*,*) 'HELLO THERE_2'
END

This code causes a segmentation fault for me. If I change definitions of PC and PK in MYFUNC by replacing the MAXVAL functions with their integer values (DIM1 and DIM2), the code works fine.
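
For reference, here is a sketch of that workaround as a complete program. It is the same example, except that MAXVAL is evaluated once in the caller and the dummy arrays in MYFUNC are dimensioned with DIM1 and DIM2. Reusing DIM1 and DIM2 in the ALLOCATE statements is a small rearrangement for brevity, not something the original code does.

```fortran
PROGRAM MAIN
IMPLICIT NONE

INTEGER, PARAMETER :: D1 = 1
INTEGER, PARAMETER :: D2 = 2
INTEGER, PARAMETER :: D3 = 3
INTEGER DIM1, DIM2

DOUBLE PRECISION, ALLOCATABLE :: PC(:,:,:)
DOUBLE PRECISION, ALLOCATABLE :: PK(:,:,:,:)

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

PDK(:,:) = 5
PDC(:,:,:) = 2

! Evaluate MAXVAL once in the caller and reuse the results as upper bounds
DIM1=MAXVAL(PDC(:,:,1))
DIM2=MAXVAL(PDK(:,:))

ALLOCATE( PC(D1,D2,   0:DIM1) )
ALLOCATE( PK(D1,D2,D3,0:DIM2) )

PC(:,:,:) = 1.0D0
PK(:,:,:,:) = 1.0D0

CALL MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
END

SUBROUTINE MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
IMPLICIT NONE

INTEGER D1
INTEGER D2
INTEGER D3
INTEGER DIM1, DIM2

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

! Bounds are plain integer dummy arguments, not MAXVAL specification expressions
DOUBLE PRECISION PC(D1,D2,   0:DIM1)
DOUBLE PRECISION PK(D1,D2,D3,0:DIM2)

WRITE(*,*) 'HELLO THERE_2'
END
```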

For better or worse, I’m still not seeing any errors.

% nvfortran test.f90 -V20.7 -fast
% valgrind a.out
==19536== Memcheck, a memory error detector
==19536== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19536== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==19536== Command: a.out
==19536==
 HELLO THERE_1
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
 HELLO THERE_2
==19536==
==19536== HEAP SUMMARY:
==19536==     in use at exit: 12,688 bytes in 3 blocks
==19536==   total heap usage: 6 allocs, 3 frees, 30,104 bytes allocated
==19536==
==19536== LEAK SUMMARY:
==19536==    definitely lost: 400 bytes in 2 blocks
==19536==    indirectly lost: 0 bytes in 0 blocks
==19536==      possibly lost: 0 bytes in 0 blocks
==19536==    still reachable: 12,288 bytes in 1 blocks
==19536==         suppressed: 0 bytes in 0 blocks
==19536== Rerun with --leak-check=full to see details of leaked memory
==19536==
==19536== For lists of detected and suppressed errors, rerun with: -s
==19536== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Thanks for your answer. When I add the -fast flag, I get no errors either. Did you try running it with no flags?

I tried the individual options that make up -fast: -O2 and -Munroll=c:1 each get the code working, while -Mnoframe, -Mlre and -Mpre do not.

Interesting. Yes, I was only using “-g” or “-fast”. The error seems to occur only at low optimization levels, and not when debugging is enabled. I’m now able to replicate the issue, so I have added a problem report, TPR #28969, and sent it on to engineering.

Thanks,
Mat

Thank you. As a side note, I have also noticed that even though no errors occur when using -fast or -g, sizes of arrays PC and PK come out as 0 in subroutine MYFUNC.
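
A small check along these lines makes the mismatch visible. This is only a sketch: CHECKSIZE and SHOWSIZE are hypothetical names, and the literal bound 5 simply mirrors the value stored in PDK in my example. The subroutine prints SIZE of the MAXVAL-dimensioned dummy array next to the element count computed from the plain integer bound. The two numbers should agree, so a 0 in the first position points at the MAXVAL bound.

```fortran
PROGRAM CHECKSIZE
IMPLICIT NONE

INTEGER, PARAMETER :: D1 = 1
INTEGER, PARAMETER :: D2 = 2
INTEGER, PARAMETER :: D3 = 3

INTEGER PDK(D1,D2)
DOUBLE PRECISION PK(D1,D2,D3,0:5)

PDK(:,:) = 5
PK(:,:,:,:) = 1.0D0

! The last argument is the same upper bound as MAXVAL(PDK), passed as an integer
CALL SHOWSIZE(D1,D2,D3,PDK,PK,5)
END

SUBROUTINE SHOWSIZE(D1,D2,D3,PDK,PK,DIM2)
IMPLICIT NONE

INTEGER D1, D2, D3, DIM2
INTEGER PDK(D1,D2)

! Upper bound taken from MAXVAL, as in MYFUNC
DOUBLE PRECISION PK(D1,D2,D3,0:MAXVAL(PDK(:,:)))

! Compare the size seen by the compiler with the count from the integer bound
WRITE(*,*) 'SIZE(PK) =', SIZE(PK), ' expected =', D1*D2*D3*(DIM2+1)
END
```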