Memory errors associated with MAXVAL (fortran)

kkb · September 3, 2020, 9:30am

Hi,

I’m trying to compile an f90 code that defines the dimensions of arrays using MAXVAL. I cannot copy the code itself as it is copyrighted. I tried to replicate the problematic part of the original code as shown below; however, the replicated code works. I believe the problem is with MAXVAL, since the original code works fine if I replace the MAXVAL function with its integer value. The original code works fine with the Intel Fortran Compiler, but not with the HPC SDK compiler. When I run the executable with valgrind, I get these errors at the line where the replicated TRY subroutine is called in the original code:
==15593== Use of uninitialised value of size 8
==15593== Invalid read of size 4
==15593== Process terminating with default action of signal 11 (SIGSEGV)
==15593== Access not within mapped region at address 0x26FB8038

PROGRAM MAIN
IMPLICIT NONE

INTEGER A(3)
INTEGER II(0:0)
INTEGER, ALLOCATABLE :: B(:,:)

A=(/0,0,2/)
II(1)=1

ALLOCATE(B(1,0:MAXVAL(A)))

WRITE(*,*) SIZE(B) 
WRITE(*,*) B
CALL TRY(B,A)

END 


SUBROUTINE TRY(B,A)
IMPLICIT NONE

INTEGER A(3)
INTEGER B(1,0:MAXVAL(A))
INTEGER, ALLOCATABLE :: C(:,:)

B(1,0)=5
ALLOCATE(C(1,0:MAXVAL(B)))

WRITE(*,*) B

END

I’m sure I am missing something, and there must be a difference between my replicated code and the original code, but can anyone guess what the problem might be?

Thanks in advance.

MatColgrove · September 3, 2020, 2:49pm

Hi kkb,

I’m not able to recreate the Valgrind error, but you are writing to II beyond its bounds, so that’s one possible reason.

My guess is that since you can’t recreate it with this simple code, the problem isn’t with MAXVAL itself but with something else in the original code. While it could be a problem with nvfortran, in which case we would need a reproducing example to investigate, it’s also possible that the problem is with the source itself and it’s just luck that it works with Intel. For example, seg faults occur when writing off the edge of a page, so if the code is writing off the end of an array, it may or may not seg fault depending on the data layout, which can differ between compilers.
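To illustrate the point about out-of-bounds writes, here is a minimal sketch (not taken from the original code) that checks an index against the declared bounds of II before writing. The program name OOB and the variable K are made up for illustration. Compilers can also insert such checks automatically; as far as I know, -Mbounds does this for nvfortran and -fcheck=bounds for gfortran, but verify the flags against your compiler's documentation.

```fortran
PROGRAM OOB
IMPLICIT NONE

INTEGER II(0:0)      ! same declaration as in the first post: only II(0) exists
INTEGER K

K = 1                ! hypothetical index, chosen to fall outside 0:0

! An unguarded II(K)=1 here would write past the end of II. Whether that
! crashes depends on what happens to sit next to II in memory.
IF (K >= LBOUND(II,1) .AND. K <= UBOUND(II,1)) THEN
   II(K) = 1
ELSE
   WRITE(*,*) 'index', K, 'is outside', LBOUND(II,1), ':', UBOUND(II,1)
END IF

END
```

With K = 1 the guard takes the ELSE branch, so the write is never performed.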

-Mat

mfatica · September 3, 2020, 4:32pm

There are a couple of things in your code:

You are accessing II(1), which is out of bounds.
You allocate B but never initialize it before calling TRY. Inside TRY, you are using MAXVAL of B after setting only one element.

If you initialize B to zero before calling TRY and fix the II access, the valgrind report will be clean.
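A minimal sketch of the two fixes applied to the first program. It assumes II(0) was the element meant to be set, which the original post does not confirm; everything else is unchanged.

```fortran
PROGRAM MAIN
IMPLICIT NONE

INTEGER A(3)
INTEGER II(0:0)
INTEGER, ALLOCATABLE :: B(:,:)

A=(/0,0,2/)
II(0)=1                      ! was II(1); the only valid index of II(0:0) is 0

ALLOCATE(B(1,0:MAXVAL(A)))
B=0                          ! define every element before B is read

WRITE(*,*) SIZE(B)
WRITE(*,*) B
CALL TRY(B,A)

END


SUBROUTINE TRY(B,A)
IMPLICIT NONE

INTEGER A(3)
INTEGER B(1,0:MAXVAL(A))
INTEGER, ALLOCATABLE :: C(:,:)

B(1,0)=5
ALLOCATE(C(1,0:MAXVAL(B)))   ! MAXVAL(B) now reads only defined values

WRITE(*,*) B

END
```

Without B=0, MAXVAL(B) in TRY reads B(1,1) and B(1,2) while they are still undefined, which is what valgrind reports as a use of an uninitialised value.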

kkb · September 6, 2020, 3:05pm

Thanks for your answers. I have managed to put together an example that reproduces the problem.

PROGRAM MAIN
IMPLICIT NONE

INTEGER, PARAMETER :: D1 = 1
INTEGER, PARAMETER :: D2 = 2
INTEGER, PARAMETER :: D3 = 3
INTEGER DIM1, DIM2

DOUBLE PRECISION, ALLOCATABLE :: PC(:,:,:)
DOUBLE PRECISION, ALLOCATABLE :: PK(:,:,:,:)

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

PDK(:,:) = 5
PDC(:,:,:) = 2

ALLOCATE( PC(D1,D2,   0:MAXVAL(PDC(:,:,1)) ))
ALLOCATE( PK(D1,D2,D3,0:MAXVAL(PDK(:,:))) )

DIM1=MAXVAL(PDC(:,:,1))
DIM2=MAXVAL(PDK(:,:))

WRITE(*,*) 'HELLO THERE_1'

PC(:,:,:) = 1.0D0
PK(:,:,:,:) = 1.0D0

WRITE(*,*) PK

CALL MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
END

SUBROUTINE MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
IMPLICIT NONE

INTEGER D1
INTEGER D2
INTEGER D3
INTEGER DIM1, DIM2

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)


DOUBLE PRECISION PC(D1,D2,   0:MAXVAL(PDC(:,:,1)))
DOUBLE PRECISION PK(D1,D2,D3,0:MAXVAL(PDK(:,:)))

WRITE(*,*) 'HELLO THERE_2'
END

This code causes a segmentation fault for me. If I change the declarations of PC and PK in MYFUNC by replacing the MAXVAL calls with their integer values (DIM1 and DIM2), the code works fine.
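For reference, a sketch of that workaround as a complete program. Only the two declarations in MYFUNC differ from the example above. The WRITE of PK is dropped for brevity, and the SIZE output in MYFUNC is an addition for checking the array extents.

```fortran
PROGRAM MAIN
IMPLICIT NONE

INTEGER, PARAMETER :: D1 = 1
INTEGER, PARAMETER :: D2 = 2
INTEGER, PARAMETER :: D3 = 3
INTEGER DIM1, DIM2

DOUBLE PRECISION, ALLOCATABLE :: PC(:,:,:)
DOUBLE PRECISION, ALLOCATABLE :: PK(:,:,:,:)

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

PDK(:,:) = 5
PDC(:,:,:) = 2

ALLOCATE( PC(D1,D2,   0:MAXVAL(PDC(:,:,1)) ))
ALLOCATE( PK(D1,D2,D3,0:MAXVAL(PDK(:,:))) )

! Evaluate MAXVAL once in the caller and pass the results down
DIM1=MAXVAL(PDC(:,:,1))
DIM2=MAXVAL(PDK(:,:))

PC(:,:,:) = 1.0D0
PK(:,:,:,:) = 1.0D0

CALL MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
END

SUBROUTINE MYFUNC(D1,D2,D3,PDC,PDK,PC,PK,DIM1,DIM2)
IMPLICIT NONE

INTEGER D1
INTEGER D2
INTEGER D3
INTEGER DIM1, DIM2

INTEGER PDK(D1,D2)
INTEGER PDC(D1,D2,D3)

! Upper bounds come from the dummy arguments DIM1 and DIM2, not from MAXVAL
DOUBLE PRECISION PC(D1,D2,   0:DIM1)
DOUBLE PRECISION PK(D1,D2,D3,0:DIM2)

WRITE(*,*) 'HELLO THERE_2'
WRITE(*,*) SIZE(PC), SIZE(PK)   ! added check of the array extents
END
```

With PDC set to 2 and PDK set to 5, the declared shapes are PC(1,2,0:2) and PK(1,2,3,0:5), so the SIZE line should report 6 and 36.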

MatColgrove · September 10, 2020, 4:30pm

For good or bad, I’m still not seeing any errors.

% nvfortran test.f90 -V20.7 -fast
% valgrind a.out
==19536== Memcheck, a memory error detector
==19536== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19536== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==19536== Command: a.out
==19536==
 HELLO THERE_1
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
    1.000000000000000         1.000000000000000         1.000000000000000
 HELLO THERE_2
==19536==
==19536== HEAP SUMMARY:
==19536==     in use at exit: 12,688 bytes in 3 blocks
==19536==   total heap usage: 6 allocs, 3 frees, 30,104 bytes allocated
==19536==
==19536== LEAK SUMMARY:
==19536==    definitely lost: 400 bytes in 2 blocks
==19536==    indirectly lost: 0 bytes in 0 blocks
==19536==      possibly lost: 0 bytes in 0 blocks
==19536==    still reachable: 12,288 bytes in 1 blocks
==19536==         suppressed: 0 bytes in 0 blocks
==19536== Rerun with --leak-check=full to see details of leaked memory
==19536==
==19536== For lists of detected and suppressed errors, rerun with: -s
==19536== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

kkb · September 10, 2020, 4:36pm

Thanks for your answer. When I add the -fast flag, I get no errors either. Did you try to run it with no flags?

I tried the individual options that make up -fast: -O2 and -Munroll=c:1 get the code working; -Mnoframe, -Mlre and -Mpre do not.

MatColgrove · September 10, 2020, 10:35pm

Interesting. Yes, I was only using “-g” or “-fast”. The error seems to occur only at low optimization levels, and not with debugging enabled. I’m now able to replicate the issue, so I have added a problem report, TPR #28969, and sent it on to engineering.

Thanks,
Mat

kkb · September 11, 2020, 1:00am

Thank you. As a side note, I have also noticed that even though no errors occur when using -fast or -g, the sizes of arrays PC and PK come out as 0 in subroutine MYFUNC.
