VASP segfaults in SET_DD_PAW with pgf90 >= 7.1-1

Compiling VASP (4.6.26, 4.6.28 and 4.6.34 tested) with recent versions of pgf90 (7.1-1 to 7.2-2) produces a binary that crashes in subroutine SET_DD_PAW from paw.F, at the statement

COCC=0

(at line 1468 in VASP 4.6.34)

32-bit builds crash with e.g.

0: ALLOCATE: 3665294080 bytes requested; not enough memory

and 64-bit builds simply segfault.

This appears to happen because COCC is an automatic array, declared at line 1334 as

OVERLAP COCC(LMDIM,LMDIM,MAX(2,WDES%NCDIJ))

The use of MAX in the bounds is not the issue. (OVERLAP here is a preprocessor macro that expands to REAL(q) or, in some cases, COMPLEX(q).)

The code runs fine if I allocate the array explicitly (see the patch below). A short test programme that does exactly what VASP does here also runs fine, as it should.

This code runs properly with older versions of pgf90 (up to 7.0-7) and with other compilers (Sun, DEC, Intel). Tests were conducted with PGI Workstation 7.2-1 on RHEL4.6 WS x86_64 on rev. F Opterons (VASP built both 32- and 64-bit), and with PGI Workstation 6.2-5, 7.0-7, 7.1-1, 7.1-6, 7.2-1 and 7.2-2 on RHEL5.2 Workstation i386 on Prestonia-core Xeons.

Patch to VASP to work around the bug:

--- paw.F.orig  2008-06-12 12:33:34.672781700 +0100
+++ paw.F       2008-06-12 13:33:04.952793400 +0100
@@ -1334 +1334 @@
-      OVERLAP COCC(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)),COCC_IM(LMDIM,LMDIM)
+      OVERLAP , ALLOCATABLE :: COCC(:,:,:),COCC_IM(:,:)
@@ -1407,0 +1408,2 @@
+      ALLOCATE (COCC( LMDIM, LMDIM, MAX(2,NCDIJ) ), COCC_IM( LMDIM, LMDIM ))
+
@@ -1890,0 +1893 @@
+      DEALLOCATE(COCC,COCC_IM)

It seems to me that something introduced in pgf90 7.1 might have broken automatic allocation under some circumstances.

Hi,

If the array is larger than 2GB, you will need to compile with -mcmodel=medium and also unlimit the stack size before you run the binary. Red Hat is known to fail with a limited stack size.

Hongyon

The array COCC is small, (1,1,15) if I recall correctly.
*edit: it’s (15,15,2); LMDIM is 15 and WDES%NCDIJ is 1*

The issue is that pgf90 >= 7.1 cannot cope with the declaration statement

REAL(q) COCC(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)),COCC_IM(LMDIM,LMDIM)

but pgf90 up to 7.0-7 can, as can other compilers. I’ve done a lot of testing, some of it with pgdbg. With this line in the source, execution goes wrong on entry to the subroutine; if I change the statement as shown in the patch, the code runs. A bug is being exposed here.

Also, stacksize is unlimited.
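
For readers following along, the two forms of the declaration can be put side by side in a standalone sketch. This is not VASP source: the type wavedes_demo and the functions auto_size and alloc_size are hypothetical names made up for illustration, q is assumed to be a double-precision kind, and the values LMDIM = 15 and NCDIJ = 1 are the ones reported above. As already noted in this thread, a small programme of this kind is not expected to reproduce the crash; it only shows what the automatic declaration and the ALLOCATABLE workaround each do.

```fortran
! Standalone sketch, NOT VASP source: wavedes_demo, auto_size and
! alloc_size are hypothetical names used only for illustration.
MODULE cocc_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: q = SELECTED_REAL_KIND(10)   ! assumed double precision

  TYPE wavedes_demo          ! stand-in for the derived type behind WDES
     INTEGER :: NCDIJ
  END TYPE wavedes_demo

CONTAINS

  ! Automatic arrays: the bounds are evaluated on entry to the routine,
  ! as in the original declaration.
  FUNCTION auto_size(WDES, LMDIM) RESULT(n)
    TYPE(wavedes_demo), INTENT(IN) :: WDES
    INTEGER, INTENT(IN) :: LMDIM
    INTEGER :: n
    REAL(q) :: COCC(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)), COCC_IM(LMDIM,LMDIM)

    COCC = 0
    COCC_IM = 0
    n = SIZE(COCC)
  END FUNCTION auto_size

  ! Workaround form: the same arrays, allocated explicitly.
  FUNCTION alloc_size(WDES, LMDIM) RESULT(n)
    TYPE(wavedes_demo), INTENT(IN) :: WDES
    INTEGER, INTENT(IN) :: LMDIM
    INTEGER :: n
    REAL(q), ALLOCATABLE :: COCC(:,:,:), COCC_IM(:,:)

    ALLOCATE (COCC(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)), COCC_IM(LMDIM,LMDIM))
    COCC = 0
    COCC_IM = 0
    n = SIZE(COCC)
    DEALLOCATE (COCC, COCC_IM)
  END FUNCTION alloc_size

END MODULE cocc_demo

PROGRAM test_cocc
  USE cocc_demo
  IMPLICIT NONE
  TYPE(wavedes_demo) :: WDES

  WDES%NCDIJ = 1
  PRINT *, 'automatic  :', auto_size(WDES, 15)
  PRINT *, 'allocatable:', alloc_size(WDES, 15)
END PROGRAM test_cocc
```

Both functions should report 450 elements (15 x 15 x 2). With 8-byte reals that is about 3600 bytes, far less than the 3665294080 bytes in the 32-bit error message, which is why the array size itself does not look like the cause.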

Hi,

Can you please send us a small test case to trs@pgroup.com?


Thank you,
Hongyon

Impossible, as a short test programme doing exactly the same thing that VASP is doing here runs fine, as it should do.

Do the engineers have access to VASP?

Hi,

Unfortunately, we don’t have access to VASP.

Hongyon