Hi

I downloaded the Cactus BenchADM benchmark and followed its tutorial.txt (as well as the article “Building Cactus BenchADM with PGI accelerator compilers” by Mathew Colgrove) to build and run the code. The CPU version compiles and runs correctly. The CUDA version (StaggeredLeapfrog2_acc1.F, which came with the package) compiled correctly but crashed at run time. I then tried the other steps, acc2 and acc3; they all behaved the same way.

I noticed that the compiler messages show

367, !$acc do parallel, vector(2)

371, !$acc do parallel, vector(3)

while the tutorial documents show “vector(8)” for the same loops. I don’t know why they are different.
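
To check whether these vector widths at least match the launch configuration, here is a small Python sketch of the arithmetic. It assumes that each accelerated loop runs over 2..nx-1 (118 iterations for SIZE=120, as the copyout bounds 2:nx-2+1 suggest) and that a "parallel, vector(n)" loop is split into ceil(118/n) blocks. Both are my reading of the compiler messages, not something the tutorial states. The 512-thread and 65535-block limits are the ones for compute capability 1.3 cards such as the C1060.

```python
import math

# Assumption: with SIZE=120 the accelerated loops run over 2..nx-1,
# as the copyout bounds (2:nx-2+1) in the compiler messages suggest.
NX = 120
EXTENT = (NX - 1) - 2 + 1  # 118 iterations per loop

# Vector widths reported by -Minfo=accel for source lines 367, 371 and 375.
VEC_367, VEC_371, VEC_375 = 2, 3, 16

# Assumption: each "parallel, vector(n)" loop is split into ceil(extent / n) blocks.
grid = (math.ceil(EXTENT / VEC_367), math.ceil(EXTENT / VEC_371))

# One thread per vector lane of each loop, innermost loop first.
block = (VEC_375, VEC_371, VEC_367)
threads_per_block = math.prod(block)

# Launch limits for compute capability 1.3 devices.
MAX_THREADS_PER_BLOCK = 512
MAX_GRID_DIM = 65535

within_limits = (threads_per_block <= MAX_THREADS_PER_BLOCK
                 and max(grid) <= MAX_GRID_DIM)

print("grid:", grid)
print("block:", block, "threads per block:", threads_per_block)
print("within launch limits:", within_limits)
```

Under those assumptions the sketch reproduces the "grid: [59x40] block: [16x3x2]" line in the timing output further down, so the small vector widths do not by themselves seem to exceed a launch limit. It says nothing about register or shared memory use.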

pgaccelinfo runs fine and the code compiles, so I guess I installed both CUDA and the compiler correctly.

I would appreciate any suggestions on what I need to do to make it run.

My system is RedHat 5.1, kernel 2.6.18-128.el5 x86_64 SMP

PGI 9.0.4

Tesla C1060

CUDA 2.3

The error messages are:

[tester@bra-tesladev1 PGI_Acc_benchADM]$ make SIZE=120 OPT="-fast -ta=nvidia,time -Minfo=accel" build_acc1 run_acc1

pgfortran -fast -ta=nvidia,time -Minfo=accel -c -o objdir/StaggeredLeapfrog2_acc1.o ./src/StaggeredLeapfrog2_acc1.F

NOTE: your trial license will expire in 12 days, 11.2 hours.

NOTE: your trial license will expire in 12 days, 11.2 hours.

bench_staggeredleapfrog2:

366, Generating copyout(adm_kzz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyout(adm_kyz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(lalp(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyout(adm_kyy_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyout(adm_kxz_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyout(adm_kxy_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyout(adm_kxx_stag(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(lgzz(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(lgyz(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(lgyy(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(lgxz(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(lgxy(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(lgxx(1:nx-2+2,1:ny-2+2,1:nz-2+2))

Generating copyin(adm_kzz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kzz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kyz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kyz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kyy_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kyy_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxz_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxz_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxy_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxy_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxx_stag_p_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

Generating copyin(adm_kxx_stag_p(2:nx-2+1,2:ny-2+1,2:nz-2+1))

367, Loop is parallelizable

371, Loop is parallelizable

375, Loop is parallelizable

Accelerator kernel generated

367, !$acc do parallel, vector(2)

371, !$acc do parallel, vector(3)

375, !$acc do vector(16)

Using register for 'adm_kxx_stag_p'

Using register for 'adm_kxy_stag_p'

Using register for 'adm_kxz_stag_p'

Using register for 'adm_kyy_stag_p'

Using register for 'adm_kyz_stag_p'

Using register for 'adm_kzz_stag_p'

Non-stride-1 accesses for array 'lgxx'

Non-stride-1 accesses for array 'lgxy'

Cached references to size [18x5x4] block of 'lgxz'

Cached references to size [18x5x4] block of 'lgyy'

Cached references to size [18x5x4] block of 'lgyz'

Cached references to size [18x5x4] block of 'lgzz'

Cached references to size [18x5x4] block of 'lalp'

pgfortran objdir/PreLoop.o objdir/StaggeredLeapfrog1a.o objdir/StaggeredLeapfrog1a_TS.o objdir/planewaves.o objdir/teukwaves.o /cctk_ThornBindings.o objdir/StaggeredLeapfrog2_acc1.o objdir/Cactus…

…

/InitialiseCactus_acc.o -fast -ta=nvidia,time -Minfo=accel -Mnomain -o bin/benchADM_acc1

time bin/benchADM_acc1 BenchADM_40l_120.par

10

1 0101 ************************

01 1010 10 The Cactus Code V4.0

1010 1101 011 www.cactuscode.org

1001 100101 ************************

00010101

100011 © Copyright The Authors

0100 GNU Licensed. No Warranty

0101

Cactus version: 4.0.b11

Parameter file: BenchADM_40l_120.par

Activating thorn Cactus…Success -> active implementation Cactus

Activation requested for

--->einstein time benchadm pugh pughreduce cartgrid3d ioutil iobasic<---

Activating thorn benchadm…Success -> active implementation benchadm

Activating thorn cartgrid3d…Success -> active implementation grid

Activating thorn einstein…Success -> active implementation einstein

Activating thorn iobasic…Success -> active implementation IOBasic

Activating thorn ioutil…Success -> active implementation IO

Activating thorn pugh…Success -> active implementation driver

Activating thorn pughreduce…Success -> active implementation reduce

Activating thorn time…Success -> active implementation time

if (recover)

Recover parameters

endif

Startup routines

BenchADM: Register slicings

CartGrid3D: Register GH Extension for GridSymmetry

CartGrid3D: Register coordinates for the Cartesian grid

PUGH: Startup routine

IOUtil: Startup routine

IOBasic: Startup routine

PUGHReduce: Startup routine.

Parameter checking routines

BenchADM: Check parameters

CartGrid3D: Check coordinates for CartGrid3D

Initialisation

CartGrid3D: Set up spatial 3D Cartesian coordinates on the GH

Einstein: Set up GF symmetries

Einstein: Initialize slicing, setup priorities for mixed slicings

PUGH: Report on PUGH set up

Time: Initialise Time variables

Time: Set timestep based on Courant condition

Einstein: Initialisation for Einstein methods

Einstein: Flat initial data

BenchADM: Setup for ADM

Einstein: Set initial lapse to one

BenchADM: Time symmetric initial data for staggered leapfrog

if (recover)

endif

if (checkpoint initial data)

endif

if (analysis)

Einstein: Compute the trace of the extrinsic curvature

Einstein: Calculate the spherical metric in r,theta(q), phi(p)

Einstein: Calculate the spherical ex. curvature in r, theta(q), phi(p)

endif

do loop over timesteps

Rotate timelevels

iteration = iteration + 1

t = t+dt

Einstein: Identify the slicing for the next iteration

BenchADM: Evolve using Staggered Leapfrog

if (checkpoint)

endif

if (analysis)

Einstein: Compute the trace of the extrinsic curvature

Einstein: Calculate the spherical metric in r,theta(q), phi(p)

Einstein: Calculate the spherical ex. curvature in r, theta(q), phi(p)

endif

enddo

Termination routines

PUGH: Termination routine

Shutdown routines

Driver provided by PUGH

INFO (IOBasic): I/O Method 'Scalar' registered

INFO (IOBasic): Scalar: Output of scalar quantities (grid scalars, reductions) to ASCII files

INFO (IOBasic): I/O Method 'Info' registered

INFO (IOBasic): Info: Output of scalar quantities (grid scalars, reductions) to screen

INFO (BenchADM): Evolve using the ADM system

INFO (BenchADM): with staggered leapfrog

INFO (CartGrid3D): Grid Spacings:

INFO (CartGrid3D): dx=>8.4033613e-03 dy=>8.4033613e-03 dz=>8.4033613e-03

INFO (CartGrid3D): Computational Coordinates:

INFO (CartGrid3D): x=>[-0.500, 0.500] y=>[-0.500, 0.500] z=>[-0.500, 0.500]

INFO (CartGrid3D): Indices of Physical Coordinates:

INFO (CartGrid3D): x=>[0,119] y=>[0,119] z=>[0,119]

INFO (PUGH): Single processor evolution

INFO (PUGH): 3-dimensional grid functions

INFO (PUGH): Size: 120 120 120

INFO (Einstein): Setting flat Minkowski space in Einstein

INFO (IOBasic): Info: Output every 10 iterations

INFO (IOBasic): Info: Output requested for EINSTEIN::gxx EINSTEIN::alp

it | | EINSTEIN::gxx | EINSTEIN::alp |

| t | minimum | maximum | minimum | maximum |

0 | 0.000 | 1.00000000 | 1.00000000 | 1.00000000 | 1.00000000 |

call to ctxSynchronize returned error 700: Launch failed

Accelerator Kernel Timing data

./src/StaggeredLeapfrog2_acc1.F

bench_staggeredleapfrog2

366: region entered 1 time

time(us): init=1

375: kernel launched 1 times

grid: [59x40] block: [16x3x2]

time(us): total=0 max=0 min=0 avg=0

acc_init.c

acc_init

1: region entered 1 time

time(us): init=51061

Command exited with non-zero status 1

1.12user 0.66system 0:01.79elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

0inputs+0outputs (0major+183167minor)pagefaults 0swaps

make: *** [run_acc1] Error 1