run time problems with 10.0

ink · November 23, 2009, 4:39pm

Hello,
I have a simple code which runs fine when compiled with 9.0-4, but when compiled with 10.0 it either runs much slower (as if it were not accelerated) or does not run at all (it just hangs).
Any thoughts?
thanks

MatColgrove · November 23, 2009, 5:39pm

Hi ink,

A lot of changes went into 10.0, so it could be any of a number of things. You’re welcome to send the code into PGI Customer Support (trs@pgroup.com) and we can take a look.

Otherwise, I would start with the informational messages (-Minfo=accel) to see what’s changed. Perhaps the schedule selected by the compiler is no longer optimal and you need to use the “parallel” and “vector” clauses? Maybe the compiler is no longer caching a variable?

Mat

ink · November 23, 2009, 7:05pm

here is the code
10 #pragma acc region for parallel
11 for( i = 0 ; i < m; i++ ){
12 #pragma acc for parallel
13     for( k = 0; k < n; k++ ) {
14 #pragma acc for seq
15         for( j = 0; j < l; j++ ){
16             c[i][k] = c[i][k] + a[i][j]*b[j][k];
17         }
18     }
19 }
20 }

which is compiled with
pgcc -ta=nvidia:cc13 -Minfo -fast -Msafeptr=all -c
mxm:
10, Generating copyin(b[0:l-1][0:n-1])
Generating copyin(a[0:m-1][0:l-1])
Generating copy(c[0:m-1][0:n-1])
11, Loop is parallelizable
Accelerator kernel generated
11, #pragma acc for parallel
13, Loop is parallelizable
15, Complex loop carried dependence of ‘c’ prevents parallelization
Loop carried dependence of ‘c’ prevents parallelization
Loop carried backward dependence of ‘c’ prevents vectorization

I think the kernel is generated. What I don’t understand is why it hangs, as if it can’t allocate a device or is checking a license.
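For readers puzzling over the line 15 messages: the j loop updates the same element of c on every iteration, which is the loop-carried dependence the compiler reports. One common way to write such a reduction, not something proposed in this thread, is to accumulate into a scalar and store once. The sketch below assumes C99 variable-length array parameters and a hypothetical function name `mxm_scalar_acc`; it has not been checked against PGI 10.0, and a compiler without accelerator support will typically just ignore the pragmas.

```c
#include <assert.h>

/* Hypothetical rewrite of the loop nest (c = c + a*b).
   The inner reduction is kept in a scalar, so each element of c
   is read once and written once per (i, k) pair. */
void mxm_scalar_acc(int m, int n, int l,
                    float a[m][l], float b[l][n], float c[m][n])
{
    int i, j, k;

    assert(m > 0 && n > 0 && l > 0);

#pragma acc region for parallel
    for (i = 0; i < m; i++) {
#pragma acc for parallel
        for (k = 0; k < n; k++) {
            float sum = c[i][k];          /* start from the old value */
#pragma acc for seq
            for (j = 0; j < l; j++) {
                sum += a[i][j] * b[j][k]; /* sequential reduction over j */
            }
            c[i][k] = sum;                /* single store per element */
        }
    }
}
```

The j loop is still sequential, as in the original code; only the repeated read and write of the c element inside it is removed.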

MatColgrove · November 23, 2009, 8:30pm

Hi ink,

Nothing jumps out at me that would indicate why you’re seeing a hang. You might try using “parallel, vector(16)” on your i loop and removing the second “parallel” clause on the k loop. Note, though, that schedule changes like these affect performance; they should not be the cause of a runtime error.
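A minimal sketch of that schedule change, for illustration only. It assumes C99 variable-length array parameters and a hypothetical function name `mxm_vec16`, and it reads “remove the second parallel clause” as dropping the directive on the k loop altogether so the compiler picks that loop's schedule. It has not been verified against PGI 10.0; check the -Minfo=accel output to see what the compiler actually chooses.

```c
#include <assert.h>

/* Hypothetical variant of the loop nest with the suggested schedule:
   "parallel, vector(16)" on the i loop, no explicit clause on the k loop. */
void mxm_vec16(int m, int n, int l,
               float a[m][l], float b[l][n], float c[m][n])
{
    int i, j, k;

    assert(m > 0 && n > 0 && l > 0);

#pragma acc region for parallel, vector(16)
    for (i = 0; i < m; i++) {
        /* the "parallel" directive that used to sit here is removed */
        for (k = 0; k < n; k++) {
#pragma acc for seq
            for (j = 0; j < l; j++) {
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
        }
    }
}
```

On a compiler without accelerator support the pragmas are typically ignored and the function runs on the host, which makes it easy to compare results against the accelerated build.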

Try setting “NVDEBUG=1” in your environment. This will give you a lot of information but hopefully help in determining exactly where the hang is. Note that there aren’t any runtime license checks.

Hope this helps,
Mat

ink · November 24, 2009, 11:53am

Mat, many thanks for your help.
it turned out that sitenvrc still needs to be set up manually and i forgot about it. (it is a bit strange that even small incremental updates, e.g. from 9.0-3 to 9.0-4, could not pick it up automatically.)

moving on. i’m now getting
gfec: error: unrecognized option `-TARG:abi=n64'

without sitenvrc the code can be compiled but hangs (even if sitenvrc is created after the code was compiled)

with sitenvrc i’m getting the gfec error above

MatColgrove · November 24, 2009, 4:28pm

Hi ink,

Since we’re now able to ship all needed CUDA tools and libraries with the 10.0 compilers, you no longer need to create a sitenvrc file.

My best guess as to the cause of the gfec error is a mismatch in CUDA versions, so you should remove the sitenvrc file from your 10.0 installation.

Did NVDEBUG help in determining where the hang occurs?

Mat

Tuan · November 24, 2009, 8:55pm

A PGI expert confirmed to me that sitenvrc is no longer needed with PGI v10.0.

ink · November 25, 2009, 4:50pm

well, it could be the cards on the two nodes that i used were in some bad state. i’m not quite sure. i removed sitenvrc, rebooted the nodes and also tried another node. i can compile my code with 10.0 and run it but it runs a few times slower than the 9.0-4 binary
if i set NVDEBUG=1 i see
__pgi_cu_init() found 4 devices
__pgi_cu_init() will use device 0 (V1.3)
__pgi_cu_init() compute context created
etc
so the card seems to be used
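When comparing the 9.0-4 and 10.0 binaries, it may help to separate one-time startup cost (such as the __pgi_cu_init() steps in the log above) from the steady-state time of the accelerated region. The sketch below is one way to do that under stated assumptions: it uses POSIX `clock_gettime`, and the names `wall_seconds`, `time_first_and_rest` and `count_calls` are hypothetical. `count_calls` is only a stand-in for a function that wraps the real accelerator region.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

typedef void (*region_fn)(void *arg);

/* Time the first call on its own (it may include device setup),
   then report the average over `repeats` further calls. */
void time_first_and_rest(region_fn fn, void *arg, int repeats,
                         double *first, double *rest_avg)
{
    double t0, t1;
    int r;

    assert(fn != 0 && repeats >= 1);

    t0 = wall_seconds();
    fn(arg);
    t1 = wall_seconds();
    *first = t1 - t0;

    t0 = wall_seconds();
    for (r = 0; r < repeats; r++) {
        fn(arg);
    }
    t1 = wall_seconds();
    *rest_avg = (t1 - t0) / repeats;
}

/* Placeholder region: counts how often it was called. Replace with a
   wrapper around the real matrix-multiply region when measuring. */
void count_calls(void *arg)
{
    int *counter = (int *)arg;
    *counter += 1;
}
```

If the first call is much slower than the average of the later ones in both builds, the difference between the binaries lies elsewhere; if only the steady-state average differs, the -Minfo=accel schedules are the place to look.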