Problem with -ta=nvidia,time

I’m trying to follow the examples in one of the PGI Insider articles (http://www.pgroup.com/lit/articles/insider/v1n1a1.htm) and have run into a problem getting the time profiling information. The code runs as expected, but no timing information is printed to the screen. I see the same problem with both pgfortran and pgcc.

My compile lines are:

pgcc -o c2.exe c2.c -ta=nvidia,time -Minfo
pgfortran -o f2.exe f2.f90 -ta=nvidia,time -Minfo

I’ve checked that kvsetta.o exists in /opt/pgi/linux86-64/10.0/lib as well.

Any advice would be much appreciated.

Thanks,

Paul

Hi Paul,

I’m not sure what’s causing this. Can you please post the full verbose output (i.e. add the “-v” flag) from your compilation of c2.c, as well as the output of your c2.exe run? Also, please set “ACC_NOTIFY” to “1” in your environment before running c2.exe.

Thanks,
Mat

Hi Mat,

Sorry for the delay. Here is the extra information you asked for.

1027> pgcc -o c2.exe c2.c -v -ta=nvidia,time -Minfo

/opt/pgi/linux86-64/10.0/bin/pgc c2.c -opt 2 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -quad -x 59 4 -x 59 4 -tp nehalem-64 -x 120 0x1000 -astype 0 -stdinc /opt/pgi/linux86-64/10.0/include:/usr/local/include:/usr/lib/gcc/x86_64-redhat-linux/4.3.2/include:/usr/lib/gcc/x86_64-redhat-linux/4.3.2/include:/usr/include -def unix -def __unix -def unix -def linux -def __linux -def linux -def __NO_MATH_INLINES -def x86_64 -def LONG_MAX=9223372036854775807L -def 'SIZE_TYPE=unsigned long int' -def 'PTRDIFF_TYPE=long int' -def __THROW= -def extension= -def amd64 -def SSE -def MMX -def SSE2 -def SSE3 -def SSSE3 -predicate '#machine(x86_64) #lint(off) #system(posix) #cpu(x86_64)' -def _ACCEL=200905 -cmdline '+pgcc c2.c -o c2.exe -v -ta=nvidia,time -Minfo' -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline /opt/pgi/linux86-64/10.0/lib/libintrinsics.il 4 -x 120 0x200000 -x 163 0x10001 -accel nvidia -x 163 128 -x 163 0x4000 -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 0xcff7 -x 162 0xcff7 -asm /tmp/pgcc7TuhXcZc_2W1.s
PGC-I-0222-Redundant definition for symbol __THROW (/usr/include/sys/cdefs.h: 63)
PGC-I-0222-Redundant definition for symbol extension (/usr/include/sys/cdefs.h: 334)
executing /opt/pgi/linux86-64/10.0/bin/pgnvd /tmp/pgacc3UuhLh8KuweO.gpu -ptx /tmp/pgaccVUuhne9JCmhV.ptx -o /tmp/pgaccNUuh1LfxNXGI.bin -dp
main:
     32, Generating copyin(a[0:n-1])
         Generating copyout(r[0:n-1])
     34, Loop is parallelizable
         Accelerator kernel generated
         34, #pragma acc for parallel, vector(256)
             Using register for 'a'
PGC/x86-64 Linux 10.0-0: compilation completed with informational messages

/usr/bin/as /tmp/pgcc7TuhXcZc_2W1.s -o /tmp/pgcctTuh5yRIfnk3.o

/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /opt/pgi/linux86-64/10.0/lib/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/4.3.2/crtbegin.o -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /opt/pgi/linux86-64/10.0/lib/pgi.ld -L/opt/pgi/linux86-64/10.0/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.3.2 /tmp/pgcctTuh5yRIfnk3.o -rpath /opt/pgi/linux86-64/10.0/lib -rpath /opt/pgi/linux86-64/10.0/cuda/lib -o c2.exe -lacc1 -ldl -lnspgc -lpgc -lm -lgcc -lc -lgcc /usr/lib/gcc/x86_64-redhat-linux/4.3.2/crtend.o /usr/lib64/crtn.o
Unlinking /tmp/pgcc7TuhXcZc_2W1.s
Unlinking /tmp/pgcctTuh5yRIfnk3.o
paulw@colenso@Thu 11 Mar 10:14:43 /fserver/paulw/GPU/pgi/part1

And when running the executable:

1032> setenv ACC_NOTIFY 1
paulw@colenso@Thu 11 Mar 10:14:43 /fserver/paulw/GPU/pgi/part1 1033> ./c2.exe
launch kernel file=/fserver/paulw/GPU/pgi/part1/./c2.c function=main line=34 device=0 grid=391 block=256
100000 iterations completed
1550 microseconds on GPU
2887 microseconds on host
paulw@colenso@Thu 11 Mar 10:14:43 /fserver/paulw/GPU/pgi/part1

Thanks,

Paul

Hi Paul,

I believe this was a short-lived problem in version 10.0-0 that was corrected in the 10.0-1 patch release. The “kvsetta.o” object is missing from the link line in your verbose output. To work around the issue, please manually add /opt/pgi/linux86-64/10.0/lib/kvsetta.o to your link line.
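
As a rough sketch, the workaround could look something like the following. It assumes the kvsetta.o location you mentioned and that the pgcc/pgfortran drivers pass an object file named on the command line through to the linker. The variable names and the guard around the compile are only for illustration (so the snippet does nothing harmful on a machine without the 10.0 install), and it is written in sh syntax; the two compile commands themselves are the same under csh.

```shell
# Workaround sketch for 10.0-0: name kvsetta.o explicitly so it ends up on the link line.
# PGI_LIB, KVSETTA, C_BUILD and F_BUILD are illustrative names, not PGI settings.
PGI_LIB=/opt/pgi/linux86-64/10.0/lib
KVSETTA="$PGI_LIB/kvsetta.o"

# Same compile lines as in the original post, with the object file added.
C_BUILD="pgcc -o c2.exe c2.c $KVSETTA -ta=nvidia,time -Minfo"
F_BUILD="pgfortran -o f2.exe f2.f90 $KVSETTA -ta=nvidia,time -Minfo"

if command -v pgcc >/dev/null 2>&1 && command -v pgfortran >/dev/null 2>&1 && [ -f "$KVSETTA" ]; then
    # Compilers and object file are present: run the real builds.
    $C_BUILD
    $F_BUILD
else
    # Otherwise just show what would be run.
    echo "would run: $C_BUILD"
    echo "would run: $F_BUILD"
fi
```

If you rebuild with “-v” as well, kvsetta.o should then show up on the /usr/bin/ld line.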

Note that the 10.0 release had a few major issues, so I would suggest upgrading to the latest version.

Hope this helps,
Mat