program got SIGSEGV on pgi_acc internal function call

Hi Mat,

How can I figure out the cause of this?

./aaa.exe
...
Segmentation fault (core dumped)
arom@cuda:~/aaa/test/em_real$ LM_LICENSE_FILE=/opt/pgi/license.dat pgdbg -core ./core ./aaa.exe
pgdbg-Warning-Cannot open X DISPLAY. Check your DISPLAY environment variable. Switching to command-line interface (-text).

PGDBG 13.5-0 x86 (Workstation, 8 Process)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2013, STMicroelectronics, Inc. All Rights Reserved.
WARNING: Unexpected pnote type 512 (0x200) in core file
WARNING: Unexpected pnote type 1431193932 (0x554e494c) in core file
WARNING: Unexpected pnote type 88 (0x58) in core file
Loaded: /home/arom/aaa/test/em_real/./aaa.exe ./core
Signalled SIGSEGV at 0x097F9C10, function __pgi_uacc_cuda_enter, file ../src/cuda_enter.c, line 50

pgdbg> where

STACK TRACE:

   #6  solve_interface_ address: 0x085D3BE2
   #5  solve_em_ address: 0x086F729B
   #4  module_microphysics_driver_microphysics_driver_ address: 0x0878DAD2
   #3  module_mp_thompson_mp_gt_driver_ address: 0x08B84A5E
   #2  module_mp_thompson_mp_thompson_ address: 0x08B962DD
   #1  __pgi_uacc_enter line: "../src/enter.c"@54 address: 0x097F50E2
     filename = 0x0997DD80, funcname = 0x0997DDB4, lineno = 973, rversion = 0xBFCAA7A0, objinfo = 0x0997E4E0, devid = 1
=> #0  __pgi_uacc_cuda_enter line: "../src/cuda_enter.c"@50 address: 0x097F9C10
     rversion = 0xBFCAA7A0, objinfo = 0x0997E4E0, dindex = 1

pgdbg>

It happened after I added ‘acc mirror’ and ‘acc update device’ for several variables. I see it with both PGI 13.5 and 13.6.

P.S. Are there any plans to implement ‘acc declare device resident’ directive?

Alexey

Hi Alexey,

This is the second report of this error, but I haven’t yet seen a reproducing example that would let me determine the cause.

What I’d like to know is which type of device you are using, which driver version you have, and whether you have been able to run code on it successfully before. Given the routine, my first guess is that there’s something wrong with the device or that we’re doing something wrong in setting up the device.

Also, can you set the environment variable “PGI_ACC_DEBUG=1”, and either post or send to me the output where it fails?
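A minimal sketch of one way to capture that output, assuming a POSIX shell. The helper name `run_with_acc_debug` and the log file name are hypothetical and are not part of the PGI tools; only the `PGI_ACC_DEBUG=1` setting comes from this thread.

```shell
# Run a command with PGI_ACC_DEBUG=1 and save its stdout and stderr to a log file.
# run_with_acc_debug is a hypothetical helper name, not a PGI utility.
run_with_acc_debug() {
    log_file=$1
    shift
    # Set the variable for this one command only and capture both output streams.
    PGI_ACC_DEBUG=1 "$@" > "$log_file" 2>&1
    status=$?
    # Show the end of the log, which is the part closest to the failure.
    tail -n 40 "$log_file"
    # Pass the exit status of the program back to the caller.
    return "$status"
}

# Usage (aaa.exe is the executable from this thread, acc_debug.log is an assumed name):
#   run_with_acc_debug acc_debug.log ./aaa.exe
```

The full log stays in the file, so it can be posted or sent separately if the tail is not enough.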

Thanks,
Mat

Hi Mat,

My GPUs work correctly. I can run other versions of my program.

Here is some information about my system:

$ nvidia-smi
Fri Jun 14 20:13:02 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.310.44   Driver Version: 310.44         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2070              | 0000:01:00.0     Off |                    0 |
| 30%   61C    P0    N/A /  N/A |   0%    9MB / 5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla C2050              | 0000:02:00.0     Off |                    0 |
| 30%   58C    P0    N/A /  N/A |   0%    6MB / 2687MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+

Here is the tail of the log file (PGI_ACC_DEBUG=1):

pgi_uacc_cuda_dataup1(devdst=0x10b60000,hostsrc=0xffd9d20,offset=0,stride=1,size=169740,eltsize=4,lineno=929,name=qc3d,thread=0)
     upload 0xffd9d20->0x10b60000 for 678960 bytes stream (nil) thread 0
pgi_uacc_dataon( devid=1, threadid=1 )
pgi_uacc_dataon(devptr=0x0,hostptr=0xff340f0,poffset=0,offset=0,0,0,stride=1,82,2460,size=82x30x69,extent=82x30x69,eltsize=4,lineno=929,name=qv3d,flags=0xf00=create+present+copyin+copyout,threadid=1)
pgi_uacc_dataon( devid=1, threadid=1 ) dindex=1
NO map for host:0xff340f0
pgi_uacc_alloc(size=678960,devid=1,threadid=1)
pgi_uacc_alloc(size=678960,devid=1,threadid=1) returns 0x10c60000
map    dev:0x10c60000 host:0xff340f0 size:678960 offset:0  data[dev:0x10c60000 host:0xff340f0 size:678960] (line:929 name:qv3d)
alloc done with devptr at 0x10c60000
pgi_uacc_pin(devptr=0x0,hostptr=0xff340f0,poffset=0,offset=0,stride=1,size=169740,extent=169740,eltsize=4,lineno=929,name=qv3d,flags=0x0,threadid=1)
MemHostRegister( 0xff340f0, 678960, 0 )
pgi_uacc_dataupx(devptr=0x10c60000,hostptr=0xff340f0,poffset=0,offset=0,stride=1,size=169740,extent=169740,eltsize=4,lineno=929,name=qv3d,flags=0x0,threadid=1)
pgi_uacc_cuda_dataup1(devdst=0x10c60000,hostsrc=0xff340f0,offset=0,stride=1,size=169740,eltsize=4,lineno=929,name=qv3d,thread=0)
     upload 0xff340f0->0x10c60000 for 678960 bytes stream (nil) thread 0
pgi_uacc_datadone( async=-1, devid=1 )
pgi_uacc_cuda_wait(lineno=-1,async=-1,dindex=1)
pgi_uacc_cuda_wait(sync on stream=(nil))
pgi_uacc_cuda_wait done
pgi_uacc_begin( compute region, file=/home/arom/aaa/phys/module_mp_thompson.f90, function=mp_thompson, lines=867:2304, startline=964, endline=2302, devid=0, threadid=1 )
pgi_uacc_begin( file=/home/arom/aaa/phys/module_mp_thompson.f90, function=mp_thompson, lines=867:2304, startline=964, endline=2302, devid=1, threadid=1 ) dindex=1
pgi_uacc_enter( devid=1 )

The entire file has been sent to trs@…
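As a rough sketch, the last compute region entered before the crash can be pulled out of such a log like this. The helper name `last_acc_region` is hypothetical, and the pattern assumes only the `pgi_uacc_begin` line format visible in the log tail above.

```shell
# Print file, function and start line of the last compute region in a PGI_ACC_DEBUG log.
# last_acc_region is a hypothetical helper; it matches the pgi_uacc_begin lines
# in the format shown in the log tail above.
last_acc_region() {
    debug_log=$1
    grep 'pgi_uacc_begin(' "$debug_log" | tail -n 1 |
        sed -n 's/.*file=\([^,]*\), function=\([^,]*\),.*startline=\([0-9]*\),.*/\1 \2 \3/p'
}

# Usage (acc_debug.log is an assumed file name):
#   last_acc_region acc_debug.log
```

The reported start line can then be compared against the line numbers in the compiler's messages for the same source file.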


Alexey

Hi Alexey,

I asked Michael. What’s happening is that a kernel isn’t getting generated for some reason, but the runtime is still trying to launch it. I’ll get your code from TRS and see what’s going on.

- Mat

Hi Mat,

That is indeed the case. I got

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (module_mp_thompson.f90: 964)
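Since the build in this thread still produced an executable despite this warning, it can help to scan the compiler output for it before running. A minimal sketch, assuming the build output was saved to a file; the helper name `find_failed_acc_regions` and the file name are hypothetical.

```shell
# List the source locations of accelerator regions the compiler failed to translate.
# find_failed_acc_regions is a hypothetical helper; it only searches a saved build log
# for the PGF90-W-0155 message quoted in this thread.
find_failed_acc_regions() {
    build_log=$1
    # Keep the text inside the trailing parentheses, e.g. "module_mp_thompson.f90: 964".
    grep 'PGF90-W-0155' "$build_log" |
        sed 's/.*(\([^()]*\))[[:space:]]*$/\1/' |
        sort -u
}

# Usage (build.log is an assumed file holding the saved compiler output):
#   find_failed_acc_regions build.log
```

An empty result means the log contains no such warning; any line printed names a region that has no generated kernel.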

Adding extra options shows

PGF90-I-0035-Predefined intrinsic sum loses intrinsic property (module_mp_thompson.f90: 2782)

I think that one is OK,

but what about this:

executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=13 -ptx /tmp/pgaccWtOdq6sZWU49.ptx -o /tmp/pgaccWtOdqmQLXK_0.bin -ptxinfo /tmp/WtOdqwG0nedr.info -4.2
executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=20 -ptx /tmp/pgaccWtOdqU7QWkZK.ptx -o /tmp/pgaccWtOdqGzAXGmt.bin -ptxinfo /tmp/WtOdqiJWnElf.info -4.2
executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=30 -ptx /tmp/pgaccWtOdqILfWMW0.ptx -o /tmp/pgaccWtOdq0rkXC1I.bin -ptxinfo /tmp/WtOdqk8mnywA.info -4.2
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (module_mp_thompson.f90: 964)
PGF90-I-0167-Inconsistent size of common block _module_mp_thompson$0 (module_mp_thompson.f90)
mp_thompson:
    929, Generating present(prg_ihm(:,:,:))
         Generating present(prg_rcg(:,:,:))
         Generating present(prg_rcs(:,:,:))
         Generating present(prg_rci(:,:,:))

With -ta=nvidia,cuda5.0 I got

executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=13 -ptx /tmp/pgaccToUdhkY5vEgp.ptx -o /tmp/pgaccvoUd-AJs8qAN.bin -ptxinfo /tmp/DoUdx47Z4VbX.info -5.0
executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=20 -ptx /tmp/pgacc9oUd3YEfLym_.ptx -o /tmp/pgaccLoUdVj1dmDpN.bin -ptxinfo /tmp/foUdp761q3cf.info -5.0
executing /home/opt/pgi/linux86/13.6/bin/pgnvd module_mp_thompson.n001.gpu -computecap=30 -ptx /tmp/pgaccnoUdNx_i1gRu.ptx -o /tmp/pgacc1oUdFurRC9Jj.bin -ptxinfo /tmp/ToUdhvYBO5K1.info -5.0
executing /home/opt/pgi/linux86/13.6/bin/pgnvd -fat dummy.c -sm13 /tmp/pgaccvoUd-AJs8qAN.bin -sm20 /tmp/pgaccLoUdVj1dmDpN.bin -sm30 /tmp/pgacc1oUdFurRC9Jj.bin -compute30 /tmp/pgaccnoUdNx_i1gRu.ptx -5.0 -o /tmp/pgaccvoUd-9KKD4En.fat
PGF90-I-0167-Inconsistent size of common block _module_mp_thompson$0 (module_mp_thompson.f90)

The code compiled, but the result is not correct.


Alexey

Putting extra options shows

PGF90-I-0035-Predefined intrinsic sum loses intrinsic property (module_mp_thompson.f90: 2782)


Most likely there is a variable named “sum” that clashes with the name of the intrinsic function. The code should still work correctly, but you get this message.