Segfault from pgf902

rmsivley · June 25, 2010, 1:07pm

I’ve been working around my type error and think I’ve solved the issue. However, having moved on from there, I attempted another compile and the compiler segfaulted. Much of the error just looks like a mess to me, but I do recognize several pieces as portions of the Makefile I’m using.

I did recently add a device allocatable variable along with allocate and deallocate statements, but when I removed them and replaced them with static dummy arguments, the segfault still occurred, so I don’t believe this is the source of the problem.

pgfortran-Fatal-/usr/local/pgi105/linux86-64/10.5/bin/pgf902 TERMINATED by signal 11
Arguments to /usr/local/pgi105/linux86-64/10.5/bin/pgf902
/usr/local/pgi105/linux86-64/10.5/bin/pgf902 /tmp/pgfortranblqhd19vbXfO.ilm -fn cuda_drv.f90 -opt 2 -version -terse 1 -inform warn -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -quad -vect 56 -y 34 16 -x 34 0x8 -x 32 6291456 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 59 4 -x 59 4 -tp penryn-64 -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 121 1 -x 124 1 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 9 1 -x 10 4 -x 89 2961677122 -x 66 84090906 -x 14 308 -y 89 0x40 -x 137 1 -x 176 1 -cmdline ‘+pgfortran cuda_drv.f90 -I …/libcore -I …/libturb -I …/PHYSICS_MODULES -I …/cuda_src -Bstatic_pgi -V -fastsse -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -Munroll=n:4 -Mipa=fast,inline -Mcuda -c -o cuda_drv.o’ -exfile /tmp/pgfortranXlqht0vCBPDJ.ipn -exifile /tmp/pgfortran5lqhRwKB5XXh.ipm -inlfile /tmp/pgfortranblqhdmN4-rjq.ipk -asm /tmp/pgfortranPlqh7mu5vtaa.sm
make[2]: *** [cuda_drv.o] Error 127

Edit: I’ve tested this and it’s not what is causing the segfault, but here is a little bonus FMI question: at what point do constants become constant? I define them at the module level and initialize them in my host function. Are they uneditable after the first assignment statement, or are they uneditable only in device/kernel code? Thanks.

MatColgrove · June 25, 2010, 3:00pm

Hi rmsivley,

Try using 10.6, which just came out. It’s possible that this problem was found and fixed already. If not, then please send the core file to PGI Customer Service (trs@pgroup.com). A reproducing example would be better, but I know that this is not an option.

Thanks,
Mat

rmsivley · June 25, 2010, 3:16pm

Having our system admin upgrade to 10.6 now. You’ve mentioned possible or planned improvements in this update for several of the issues I’ve had, so hopefully this is a good fix-all! Thanks.

rmsivley · June 25, 2010, 3:50pm

I upgraded to 10.6 and the segfault did disappear, but only to make room for two other, equally baffling errors.

First, I’m receiving the following:
PGF90-S-0000-Internal compiler error. unexpected runtime function call 0 (cuda_drv.f90: 672)
…twice in a row. Line 672 is simply an “end subroutine drva” statement.

Possibly caused by the previous error, but generating 27 errors of its own, is this:
/tmp/pgcudaforpL4bTmum7tmW.gpu(75): error: identifier “mm4” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(76): error: identifier “mm8” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(77): error: identifier “mm12” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(82): error: identifier “mm52” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(83): error: identifier “mm48” is undefined
…
…
/tmp/pgcudaforDB8bx4pK5CFY.gpu(578): error: expected an expression

27 errors detected in the compilation of “/tmp/pgnvdlM4bH0ZdOnJL.nv0”.
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (cuda_drv.f90: 672)

Any thoughts? Possibly a bug that thinks I’m trying to call the subroutine I’m ending? If it matters, this statement is immediately followed by the ‘end module cuda_drv’ statement.

rmsivley · June 25, 2010, 7:38pm

I was reading through old posts on this forum and I may have found a solution. Is there a specific order in which subroutines need to be defined, such as device, kernel, host? When I comment out all calls to my device subroutine, the long series of errors is resolved and leaves just the:
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (cuda_drv.f90: 673)
…error.

Should I reorder my definitions in some way? There is no special syntax for calling a device subroutine from a kernel, correct? Thanks.

MatColgrove · June 25, 2010, 11:18pm

Should I reorder my definitions in some way? There is no special syntax for calling a device subroutine from a kernel, correct?

No, but you do need to have explicit interfaces for each of the routines (which is automatic if you’re using modules). Can you post your routine and variable definitions? (Changing the names is fine.)
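A minimal sketch of this point, assuming a CUDA Fortran compiler invoked with `-Mcuda` as in the command line above: a kernel calls a `device` subroutine with an ordinary `call`, and because both live in the same module the compiler already has the explicit interface. The kernel also passes its own dummy arguments (elements of them, here) on to the device routine. All names (`call_demo`, `axpy_one`, `axpy_kernel`) are hypothetical.

```fortran
! Sketch only: hypothetical names; assumes CUDA Fortran (-Mcuda).
module call_demo
  use cudafor
  implicit none
contains
  ! Device routine: callable from device code only.
  attributes(device) subroutine axpy_one(a, x, y)
    real, value :: a
    real, intent(in) :: x
    real, intent(inout) :: y
    y = a * x + y
  end subroutine axpy_one

  ! Kernel: passes its dummy arguments (or elements of them) to the device routine.
  attributes(global) subroutine axpy_kernel(a, x, y, n)
    integer, value :: n
    real, value :: a
    real, device :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Ordinary CALL syntax; the module supplies the explicit interface.
    if (i <= n) call axpy_one(a, x(i), y(i))
  end subroutine axpy_kernel
end module call_demo
```

From host code it would be launched as `call axpy_kernel<<<(n + 63) / 64, 64>>>(3.0, dx, dy, n)`, with `dx` and `dy` declared `real, device`. The definition order inside the module does not matter here.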

I’ve tested this and it’s not what is causing the segfault, but here is a little bonus FMI question: at what point do constants become constant? I define them at the module level and initialize them in my host function. Are they uneditable after the first assignment statement, or are they uneditable only in device/kernel code? Thanks.

Sorry I missed this earlier question. Constants are writable from the host and can be written to even after initialization. Device code can only read them.
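To make this concrete, here is a minimal CUDA Fortran sketch, assuming a compiler invoked with `-Mcuda`. The module `const_demo`, the constant `factor`, and the routines `scale_kernel` and `scale_twice` are hypothetical names. Host code assigns `factor` twice, once before each launch; the kernel only reads it.

```fortran
! Sketch only: hypothetical names; assumes CUDA Fortran (-Mcuda).
module const_demo
  use cudafor
  implicit none
  real, constant :: factor                ! lives in device constant memory
contains
  attributes(global) subroutine scale_kernel(x, n)
    integer, value :: n
    real, device :: x(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Device code reads "factor"; it must not assign to it.
    if (i <= n) x(i) = factor * x(i)
  end subroutine scale_kernel

  ! Host routine: every assignment to "factor" here updates the device copy.
  subroutine scale_twice(h_x)
    real, intent(inout) :: h_x(:)
    real, device, allocatable :: d_x(:)
    integer :: n
    n = size(h_x)
    allocate(d_x(n))
    d_x = h_x                             ! host-to-device copy
    factor = 2.0                          ! first host write ("initialization")
    call scale_kernel<<<(n + 127) / 128, 128>>>(d_x, n)
    factor = 3.0                          ! a later host write is still allowed
    call scale_kernel<<<(n + 127) / 128, 128>>>(d_x, n)
    h_x = d_x                             ! device-to-host copy
    deallocate(d_x)
  end subroutine scale_twice
end module const_demo
```

With everything on the default stream, each kernel sees the value assigned before its launch, so every element ends up multiplied by 2.0 and then by 3.0.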

Mat

rmsivley · June 28, 2010, 12:45pm

Everything is set up in a module already, so that shouldn’t be the problem. I will check with my supervisor when he gets in and see if there is any way I could get a sample of this source to you.

Also, I left for the weekend, came back, and now the:
Internal compiler error. unexpected runtime function call
…error is the only one left. I checked some previous forum threads, and a few months ago you mentioned that it was caused by defining a local array sized by a runtime-initialized variable, which would require per-thread local memory allocation.

This may or may not be my problem, but I just wanted to clarify. If an array is declared with the allocatable attribute, can it only be declared in device, constant or shared? Would device/constant/shared require the allocatable attribute or could they be defined with a passed-in variable?

Our server is down for maintenance right now. I am in the middle of generalizing the code to send to technical support, and I’m trying to understand some of these concepts in the meantime.

MatColgrove · June 28, 2010, 5:02pm

Hi rmsivley,

I will check with my supervisor when he gets in and see if there is any way I could get a sample of this source to you.

That would be great.

If an array is declared with the allocatable attribute, can it only be declared in device, constant or shared?

A subroutine with a global or device attribute (i.e. a device kernel) can only use device allocatable arrays that are either passed in from the host or declared in the module data. The “constant” attribute can only be applied to a fixed size array (or scalar) within a module’s data section. The “shared” attribute can only be applied to a fixed size array within a device kernel.

Would device/constant/shared require the allocatable attribute or could they be defined with a passed-in variable?

No. Automatic arrays are not allowed, so you need to pass in device allocatable arrays.

The basic problem is that dynamic allocation is not allowed on a GPU (this is a limitation of CUDA and not specific to CUDA Fortran). Hence, allocate statements and automatic arrays, which implicitly require an allocate, are not allowed in device code.
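These rules can be sketched as follows, assuming CUDA Fortran with `-Mcuda`. Every array the kernel touches has its size fixed before the launch: `work` is a device allocatable array in module data that host code allocates, and `tile` is a fixed-size `shared` array. The commented-out line shows the kind of automatic array that is not allowed. Names (`alloc_demo`, `work`, `block_sums`, `sum_on_gpu`) are hypothetical.

```fortran
! Sketch only: hypothetical names; assumes CUDA Fortran (-Mcuda).
module alloc_demo
  use cudafor
  implicit none
  integer, parameter :: tpb = 64            ! threads per block, fixed at compile time
  real, device, allocatable :: work(:)      ! module data; allocated by host code
contains
  attributes(global) subroutine block_sums(partial, n)
    integer, value :: n
    real, device :: partial(*)
    real, shared :: tile(tpb)               ! allowed: fixed-size shared array
    ! real :: tmp(n)                        ! not allowed: automatic array needs a runtime allocate
    integer :: i, t, k
    t = threadIdx%x
    i = (blockIdx%x - 1) * blockDim%x + t
    tile(t) = 0.0
    if (i <= n) tile(t) = work(i)
    call syncthreads()
    if (t == 1) then                        ! thread 1 of each block adds up the tile
      partial(blockIdx%x) = 0.0
      do k = 1, tpb
        partial(blockIdx%x) = partial(blockIdx%x) + tile(k)
      end do
    end if
  end subroutine block_sums

  ! Host routine: all allocation happens here, before the launch.
  subroutine sum_on_gpu(h_in, total)
    real, intent(in) :: h_in(:)
    real, intent(out) :: total
    real, device, allocatable :: d_partial(:)
    real, allocatable :: h_partial(:)
    integer :: n, nblocks
    n = size(h_in)
    nblocks = (n + tpb - 1) / tpb
    allocate(work(n))
    allocate(d_partial(nblocks))
    allocate(h_partial(nblocks))
    work = h_in                             ! host-to-device copy
    call block_sums<<<nblocks, tpb>>>(d_partial, n)
    h_partial = d_partial                   ! device-to-host copy
    total = sum(h_partial)
    deallocate(work)
    deallocate(d_partial)
    deallocate(h_partial)
  end subroutine sum_on_gpu
end module alloc_demo
```

The per-block partial sums are finished on the host to keep the sketch short; the point is only where each array gets its size.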

Hope this helps,
Mat

rmsivley · June 28, 2010, 7:08pm

I think that might have revealed the problem, but I’m not certain. The way I’m having to convert this, I’m importing all necessary variables at the module level. I am then declaring new device variables to which I’ll transfer their values. Then I assign the imported value to the device variable. Some of these are arrays, and the imported variables are not static. However, this operation all takes place at the module and host level. I’ll copy a few sample statements below:

module name
use module, only : h_n_energy_j
integer, device, dimension(:) :: n_energy_j
…
contains
…
attributes(host) subroutine name(a,b,c,d)
n_energy_j = h_n_energy_j
…
end subroutine name

attributes(global) subroutine name(a,b,c,d)
jibber_jabber = 5 * n_energy_j(10)
…
end subroutine name
end module name

Do you see any issues with that basic flow? I have several variables which follow the same pattern. Also, note that I’ve submitted a larger code sample for examination, but unfortunately I wasn’t given clearance to send a functional module. Hopefully they’ll still be able to help out.

MatColgrove · June 28, 2010, 8:15pm

Hi Michael,

Customer support sent me your code example.

PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0

This is being caused by your use of a string comparison, “if (mystrr == “hello”)”. This comparison is converted into a call to a host runtime library function, which is not supported in device code. Our engineers are aware of this issue and are working on adding an appropriate error message.
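One possible workaround, sketched under the assumption that the string only selects a mode: do the character comparison once in host code and pass an integer flag to the kernel, whose integer test needs no runtime library call. This is an illustration, not a fix suggested in the thread; `mode_demo`, `apply_mode`, `run_mode`, and `MODE_HELLO` are hypothetical names, and CUDA Fortran with `-Mcuda` is assumed.

```fortran
! Sketch only: hypothetical names; assumes CUDA Fortran (-Mcuda).
module mode_demo
  use cudafor
  implicit none
  integer, parameter :: MODE_OTHER = 0, MODE_HELLO = 1
contains
  attributes(global) subroutine apply_mode(x, n, mode)
    integer, value :: n, mode
    real, device :: x(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      ! Integer comparison: plain device code, no host runtime call.
      if (mode == MODE_HELLO) x(i) = x(i) + 1.0
    end if
  end subroutine apply_mode

  ! Host side: the character comparison happens here, once, before the launch.
  subroutine run_mode(label, h_x)
    character(len=*), intent(in) :: label
    real, intent(inout) :: h_x(:)
    real, device, allocatable :: d_x(:)
    integer :: mode, n
    mode = MODE_OTHER
    if (label == "hello") mode = MODE_HELLO
    n = size(h_x)
    allocate(d_x(n))
    d_x = h_x
    call apply_mode<<<(n + 127) / 128, 128>>>(d_x, n, mode)
    h_x = d_x
    deallocate(d_x)
  end subroutine run_mode
end module mode_demo
```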

/tmp/pgcudaforpL4bTmum7tmW.gpu(75): error: identifier “mm4” is undefined

You’re missing declarations for these variables.

Unfortunately, I was not able to recreate the segfault with the 10.5 compiler, so it was most likely due to some other portion of the code.

Do you see any issues with that basic flow?

The flow looks fine, though you’ll want to allocate “n_energy_j” before copying over the data.

module name
use module, only : h_n_energy_j
integer, device, allocatable, dimension(:) :: n_energy_j
...
contains
...
attributes(host) subroutine name(a,b,c,d)
integer esize
esize = size(h_n_energy_j)
allocate(n_energy_j(esize))
n_energy_j = h_n_energy_j
...
end subroutine name

Hope this helps,
Mat

rmsivley · June 29, 2010, 2:09pm

I’ve been looking through the code for those missing declarations. I don’t see anything that isn’t explicitly declared, and I’m working with “implicit none”. What I find confusing is that the variables reported in the error are not mine. They appear to be some generalized copy, possibly indicating bit length as well? They are all 4 apart except for one gap. Could some set of ~11 4-byte integers somewhere in my code be to blame?

rmsivley · June 29, 2010, 5:42pm

Another thought. Is there any issue with accepting variables as parameters in a kernel, and then passing those same variables as arguments to a device subroutine?

Because the second half of the error message goes away only after I comment out both calls (and by association ALL of the routine’s parameters passed as arguments), I figure it must be something to do with these variables. However, they are declared just after the parameter list and “implicit none” statement.

Edit: One more thought. To copy arrays to the device, I am declaring device arrays with the same rank but no shape and then assigning the host copy to the device variable. All of this takes place in host code. Nowhere do I explicitly allocate the device memory. However, I tried to allocate using the following code:

real(dp), device, allocatable, dimension(:) :: dq
…
…
allocate(dq(shape(h_dq)))
dq = h_dq

and I was given the error:
PGF90-S-0083-Vector expression used where scalar expression required (cuda_drv.f90: 203)
which I know references the non-static argument to allocate.

What is the ‘official’ way that I should be transferring these arrays to device? Thanks.

MatColgrove · June 30, 2010, 10:13pm

Because the second half of the error message goes away only after I comment out both calls (and by association ALL of the routine’s parameters passed as arguments), I figure it must be something to do with these variables. However, they are declared just after the parameter list and “implicit none” statement.

All calls to routines with the device attribute need to be inlined into the kernel, since calls are not allowed on a GPU (there is no stack or context switching). Perhaps something isn’t inlining correctly and the new inlined variable names aren’t being declared correctly. Why, I’m not sure, and my attempts to recreate the problem here haven’t succeeded.

allocate(dq(shape(h_dq)))

The error message is correct since “shape” returns an array and allocate expects a scalar value for the number of elements in the array. Did you mean to use ‘size’ here?
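For the rank-1 array in the question, a corrected version of the snippet might look like this. It assumes `dp` is a double-precision kind parameter (the original does not show its definition) and uses hypothetical module and routine names (`copy_demo`, `upload`), with CUDA Fortran and `-Mcuda` assumed. `size(h_dq)` returns a scalar element count, whereas `shape(h_dq)` returns a rank-1 integer array, which is what triggered PGF90-S-0083.

```fortran
! Sketch only: hypothetical names; assumes CUDA Fortran (-Mcuda).
module copy_demo
  use cudafor
  implicit none
  integer, parameter :: dp = kind(1.0d0)    ! assumption: dp is double precision
  real(dp), device, allocatable :: dq(:)
contains
  subroutine upload(h_dq)
    real(dp), intent(in) :: h_dq(:)
    ! shape(h_dq) is an integer array; allocate needs scalar extents,
    ! so pass size(h_dq) instead.
    ! For a 2-D array: allocate(a(size(h_a, 1), size(h_a, 2)))
    if (allocated(dq)) deallocate(dq)
    allocate(dq(size(h_dq)))
    dq = h_dq                                ! host-to-device copy
  end subroutine upload
end module copy_demo
```

Allocating on the host and then assigning the host array is the same pattern as the `n_energy_j` example earlier in the thread.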

Mat
