Segfault from pgf902

I’ve been working around my type error and think I’ve solved the issue. However, having moved on from there, I attempted another compile and the compiler segfaulted. Much of the error output looks like a mess to me, but I do recognize several pieces as portions of the Makefile I’m using.

I did recently add a device allocatable variable along with allocate and deallocate statements. However, I removed them and replaced them with dummy static arguments, and the segfault still occurred, so I don’t believe they are the source of the problem.

pgfortran-Fatal-/usr/local/pgi105/linux86-64/10.5/bin/pgf902 TERMINATED by signal 11
Arguments to /usr/local/pgi105/linux86-64/10.5/bin/pgf902
/usr/local/pgi105/linux86-64/10.5/bin/pgf902 /tmp/pgfortranblqhd19vbXfO.ilm -fn cuda_drv.f90 -opt 2 -version -terse 1 -inform warn -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -quad -vect 56 -y 34 16 -x 34 0x8 -x 32 6291456 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 59 4 -x 59 4 -tp penryn-64 -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 121 1 -x 124 1 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 9 1 -x 10 4 -x 89 2961677122 -x 66 84090906 -x 14 308 -y 89 0x40 -x 137 1 -x 176 1 -cmdline ‘+pgfortran cuda_drv.f90 -I …/libcore -I …/libturb -I …/PHYSICS_MODULES -I …/cuda_src -Bstatic_pgi -V -fastsse -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -Munroll=n:4 -Mipa=fast,inline -Mcuda -c -o cuda_drv.o’ -exfile /tmp/pgfortranXlqht0vCBPDJ.ipn -exifile /tmp/pgfortran5lqhRwKB5XXh.ipm -inlfile /tmp/pgfortranblqhdmN4-rjq.ipk -asm /tmp/pgfortranPlqh7mu5vtaa.sm
make[2]: *** [cuda_drv.o] Error 127

Edit: I’ve tested this and it’s not what is causing the segfault, but as a little bonus f-my-i question: at what point do constants become constant? I define them at the module level and initialize them in my host function. Are they uneditable after the first assignment statement, or are they uneditable only in device/kernel code? Thanks.

Hi rmsivley,

Try using 10.6, which just came out. It’s possible that this problem was already found and fixed. If not, please send the core file to PGI Customer Service (trs@pgroup.com). A reproducing example would be better, but I know that this is not an option.

Thanks,
Mat

Having our system admin upgrade to 10.6 now. For several of the issues I’ve had, you’ve mentioned possible or planned improvements in this update, so hopefully this is a good fix-all! Thanks.

I upgraded to 10.6 and the segfault did disappear, but only to make room for two other, equally baffling errors.

First, I’m receiving the following error twice in a row:
PGF90-S-0000-Internal compiler error. unexpected runtime function call 0 (cuda_drv.f90: 672)
Line 672 is simply an “end subroutine drva” statement.

Possibly caused by the previous error, but generating 27 errors of its own, is this:
/tmp/pgcudaforpL4bTmum7tmW.gpu(75): error: identifier “mm4” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(76): error: identifier “mm8” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(77): error: identifier “mm12” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(82): error: identifier “mm52” is undefined
/tmp/pgcudaforpL4bTmum7tmW.gpu(83): error: identifier “mm48” is undefined


/tmp/pgcudaforDB8bx4pK5CFY.gpu(578): error: expected an expression

27 errors detected in the compilation of “/tmp/pgnvdlM4bH0ZdOnJL.nv0”.
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (cuda_drv.f90: 672)

Any thoughts? Possibly a bug that thinks I’m trying to call the subroutine I’m ending? If it matters, this statement is immediately followed by the ‘end module cuda_drv’ statement.

I was reading through old posts on this forum and I may have found a solution. Is there a specific order in which subroutines need to be defined, such as device, kernel, host? When I comment out all calls to my device subroutine, the long series of errors is resolved, leaving just the:
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (cuda_drv.f90: 673)
…error.

Should I reorder my definitions in some way? There is no special syntax for calling a device subroutine from a kernel, correct? Thanks

Should I reorder my definitions in some way? There is no special syntax for calling a device subroutine from a kernel, correct?

No, but you do need explicit interfaces for each of the routines (which is automatic if you’re using modules). Can you post your routine and variable definitions? (Changing the names is fine.)
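To illustrate the point about explicit interfaces, here is a minimal sketch, assuming a CUDA Fortran compiler (pgfortran with -Mcuda at the time, nvfortran today) and a CUDA-capable GPU; the module and routine names are hypothetical. The kernel invokes the device routine with an ordinary CALL and passes its own arguments straight through. Because both routines live in the same module, the interface is explicit and the order in which they are defined should not matter.

```fortran
module call_demo
  use cudafor
  implicit none
contains
  ! Device routine: callable only from device code (kernels or other device routines)
  attributes(device) subroutine axpy_one(y, x, alpha)
    real, intent(inout) :: y
    real, intent(in) :: x
    real, value :: alpha
    y = y + alpha * x
  end subroutine axpy_one

  ! Kernel: launched from the host; each thread updates one element
  attributes(global) subroutine axpy_kernel(y, x, alpha, n)
    integer, value :: n
    real, value :: alpha
    real :: y(n), x(n)          ! kernel array dummies are device arrays
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Plain CALL, kernel arguments passed through unchanged: no special syntax
    if (i <= n) call axpy_one(y(i), x(i), alpha)
  end subroutine axpy_kernel
end module call_demo
```

Only the kernel launch from host code uses the chevron syntax, e.g. call axpy_kernel<<<1, 4>>>(y_d, x_d, 3.0, 4).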


I’ve tested this and it’s not what is causing the segfault, but as a little bonus f-my-i question: at what point do constants become constant? I define them at the module level and initialize them in my host function. Are they uneditable after the first assignment statement, or are they uneditable only in device/kernel code? Thanks.

Sorry I missed this earlier question. Constants are writable from the host and can be written to even after initialization. They can only be read from the device.
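A minimal sketch of that behavior, assuming a CUDA Fortran compiler and a GPU; the module, kernel, and array names are hypothetical. The host assigns the constant array before the first launch and overwrites it before the second, while the kernel only reads it.

```fortran
module const_demo
  use cudafor
  implicit none
  ! Fixed-size constant-memory array declared in module data
  real, constant :: coeffs(2)
contains
  ! Kernel: a(i) = a(i)*coeffs(1) + coeffs(2); coeffs is read-only here
  attributes(global) subroutine scale_kernel(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = a(i) * coeffs(1) + coeffs(2)
    ! Assigning to coeffs here is not allowed: constants cannot be written from device code
  end subroutine scale_kernel
end module const_demo
```

On the host side, an ordinary assignment such as coeffs = h_c (with h_c a host array of two reals) sets the values, and repeating that assignment between launches changes what later launches read.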

- Mat

Everything is set up in a module already, so that shouldn’t be the problem. I will check with my supervisor when he gets in and see if there is any way I could get a sample of this source to you.

Also, I left for the weekend and came back, and now the:
Internal compiler error. unexpected runtime function call
…error is the only one left. I checked some previous forum threads, and a few months ago you mentioned that it was caused by trying to declare a local array sized by a runtime-initialized variable, which would require allocating thread-local memory.

This may or may not be my problem, but I just wanted to clarify. If an array is declared with the allocatable attribute, can it only be declared in device, constant or shared? Would device/constant/shared require the allocatable attribute or could they be defined with a passed-in variable?

Our server is down for maintenance right now. I am in the middle of generalizing the code to send to technical support, and in the meantime I’m trying to understand some of these concepts.

Hi rmsivley,

I will check with my supervisor when he gets in and see if there is any way I could get a sample of this source to you.

That would be great.

If an array is declared with the allocatable attribute, can it only be declared in device, constant or shared?

A subroutine with a global or device attribute (i.e. a device kernel) can only use device allocatable arrays that are either passed in from the host or declared in the module data. The “constant” attribute can only be applied to a fixed size array (or scalar) within a module’s data section. The “shared” attribute can only be applied to a fixed size array within a device kernel.

Would device/constant/shared require the allocatable attribute or could they be defined with a passed-in variable?

No, automatic arrays are not allowed, so you need to pass in device allocatable arrays.

The basic problem is that dynamic allocation is not allowed on a GPU (this is a limitation of CUDA and not specific to CUDA Fortran). Hence, allocate statements and automatic arrays, which implicitly require an allocate, are not allowed.
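These rules can be combined in one sketch, assuming a CUDA Fortran compiler and a GPU; all names, and the block size of 64, are hypothetical. The kernel reads a module-level device allocatable array that the host allocates, writes into a device array the host passes in, and uses a fixed-size shared array where one might otherwise reach for an automatic array sized at run time.

```fortran
module alloc_rules
  use cudafor
  implicit none
  ! Threads per block: a compile-time constant, so it can size a shared array
  integer, parameter :: TPB = 64
  ! Module-level device allocatable array: allocated and filled by host code
  real, device, allocatable :: field(:)
contains
  ! Kernel: each block sums its slice of field into block_sums(blockIdx%x).
  ! Must be launched with blockDim%x == TPB (a power of two).
  attributes(global) subroutine block_sum(block_sums, nb, n)
    integer, value :: nb, n
    real :: block_sums(nb)        ! device array allocated by the host, passed in
    real, shared :: buf(TPB)      ! shared array: fixed size only
    ! real :: tmp(n)              ! automatic array: NOT allowed in a kernel
    integer :: i, t, s
    t = threadIdx%x
    i = (blockIdx%x - 1) * blockDim%x + t
    buf(t) = 0.0
    if (i <= n) buf(t) = field(i)
    call syncthreads()
    ! Tree reduction within the block
    s = TPB / 2
    do while (s >= 1)
      if (t <= s) buf(t) = buf(t) + buf(t + s)
      call syncthreads()
      s = s / 2
    end do
    if (t == 1) block_sums(blockIdx%x) = buf(1)
  end subroutine block_sum
end module alloc_rules
```

On the host, field is allocated with allocate(field(n)) and filled by ordinary assignment before the launch, and block_sums is a device allocatable array sized to the number of blocks; nothing is allocated inside the kernel.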

Hope this helps,
Mat

I think that might have revealed the problem, but I’m not certain. Because of the way I’m having to convert this, I’m importing all necessary variables at the module level. I then declare new device variables to which I’ll transfer their values, and assign each imported variable to its device counterpart. Some of these are arrays, and the imported variables are not static. However, all of this takes place at the module and host level. I’ll copy a few sample statements below:

module name
use module, only : h_n_energy_j
integer, device, dimension(:) :: n_energy_j

contains

attributes(host) subroutine name(a,b,c,d)
n_energy_j = h_n_energy_j

end subroutine name

attributes(global) subroutine name(a,b,c,d)
jibber_jabber = 5 * n_energy_j(10)

end subroutine name
end module name

Do you see any issues with that basic flow? I have several variables that follow the same pattern. Also note that I’ve submitted a larger sample of code to examine, but unfortunately I wasn’t given clearance to send a functional module. Hopefully they’ll still be able to help out.

Hi Michael,

Customer support sent me your code example.

PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0

This is being caused by your use of a string comparison, if (mystrr == “hello”). This comparison is converted into a call to a host runtime library function, which is not supported in device code. Our engineers are aware of this issue and are working on adding an appropriate error message.
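One way around this, sketched under the assumption that the string only selects between a few behaviors: do the character comparison on the host and pass the kernel an integer code. The module, routine names, and codes below are hypothetical.

```fortran
module mode_demo
  use cudafor
  implicit none
  ! Hypothetical integer codes replacing the strings the kernel used to compare
  integer, parameter :: MODE_OTHER = 0, MODE_HELLO = 1
contains
  ! Host function: the character comparison happens here, on the CPU
  integer function mode_code(mystrr)
    character(len=*), intent(in) :: mystrr
    if (mystrr == "hello") then
      mode_code = MODE_HELLO
    else
      mode_code = MODE_OTHER
    end if
  end function mode_code

  ! Kernel: branches on an integer instead of comparing strings
  attributes(global) subroutine apply_mode(a, n, mode)
    integer, value :: n, mode
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      if (mode == MODE_HELLO) then
        a(i) = 1.0
      else
        a(i) = -1.0
      end if
    end if
  end subroutine apply_mode
end module mode_demo
```

The host computes m = mode_code(mystrr) once and launches with call apply_mode<<<grid, block>>>(d, n, m), so no character data or runtime string routines reach the device.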

/tmp/pgcudaforpL4bTmum7tmW.gpu(75): error: identifier “mm4” is undefined

You’re missing declarations for these variables.

Unfortunately, I was not able to recreate the segfault with the 10.5 compiler, so it was most likely due to some other portion of the code.

Do you see any issues with that basic flow?

The flow looks fine, though you’ll want to allocate “n_energy_j” before copying over the data.

module name
use module, only : h_n_energy_j
integer, device, allocatable, dimension(:) :: n_energy_j
...
contains
...
attributes(host) subroutine name(a,b,c,d)
integer esize
! size the device array to match the host array, then copy host -> device
esize = size(h_n_energy_j)
allocate(n_energy_j(esize))
n_energy_j = h_n_energy_j
...
end subroutine name

Hope this helps,
Mat

I’ve been looking through the code for those missing declarations. I don’t see anything that isn’t explicitly declared, and I’m using “implicit none”. What I find confusing is that the variables reported in the error are not mine. They appear to be some compiler-generated copies, possibly with names indicating bit length as well? They all appear to be 4 apart except for one gap. Could some set of ~11 4-byte integers somewhere in my code be to blame?

Another thought. Is there any issue with accepting variables as parameters in a kernel, and then passing those same variables as arguments to a device subroutine?

Because the second half of the error message goes away only after I comment out both calls (and, by association, all of the routine’s parameters passed as arguments), I figure it must have something to do with these variables. However, they are declared just after the parameter list and the “implicit none” statement.

Edit: One more thought. To copy arrays to the device, I am declaring device arrays with the same rank but deferred shape and then assigning the host copy to the device variable. All of this takes place in the host code. Nowhere do I explicitly allocate the device memory. However, I tried to allocate using the following code:

real(dp), device, allocatable, dimension(:) :: dq


allocate(dq(shape(h_dq)))
dq = h_dq

and I was given the error:
PGF90-S-0083-Vector expression used where scalar expression required (cuda_drv.f90: 203)
which I know references the non-static argument to allocate.

What is the ‘official’ way that I should be transferring these arrays to device? Thanks.

Because the second half of the error message goes away only after I comment out both calls (and, by association, all of the routine’s parameters passed as arguments), I figure it must have something to do with these variables. However, they are declared just after the parameter list and the “implicit none” statement.

All calls to routines with the device attribute need to be inlined into the kernel, since calls are not allowed on the GPU (no stack or context switching). Perhaps something isn’t inlining correctly and the new inlined variable names aren’t being declared correctly. Why, I’m not sure, and my attempts to recreate the problem here haven’t succeeded.


allocate(dq(shape(h_dq)))

The error message is correct, since “shape” returns an array and allocate expects a scalar extent for each dimension of the array. Did you mean to use ‘size’ here?
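For reference, a minimal sketch of size-based allocation followed by assignment, assuming dp is a double-precision kind parameter (the excerpt does not show how it is defined) and using hypothetical module, routine, and array names. For a rank-2 array, allocate takes one scalar extent per dimension, so size is called once per dimension.

```fortran
module upload_demo
  use cudafor
  implicit none
  ! Assumed double-precision kind; the original code defines its own dp elsewhere
  integer, parameter :: dp = kind(1.0d0)
  real(dp), device, allocatable :: dq(:)
  real(dp), device, allocatable :: dq2(:,:)
contains
  ! Host routine: allocate each device array to match its host counterpart, then copy
  subroutine upload(h_dq, h_dq2)
    real(dp), intent(in) :: h_dq(:), h_dq2(:,:)
    ! shape(h_dq) returns an integer array, which allocate rejects;
    ! size(h_dq) returns the scalar element count it needs
    if (allocated(dq)) deallocate(dq)
    allocate(dq(size(h_dq)))
    dq = h_dq                                  ! host-to-device copy
    ! Rank 2: one scalar extent per dimension
    if (allocated(dq2)) deallocate(dq2)
    allocate(dq2(size(h_dq2, 1), size(h_dq2, 2)))
    dq2 = h_dq2
  end subroutine upload
end module upload_demo
```

The deallocate guards only matter if upload is called more than once; without them, a second allocate of an already allocated array would fail.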

- Mat