CUDA 1.1 Bug - Compiler crash (ptxas) w/repro

After upgrading to CUDA 1.1 (drivers, fresh runtime install, fresh SDK install), my code started crashing the compiler. Specifically when nvcc’ing one of my kernels, ptxas crashes.

I’ve narrowed it down to the following code - running nvcc on this file crashes every time.

__global__ void calculate_solvent_accessibility_ligand_receptor_kernel( float* d_solvent_accessibilities,

                                                                        uint start,

                                                                        uint start_receptor_block,

                                                                        uint num_receptor_blocks,

                                                                        float* d_ligand_atomic_radii,

                                                                        float* d_ligand_p_i,

                                                                        float* d_ligand_S_i,

                                                                        float3* d_ligand_origin,

                                                                        float3* d_ligand_transform_origin,

                                                                        float* d_receptor_atomic_radii,

                                                                        float* d_receptor_p_i,

                                                                        float* d_receptor_S_i,

                                                                        float3* d_receptor_origin

                                                                        )

{

}

When compiling the above verbatim (e.g. nothing in the function body, just the specified function name and parameter list), I get:

[dynerman-local@troy cuda]$ nvcc crash_1.1.cu 

nvcc error  : 'ptxas' died due to signal 11 (Invalid memory reference)

If I modify the code by removing some of the parameters, the file eventually gets pass ptxas. Also, the problem becomes more intermittant. If I remove the last 5 parameters, it’ll crash (with the same message) about 10% of the time (the other 90% it’ll get to the linking phase). Once I add the 5 last parameters back in, the crash rate is back near 100%.

Note that this code will fail to link, the crash is occuring at an earlier compilation stage.

My system specifications:

Operating System: RHEL 5.1 64-bit

Synopsis of the problem: nvcc crashes on kernel compile during ptxas stage

Detailed description: above

CUDA Toolkit Release Version: 1.1

SDK Release Version: 1.1

Compiler for CPU Host Code: N/A

System description: Dell Precision 690
[dynerman-local@troy ~]$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2006 NVIDIA Corporation

Built on Fri_Nov_30_01:55:35_PST_2007

Cuda compilation tools, release 1.1, V0.2.1221
09:00.0 VGA compatible controller: nVidia Corporation GeForce 8800 Ultra (rev a2)

0a:00.0 VGA compatible controller: nVidia Corporation GeForce 8800 Ultra (rev a2)
processor       : 0-3

vendor_id       : GenuineIntel

cpu family      : 6

model           : 15

model name      : Intel(R) Xeon(R) CPU           E5335  @ 2.00GHz

stepping        : 7

cpu MHz         : 1995.006

cache size      : 4096 KB

physical id     : 0

siblings        : 4

core id         : 0

cpu cores       : 4

fpu             : yes

fpu_exception   : yes

cpuid level     : 10

wp              : yes

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm

bogomips        : 3992.91

clflush size    : 64

cache_alignment : 64

address sizes   : 36 bits physical, 48 bits virtual

power management:

As per my limited knowledge of the architecture, I think you are running out of registers. I remember reading that the arguments to the functions are passed in to the registers, what I do not remember is the number of registers each MP gets.
Since passing lesser parameters seems to solve your problem…

Are you also declaring any variable in the kernel? Since these would need registers too!

Assuming that I am right about the arguments being in the registers, you have already used 12 registers…

To clarify a bit:

  1. The above code, verbatim, crashes the compiler. I didn’t remove the body of the function - the function listed above, a noop with the specified parameters, crashes the compiler.

  2. The code worked and compiled fine with the 1.0 toolchain

Are you on a 64bit or 32bit system?

I tried your code on RHEL4 32bit and it compiled ok.

$ nvcc --version
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2006 NVIDIA Corporation
Built on Wed_Nov__7_05:31:37_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221

64-bit - I’ll add this to the original post.

my bad…

gotcha! External Media

And parameters are passed in shared memory, not registers.

Update: It’s possible this has to do with the length of the variable names. By reducing the variable names of the parameters in the above code to 3-4 letters each, the crash does NOT occur.

__global__ void calculate_solvent_accessibility_ligand_receptor_kernel( float* d,

                                                                        uint start,

                                                                        uint s_r_b,

                                                                        uint n_r_b,

                                                                        float* d_lar,

                                                                        float* d_lp,

                                                                        float* d_lS,

                                                                        float3* d_lo,

                                                                        float3* d_lto,

                                                                        float* d_rar,

                                                                        float* d_rp,

                                                                        float* d_rS,

                                                                        float3* d_ro

                                                                        )

{

}

Hi, my guess would be exactly the same. I have faced problems with functions having long unmangled names in both NVCC 1.0 as 1.1. For me it seemed the backend stage of the compilation crashed when an unmangled identifier name became more thant ~256 characters long.

It took a number of template arguments and function parameters to get the function name this long but in my opinion the arbitrary length limit of the identifier name could well be ten times larger.

PS. I have been using a 32-bit system.

/Pyry

Hi,

Does anyone know what the status on this bug is? I’m experiencing exactly the same thing with the kernel below. Removing a single argument (seems it doesn’t matter which one) or shortening them to three letters makes it compile. I’m running CUDA 1.1 on 64-bit CentOS 5.

Cheers,

Tobias.

__global__ void inbcond_kernel(

int sb_npi, 

int sb_npcbi,

int *sb_patch_index,

float rfin,

float cp,

float cv,

float ga1,

float rgas,

float *sb_pv_t0,

float *sb_pv_p0,

float *sb_pv_angle,

float *sb_bp_ro,

float *buffer_ro,

float *buffer_ext_ro,

float *sb_bp_rovx,

float *buffer_rovx,

float *buffer_ext_rovx,

float *sb_bp_rovr,

float *buffer_rovr,

float *buffer_ext_rovr,

float *sb_bp_rorvt,

float *buffer_rorvt,

float *buffer_ext_rorvt,

float *sb_bp_roe,

float *buffer_roe,

float *buffer_ext_roe,

float *sb_bp_r,

float *buffer_r,

float *buffer_ext_r)

{   

}

nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)

no errors seen on winXP and cuda1.1 and MS VS2005

nvcc.exe -ccbin “C:\Program Files\Microsoft Visual Studio 8\VC\bin” -c -DWIN32 -D_DEBUG -D_CONSOLE -Xcompiler “/EHsc /W3 /nologo /Wp64 /Od /Zi /MDd /GR” -I"C:\Program Files\NVIDIA Corporation\NVIDIA SDK 10\NVIDIA CUDA SDK\common\inc" -I"C:\CUDA\include" --opencc-options -LIST:source=on -keep tst.cu

tst.obj created

I will try on fedora 8 i386_32 later …

works fine with::

[root@gk-216c900ee6fc cudatst]# nvcc --version

nvcc: NVIDIA ® Cuda compiler driver

Copyright © 2005-2006 NVIDIA Corporation

Built on Fri_Nov_30_07:11:42_PST_2007

Cuda compilation tools, release 1.1, V0.2.1221

[root@gk-216c900ee6fc cudatst]# uname -a

Linux gk-216c900ee6fc.lan 2.6.24.3-12.fc8 #1 SMP Tue Feb 26 14:58:29 EST 2008 i686 athlon i386 GNU/Linux

[root@gk-216c900ee6fc cudatst]# gcc --version

gcc (GCC) 4.3.0 20080307 (Red Hat 4.3.0-2)

Copyright © 2008 Free Software Foundation, Inc.

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc -c -I"/usr/local/cuda/include" -I"/root/NVIDIA_CUDA_SDK/common/inc" --opencc-options -LIST:source=on -keep tst.cu

I have the same bug, code compiles fine on 32bit system but “nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)” on 64bit system.
Is this fixed in Cuda 2.0 ? I can’t test it as my distrib is fedora 8, not yet supported.
I know it can be fixed by shortening variable names, but with a big code it’s a laborious task.

I have another question : I have another code which compiles fine but crash during execution on 64 bit, whereas it runs fine in 32 bit system; could this be a related bug ?

@Kravell
The only way that someone can tell you whether its fixed for CUDA_2.0 is if you provide a test app which reproduces the problem.

A testcase which reproduces this bug, is the CUDPP 1.0a library.

I had the same trouble: Internal Identifiers get to long, ptx crashed with the Signal 11 message. Making a global sed s/SegmentedScanTraits/SST helped fixing it. But because I have also just 1.1 I can say nothing if it is still in CUDA 2.0 present

If you’d like further assistance, please provide detailed instructions on how to reproduce this problem.

Well, as Flolo just said you can easily reproduce the problem with the CUDPP 1.0a library which fails to compile with the same error (“nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)”) in a 64bit system with CUDA1.1.

Yet I assume this is fixed with CUDA2.0 as they say in the CUDPP documentation that CUDPP has been thoroughly tested on Redhat Enterprise Linux 5 (RHEL 5 x86_64, 64-bit) (CUDA 2.0).

Thanks anyway for your proposed help.