CUDA 1.1 Bug - Compiler crash (ptxas) w/repro

dynerman · December 10, 2007, 5:59pm

After upgrading to CUDA 1.1 (drivers, fresh runtime install, fresh SDK install), my code started crashing the compiler. Specifically when nvcc’ing one of my kernels, ptxas crashes.

I’ve narrowed it down to the following code - running nvcc on this file crashes every time.

__global__ void calculate_solvent_accessibility_ligand_receptor_kernel( float* d_solvent_accessibilities,

                                                                        uint start,

                                                                        uint start_receptor_block,

                                                                        uint num_receptor_blocks,

                                                                        float* d_ligand_atomic_radii,

                                                                        float* d_ligand_p_i,

                                                                        float* d_ligand_S_i,

                                                                        float3* d_ligand_origin,

                                                                        float3* d_ligand_transform_origin,

                                                                        float* d_receptor_atomic_radii,

                                                                        float* d_receptor_p_i,

                                                                        float* d_receptor_S_i,

                                                                        float3* d_receptor_origin

                                                                        )

{

}

When compiling the above verbatim (e.g. nothing in the function body, just the specified function name and parameter list), I get:

[dynerman-local@troy cuda]$ nvcc crash_1.1.cu 

nvcc error  : 'ptxas' died due to signal 11 (Invalid memory reference)

If I modify the code by removing some of the parameters, the file eventually gets pass ptxas. Also, the problem becomes more intermittant. If I remove the last 5 parameters, it’ll crash (with the same message) about 10% of the time (the other 90% it’ll get to the linking phase). Once I add the 5 last parameters back in, the crash rate is back near 100%.

Note that this code will fail to link, the crash is occuring at an earlier compilation stage.

My system specifications:

Operating System: RHEL 5.1 64-bit

Synopsis of the problem: nvcc crashes on kernel compile during ptxas stage

Detailed description: above

CUDA Toolkit Release Version: 1.1

SDK Release Version: 1.1

Compiler for CPU Host Code: N/A

System description: Dell Precision 690

[dynerman-local@troy ~]$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2006 NVIDIA Corporation

Built on Fri_Nov_30_01:55:35_PST_2007

Cuda compilation tools, release 1.1, V0.2.1221

09:00.0 VGA compatible controller: nVidia Corporation GeForce 8800 Ultra (rev a2)

0a:00.0 VGA compatible controller: nVidia Corporation GeForce 8800 Ultra (rev a2)

processor       : 0-3

vendor_id       : GenuineIntel

cpu family      : 6

model           : 15

model name      : Intel(R) Xeon(R) CPU           E5335  @ 2.00GHz

stepping        : 7

cpu MHz         : 1995.006

cache size      : 4096 KB

physical id     : 0

siblings        : 4

core id         : 0

cpu cores       : 4

fpu             : yes

fpu_exception   : yes

cpuid level     : 10

wp              : yes

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm

bogomips        : 3992.91

clflush size    : 64

cache_alignment : 64

address sizes   : 36 bits physical, 48 bits virtual

power management:

tanmay.Learns · December 10, 2007, 7:16pm

As per my limited knowledge of the architecture, I think you are running out of registers. I remember reading that the arguments to the functions are passed in to the registers, what I do not remember is the number of registers each MP gets.
Since passing lesser parameters seems to solve your problem…

Are you also declaring any variable in the kernel? Since these would need registers too!

Assuming that I am right about the arguments being in the registers, you have already used 12 registers…

dynerman · December 10, 2007, 7:29pm

To clarify a bit:

The above code, verbatim, crashes the compiler. I didn’t remove the body of the function - the function listed above, a noop with the specified parameters, crashes the compiler.
The code worked and compiled fine with the 1.0 toolchain

mfatica · December 10, 2007, 7:29pm

Are you on a 64bit or 32bit system?

I tried your code on RHEL4 32bit and it compiled ok.

$ nvcc --version
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2006 NVIDIA Corporation
Built on Wed_Nov__7_05:31:37_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221

dynerman · December 10, 2007, 7:30pm

64-bit - I’ll add this to the original post.

tanmay.Learns · December 10, 2007, 7:48pm

my bad…

gotcha! External Media

MisterAnderson42 · December 10, 2007, 7:51pm

And parameters are passed in shared memory, not registers.

dynerman · December 10, 2007, 7:53pm

Update: It’s possible this has to do with the length of the variable names. By reducing the variable names of the parameters in the above code to 3-4 letters each, the crash does NOT occur.

__global__ void calculate_solvent_accessibility_ligand_receptor_kernel( float* d,

                                                                        uint start,

                                                                        uint s_r_b,

                                                                        uint n_r_b,

                                                                        float* d_lar,

                                                                        float* d_lp,

                                                                        float* d_lS,

                                                                        float3* d_lo,

                                                                        float3* d_lto,

                                                                        float* d_rar,

                                                                        float* d_rp,

                                                                        float* d_rS,

                                                                        float3* d_ro

                                                                        )

{

}

pyrtsa · December 10, 2007, 10:04pm

Hi, my guess would be exactly the same. I have faced problems with functions having long unmangled names in both NVCC 1.0 as 1.1. For me it seemed the backend stage of the compilation crashed when an unmangled identifier name became more thant ~256 characters long.

It took a number of template arguments and function parameters to get the function name this long but in my opinion the arbitrary length limit of the identifier name could well be ten times larger.

PS. I have been using a 32-bit system.

/Pyry

tbrandvik · March 11, 2008, 12:32pm

Hi,

Does anyone know what the status on this bug is? I’m experiencing exactly the same thing with the kernel below. Removing a single argument (seems it doesn’t matter which one) or shortening them to three letters makes it compile. I’m running CUDA 1.1 on 64-bit CentOS 5.

Cheers,

Tobias.

__global__ void inbcond_kernel(

int sb_npi, 

int sb_npcbi,

int *sb_patch_index,

float rfin,

float cp,

float cv,

float ga1,

float rgas,

float *sb_pv_t0,

float *sb_pv_p0,

float *sb_pv_angle,

float *sb_bp_ro,

float *buffer_ro,

float *buffer_ext_ro,

float *sb_bp_rovx,

float *buffer_rovx,

float *buffer_ext_rovx,

float *sb_bp_rovr,

float *buffer_rovr,

float *buffer_ext_rovr,

float *sb_bp_rorvt,

float *buffer_rorvt,

float *buffer_ext_rorvt,

float *sb_bp_roe,

float *buffer_roe,

float *buffer_ext_roe,

float *sb_bp_r,

float *buffer_r,

float *buffer_ext_r)

{   

}

nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)

m11 · March 17, 2008, 10:01am

no errors seen on winXP and cuda1.1 and MS VS2005

nvcc.exe -ccbin “C:\Program Files\Microsoft Visual Studio 8\VC\bin” -c -DWIN32 -D_DEBUG -D_CONSOLE -Xcompiler “/EHsc /W3 /nologo /Wp64 /Od /Zi /MDd /GR” -I"C:\Program Files\NVIDIA Corporation\NVIDIA SDK 10\NVIDIA CUDA SDK\common\inc" -I"C:\CUDA\include" --opencc-options -LIST:source=on -keep tst.cu

tst.obj created

I will try on fedora 8 i386_32 later …

m11 · March 17, 2008, 8:35pm

works fine with::

[root@gk-216c900ee6fc cudatst]# nvcc --version

nvcc: NVIDIA ® Cuda compiler driver

Built on Fri_Nov_30_07:11:42_PST_2007

Cuda compilation tools, release 1.1, V0.2.1221

[root@gk-216c900ee6fc cudatst]# uname -a

Linux gk-216c900ee6fc.lan 2.6.24.3-12.fc8 #1 SMP Tue Feb 26 14:58:29 EST 2008 i686 athlon i386 GNU/Linux

[root@gk-216c900ee6fc cudatst]# gcc --version

gcc (GCC) 4.3.0 20080307 (Red Hat 4.3.0-2)

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc -c -I"/usr/local/cuda/include" -I"/root/NVIDIA_CUDA_SDK/common/inc" --opencc-options -LIST:source=on -keep tst.cu

Kravell · May 15, 2008, 12:46pm

I have the same bug, code compiles fine on 32bit system but “nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)” on 64bit system.
Is this fixed in Cuda 2.0 ? I can’t test it as my distrib is fedora 8, not yet supported.
I know it can be fixed by shortening variable names, but with a big code it’s a laborious task.

I have another question : I have another code which compiles fine but crash during execution on 64 bit, whereas it runs fine in 32 bit system; could this be a related bug ?

netllama · May 15, 2008, 2:23pm

@Kravell
The only way that someone can tell you whether its fixed for CUDA_2.0 is if you provide a test app which reproduces the problem.

Flolo · May 16, 2008, 5:39am

A testcase which reproduces this bug, is the CUDPP 1.0a library.

I had the same trouble: Internal Identifiers get to long, ptx crashed with the Signal 11 message. Making a global sed s/SegmentedScanTraits/SST helped fixing it. But because I have also just 1.1 I can say nothing if it is still in CUDA 2.0 present

netllama · May 16, 2008, 1:57pm

If you’d like further assistance, please provide detailed instructions on how to reproduce this problem.

Kravell · May 19, 2008, 2:10pm

Well, as Flolo just said you can easily reproduce the problem with the CUDPP 1.0a library which fails to compile with the same error (“nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)”) in a 64bit system with CUDA1.1.

Yet I assume this is fixed with CUDA2.0 as they say in the CUDPP documentation that CUDPP has been thoroughly tested on Redhat Enterprise Linux 5 (RHEL 5 x86_64, 64-bit) (CUDA 2.0).

Thanks anyway for your proposed help.

Topic		Replies	Views
CUDA compile trouble CUDA Programming and Performance	47	5100	November 8, 2010
CUDA 2.1 discussion CUDA Programming and Performance	71	63937	February 17, 2009
nvcc Segfault CUDA Programming and Performance	6	11408	October 14, 2010
CUDA V7.0 Release Mode Compile error: nvcc error : 'ptxas' died with status 0xC0000005 (ACCESS_VIOL CUDA Programming and Performance	7	3429	March 29, 2017
Yet another CUDA 3.1 confusion Can't compile examples under CUDA 3.1 cannot find -lrendercheckgl CUDA Programming and Performance	13	5739	February 13, 2015
nvcc error : 'ptxas' died due to signal 11 (Invalid memory reference) CUDA Programming and Performance	8	4699	March 12, 2014
first install of cuda CUDA Setup and Installation	6	7617	February 12, 2017
Simple CUDA Wizard for Visual Studio 2005 CUDA Programming and Performance	100	174508	April 8, 2012
Simple surface compilation problem on CUDA CUDA Programming and Performance	6	59583	May 25, 2012
CUDA Toolkit 3.0 released CUDA Programming and Performance	62	26015	September 21, 2010

CUDA 1.1 Bug - Compiler crash (ptxas) w/repro

Related topics