kernel runs fine under CUDA 1.0, fails under 1.1

System info:

  • Ubuntu Feisty
  • CUDA 1.0 and 1.1
  • SDK is not applicable
  • gcc-4.1
  • Dual dual-core Opteron
  • 8 GB RAM
  • GeForce 8800 Ultra

The title of this topic (a kernel that works under CUDA 1.0 but not under 1.1) describes the easiest-to-summarize symptom, but I believe the underlying problem exists under 1.0 as well, under certain circumstances.

I’m not sure of the best way to exhibit the problem without disclosing all my source code, so I’ll describe the symptoms generally and then hopefully I can get some feedback on where to look or what further info to provide.

The symptom first appeared when I unrolled my innermost for-loop: the kernel then ran a lot faster (5x!), but the output was garbage. That innermost loop simply repeated the same function call 7 times, so the unrolling was trivial. In the unrolled version, if I comment out 5 of those 7 calls the kernel at least appears to execute, but with 4 or fewer commented out it doesn't.
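
To make the structure concrete, here is a hypothetical sketch of what I mean -- accumulate() and all the names are placeholders, not my real code, and the real per-term work is of course different:

    // accumulate() stands in for my real __device__ helper; per-thread work only.
    __device__ void accumulate(float *s, int tid, int term)
    {
        s[tid] += (float)(term + 1) * 0.5f;
    }

    // Rolled version: works under CUDA 1.0.
    // (Both kernels are launched with blockDim.x * sizeof(float) bytes
    //  of dynamic shared memory.)
    __global__ void kernel_rolled(float *out)
    {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        s[tid] = (float)tid;

        for (int term = 0; term < 7; ++term)
            accumulate(s, tid, term);

        out[blockIdx.x * blockDim.x + tid] = s[tid];
    }

    // Unrolled version: ~5x faster, but the output is garbage.
    __global__ void kernel_unrolled(float *out)
    {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        s[tid] = (float)tid;

        accumulate(s, tid, 0);
        accumulate(s, tid, 1);
        accumulate(s, tid, 2);
        accumulate(s, tid, 3);
        accumulate(s, tid, 4);
        accumulate(s, tid, 5);
        accumulate(s, tid, 6);

        out[blockIdx.x * blockDim.x + tid] = s[tid];
    }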

I replaced the function call with a macro and verified that each instantiation doesn't use any additional memory, but that didn't help.
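
For reference, the macro variant looked roughly like this (again a placeholder, matching the sketch above):

    // Placeholder macro corresponding to the accumulate() sketch above;
    // it touches only the calling thread's own shared-memory slot.
    #define ACCUMULATE(s, tid, term)  ((s)[(tid)] += (float)((term) + 1) * 0.5f)

    // in the kernel body, in place of the function calls:
    ACCUMULATE(s, tid, 0);
    ACCUMULATE(s, tid, 1);
    ACCUMULATE(s, tid, 2);
    ACCUMULATE(s, tid, 3);
    ACCUMULATE(s, tid, 4);
    ACCUMULATE(s, tid, 5);
    ACCUMULATE(s, tid, 6);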

I tried CUDA 1.1, but with that version even the original, non-unrolled (rolled) for-loop version doesn't run at all.

The emulation-mode build works in all of these cases …
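
For what it's worth, the emulation build is just the usual flag (file names here are placeholders, other flags omitted):

    # emulation-mode build: the kernel runs on the CPU
    nvcc -deviceemu -o myprog_emu mykernel.cu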

The only unusual thing that I [think I] am doing is using a lot of shared memory on every multiprocessor: practically all 16 KB.
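
Concretely, the shared-memory buffer is declared roughly like this; the size below is illustrative, chosen to sit just under the 16 KB per multiprocessor on the 8800 Ultra while leaving a little room for the space the runtime itself takes out of shared memory for kernel arguments and built-ins:

    // Illustrative only: 3968 floats * 4 bytes = 15872 bytes.
    __global__ void kernel(float *out)
    {
        __shared__ float s[3968];
        int tid = threadIdx.x;
        s[tid] = 0.0f;              // the real code fills and uses the whole buffer
        out[blockIdx.x * blockDim.x + tid] = s[tid];
    }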

Any input would be very helpful.

Thank you,
Glen Mabey

Just today I realized that CU_SAFE_CALL and friends don't do anything unless you have defined _DEBUG … ouch.
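
So I've switched to a hand-rolled guarded call that reports errors even in a release build -- not the cutil.h macro, just a plain wrapper around the driver-API return code:

    #include <cuda.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Active regardless of whether _DEBUG is defined.
    #define CHECK_CU(call)                                              \
        do {                                                            \
            CUresult err = (call);                                      \
            if (err != CUDA_SUCCESS) {                                  \
                fprintf(stderr, "CUDA driver error %d at %s:%d\n",      \
                        (int)err, __FILE__, __LINE__);                  \
                exit(EXIT_FAILURE);                                     \
            }                                                           \
        } while (0)

    // usage:  CHECK_CU(cuCtxSynchronize());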