NVRM: Xid (0084:00) kernel does not terminate

I have a problem with my second CUDA project. The first program runs and produces results.

First I hit the nvcc “ran out of registers” bug/feature. Attempting to work around
it, I made minor changes to the code: made a for loop out of



The new code compiles successfully, runs, but never terminates. When run under
X Windows there are no error messages. However, outside X Windows it says

NVRM: Xid (0084:00):13 001 00000000 000050C0 00000368 00000000 00000080

This happens all the time.

I am 99% sure this is a bug in the toolkit, not in my program. The aim of my project is
to estimate whether the CUDA toolkit + video card is suitable for a certain purpose, and the current status is “CUDA cannot do it due to a bug”.

I’m on 64-bit Linux and use nvidia drivers 169.09 and toolkit ver 1.1 if this

I enclose full source code and Makefile. Built with Makefile are 3 executables:

  1. compiling, terminating within 0.2 seconds
  2. not compiling with
  3. compiling, not terminating

The 3 executables are built from a single source with different preprocessor
directives. The 1st executable is a trimmed-down version of the 2nd. The 3rd
differs from the 2nd only in the for loop mentioned above.

My questions are:

  1. How do I solve the “ran out of registers in integer64” problem without changing
    C source code?
  2. How do I change the code of executable 3 to make it compile and work?

Have you tried to see if driver 169.21 (from memory) has the same problem? There is at least a newer driver out that supports CUDA.

@haRsh, please generate and attach an nvidia-bug-report.log

To netllama: thank you for the quick reply. I will send nvidia-bug-report.log Monday morning if the problem remains under drivers 169.12.

To DenisR: I failed to find 169.21 for Linux. Did you mean 169.12?

Hmm, at home now, so I cannot check. At least it is the last one that you get with FC8 updates :)

The problem stays the same under drivers ver. 169.12

I attach the results of nvidia-bug-report.sh for ver. 169.09 and 169.12.

Other than the fact that you’re using an unsupported Linux distribution, I don’t see anything unusual in the bug report. I tried to reproduce this with the code that you attached in a supported Linux distribution, however it failed to build and I found your build instructions unclear.

Please clarify the build command(s) required to build your test app, or update the Makefile so that it can be built by running ‘make’.


[quote name=‘netllama’ date=‘Apr 7 2008, 05:09 PM’]

… I found your build instructions unclear.

…Please clarify the build command(s) required to build your test app, or update the Makefile so that it can be built by running ‘make’.

Oops? Did you read the file ReAd.It? The build process is described there. I will briefly repeat it here.

The build process involves 2 steps. First make an executable script called CUDA and place it somewhere in the $PATH. Then go to the directory containing the Makefile and type make. This should attempt to build and execute 3 executable files. The run results will be redirected into a different file for each executable.

Yes, I read ReAd.It. Your instructions on how to make an executable script called CUDA were unclear. If building this app requires more than just running ‘make’ using the Makefile you provided, then please provide any additional requisite script(s) or build commands.


To netllama:

A separate script (setting up some environment variables) is needed by the Makefile because I don’t know where you installed the CUDA toolkit. To run my code do the following:

  1. Go to /usr/local/bin

  2. Open an empty file in your favorite text editor

  3. Type in the 5 lines found between
    ==== /usr/local/bin/CUDA start ====
    ==== /usr/local/bin/CUDA end ====
    inside the file ReAd.It

  4. Go to the 1st line of the file and replace < path to toolkit > with the directory where you installed the CUDA toolkit. This directory should contain 5 subdirs: bin, doc, include, lib, open64; and the directory bin/ should contain the file nvcc and some others

  5. Save the file as CUDA

  6. Leave the text editor.

  7. Type ls -l to check that the file CUDA is present in the current directory /usr/local/bin

  8. Make the file executable by issuing the command < chmod +x ./CUDA >.

  9. Now go to the directory containing ReAd.It and Makefile

  10. Type < CUDA nvcc --version >. This should run the nvcc executable. You should see 4 lines of nvcc introduction.

  11. If you want to build all 3 executables yourself, type < make clean >. This will erase 2 executable files exe/*

  12. Type < make > then watch the executables compile and/or run. You may want to open another window to view files
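
The mechanics of the steps above (create the file, make it executable, check that it is found via $PATH) can be sketched as follows. This is only an illustration: it writes into a scratch directory instead of /usr/local/bin, and the script body is a placeholder that simply runs its arguments. The real five lines are the ones in ReAd.It and are not reproduced here.

```shell
#!/bin/sh
# Sketch only: installs a stand-in wrapper named CUDA into a scratch directory.
# The real five lines come from ReAd.It and are NOT reproduced here.
BIN_DIR=$(mktemp -d)            # stands in for /usr/local/bin

# Placeholder body (assumption): just run whatever command follows "CUDA".
cat > "$BIN_DIR/CUDA" <<'EOF'
#!/bin/sh
exec "$@"
EOF

chmod +x "$BIN_DIR/CUDA"        # make the file executable
ls -l "$BIN_DIR/CUDA"           # check that the file is present
PATH="$BIN_DIR:$PATH"           # make sure the script is found via $PATH

# Smoke test: with the real script this would be "CUDA nvcc --version".
WRAPPER_OUT=$(CUDA echo wrapper-ok)
echo "$WRAPPER_OUT"
```

If the last command prints nothing or reports "command not found", the script is either not executable or not in a directory listed in $PATH.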


If the 1st executable did not build, this could be because the file common/inc/cutil.h was not located by my Makefile. cutil.h is part of the CUDA SDK.

Standard output and standard error of 1st executable will be inside rezult.1.0.cout and rezult.1.0.cerr respectively

Second executable won’t build

Results of 3rd executable will be rezult.2.1.cout and rezult.2.1.cerr

The 3rd executable won’t terminate, and it stops loading the CPU after a fraction of a second. Type < ps | grep cuda > or < ps | grep make > to check what’s going on.

If you have further questions, feel free to ask.

Since usage of external script /usr/local/bin/CUDA became a problem, I changed Makefile to automagically find toolkit, so the script is no longer needed.

The new Makefile is attached to this message.

The new version is shorter, more verbose and has correct dependencies. Prior to running each executable it outputs a message

Copy the new Makefile over the old one. Type make.

I managed to work around the ran-out-of-registers and kernel-loops-forever bugs by changing the source code. The new code compiles, runs and terminates, slightly outperforming the central processor (see my signature for details).

My code accesses constant memory heavily and at random, hence the GPU is only slightly faster than the CPU now. I hope to speed up the CUDA code by moving the constants to shared memory.

So I should report the intermediate result of the project:

  • the Nvidia compiler is BUGGY,
  • but sometimes it is worth spending time programming for GPU

I’m afraid that the new Makefile still doesn’t work correctly:

$ make
make cuda_run
make[1]: Entering directory `/root/NVIDIA_CUDA_SDK/projects/63908'
/usr/local/cuda/bin/nvcc --maxrregcount 128 -DUNIX twofish.cu -o exe/twofish_cuda_1_0 \
-DBLOCKS=4 -DCYCLE=10 -I./automagically.generated -I/root/NVIDIA_CUDA_SDK/common/inc/cutil.h \
-L/usr/local/cudalib \
-DTIMES=1 -DENCRYPT_IN_CYCLE=0 \
-lcudart \
2>compile.1.0.err
make[1]: *** [exe/twofish_cuda_1_0] Error 255
make[1]: Leaving directory `/root/NVIDIA_CUDA_SDK/projects/63908'
make: *** [bug0] Error 2

I believe constant memory is as fast as it gets because it is cached, as long as all threads are accessing the same indices (if it is an array). Otherwise it is indeed smart to put the data into shared memory, or even a texture might do the trick.

Buggy is a strong statement I think. It does not generate wrong code, it crashes in certain circumstances. I have also had it happen once, but to be honest my code was crappy (in hindsight) and the compiler does not trip over my cleaner code.

FWIW, I’d rather have it crash than generate wrong code, I already have enough trouble debugging my bugs :P

[quote name=‘netllama’ date=‘Apr 9 2008, 07:41 PM’]

I’m afraid that the new Makefile still doesn’t work correctly:

$-I/root/NVIDIA_CUDA_SDK/common/inc/cutil.h \


The command line is incorrect. The -I option should be given a directory, not a file name.

Probably the magic spell < locate common/inc/cutil.h | xargs dirname > did not cast properly on your computer

Will you please set CUTIL_PATH in the 4th line of the Makefile manually to /root/NVIDIA_CUDA_SDK/common/inc/

Do I need to write extended instructions for you on how to do it?

Next time, if you have compilation errors, please attach the relevant compile.cerr*
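
To illustrate the include-path point, here is a minimal shell sketch of turning the path of a header file into the directory that -I expects. The helper name header_dir is made up for this example and is not part of the Makefile; the header path is the one from the failing command line.

```shell
#!/bin/sh
# Sketch: derive an include DIRECTORY from the path of a header FILE.
# header_dir is a hypothetical helper, not part of the original Makefile.
header_dir() {
    # $1: full path to a header; prints the directory that contains it
    [ -n "$1" ] || { echo "no header path given" >&2; return 1; }
    dirname "$1"
}

# Path taken from the failing command line in the thread.
HEADER=/root/NVIDIA_CUDA_SDK/common/inc/cutil.h
CUTIL_PATH=$(header_dir "$HEADER")
echo "-I$CUTIL_PATH"    # prints -I/root/NVIDIA_CUDA_SDK/common/inc
```

The guard on an empty argument is there because a lookup that finds nothing would otherwise leave the include path silently wrong.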


I keep thinking about the possible reason for the build problem. I came to the conclusion that one of the following 3 statements is true:

  1. the xargs command is present on netllama’s computer but malfunctions

  2. the dirname command is present on netllama’s computer but malfunctions

  3. netllama edited the 4th line of my Makefile incorrectly, then complained to me about a problem in my Makefile

Is my guess right?


I prefer to take a working CUDA-unaware program and convert it to CUDA code with a perl/bash script. This ensures the absence of bugs. And very often the compiler-ran-out-of-registers bug stops me (setting Olimit appears to have no effect). I failed several times before creating a variant which compiles and fits in registers completely. And the code is not optimal: if the compiler worked properly I could make it better.

Moving the hard-coded tables from constant to shared device memory more than doubled the speed, so now my G84-based card is more than 3 times faster than a two-core Athlon for this project (which means that a $250 video card should be >12 times faster). I have changed my signature accordingly.

I’m using approximately half of the shared memory, so I will try to switch the tables from 1-byte char to 4-byte int.

Enlarging the tables gave a slight improvement. Now the program uses 124 registers per thread and 11K of shared memory.