I have a problem with my second CUDA project. The first program runs and produces results.
First I hit the nvcc “ran out of registers” bug/feature. Attempting to work around
it, I made minor changes to the code: made a for loop out of
The new code compiles successfully and runs, but never terminates. When run under
X Windows there are no error messages. However, outside X Windows it says:
NVRM: Xid (0084:00):13 001 00000000 000050C0 00000368 00000000 00000080
This happens on every run.
I am 99% sure this is a bug in the toolkit, not in my program. The aim of my project is
to estimate whether the CUDA toolkit + video card combination is suitable for a certain purpose, and the current status is “CUDA cannot do it due to a bug”.
I’m on 64-bit Linux and use NVIDIA drivers 169.09 and toolkit version 1.1, if this
I enclose the full source code and a Makefile. The Makefile builds 3 executables:
- compiling, terminating within 0.2 seconds
- not compiling with
- compiling, not terminating
The 3 executables are built from a single source with different preprocessor
directives. The 1st executable is a trimmed-down version of the 2nd. The 3rd
differs from the 2nd only in the for loop mentioned above.
My questions are:
- How do I solve the “ran out of registers in integer64” problem without changing
the C source code?
- How do I change the code of executable 3 to make it compile and work?