Error: ran out of registers

Does the following error mean I have run out of registers for executing my kernel?
Is there any way to solve this? I used -g -maxrregcount and it didn't have any effect …


Assertion failure at line 2433 of …/…/be/cg/NVISA/cgtarget.cxx:

Compiler Error in file Test.cpp3.i during Register Allocation phase:

ran out of registers in float

nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be returned non-zero status 1

-g is not useful, could even be harmful.

-maxrregcount needs a number after it.

I did use a number after -maxrregcount, but it still has no effect on the compilation.

Any thoughts?

Which number?

Otherwise, you probably have a very large kernel?

According to my calculations, the maximum number of registers allowed by nvopencc is 23999 / 64 + 1 = 375.

So I guess this error should occur if you exceed that limit. How many registers did you specify for your kernel?

Are you sure? Because of single assignment, registers get used up very quickly. You might only need a few hundred lines of code to use that many. I have kernels with several thousand lines.

maxrregcount affects ptxas. This is an error in nvcc/open64, which runs before ptxas.

You are right, it looks like things are a bit more subtle here.

Registers are divided into classes, and there are at least 6 of them: integer16, integer, integer64, float, float64 and predicate.

You can look through the open64 sources: each used register within a class corresponds to 1 bit in an array of 64-bit ints, and that array has the size of 375 which I mentioned before. So in total it should be 375 * 64 = 24000 registers per class.

Sorry for the confusion :)
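
To make the arithmetic concrete, here is a rough C++ sketch of the kind of bookkeeping I mean. It is not copied from the open64 sources: the names (RegisterClassBitmap, allocate, kWords and so on) are made up for illustration, and the lowest-free-bit search is just an assumption about how such a bitmap could be used.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not copied from the Open64 sources.
constexpr int kBitsPerWord = 64;
constexpr int kMaxRegIndex = 23999;                      // highest register index in one class
constexpr int kWords = kMaxRegIndex / kBitsPerWord + 1;  // 23999 / 64 + 1 = 375
constexpr int kCapacity = kWords * kBitsPerWord;         // 375 * 64 = 24000

static_assert(kWords == 375, "word count matches the figure quoted in the thread");
static_assert(kCapacity == 24000, "one bit per register in the class");

struct RegisterClassBitmap {
    std::uint64_t words[kWords] = {};  // all registers start out free

    // True if register `reg` (0 <= reg < kCapacity) is marked as used.
    bool is_used(int reg) const {
        return (words[reg / kBitsPerWord] >> (reg % kBitsPerWord)) & 1u;
    }

    // Claims the lowest free register and returns its index,
    // or -1 once all kCapacity bits are set ("ran out of registers").
    int allocate() {
        for (int w = 0; w < kWords; ++w) {
            if (words[w] == ~std::uint64_t{0}) continue;  // this word is full
            for (int b = 0; b < kBitsPerWord; ++b) {
                const std::uint64_t mask = std::uint64_t{1} << b;
                if ((words[w] & mask) == 0) {
                    words[w] |= mask;
                    return w * kBitsPerWord + b;
                }
            }
        }
        return -1;
    }
};
```

With one such bitmap per class (say, one for float), allocate() hands out indices 0 through 23999 and fails on the 24001st request, which would be the point where a message like "ran out of registers in float" fits.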

This is still a very interesting limit to keep in mind. So is this what gpugpu hit?

I also get a similar error:

1>### Assertion failure at line 2433 of …/…/be/cg/NVISA/cgtarget.cxx:
1>### Compiler Error in file XXXX\Temp/tmpxft_00000b98_00000000-9_cudaEntry.cpp3.i during Register Allocation phase:
1>### ran out of registers in float
1>nvopencc ERROR: F:\CUDA\bin/…/open64/lib//be.exe returned non-zero status 1

I admit my .cu file is a bit long. Is there any way to get around this? (Can device functions be called as external functions?)

Thanks in advance,


NVIDIA, do you have anything to say?

It looks like the advertised “2 million PTX instructions” can never be hit because of opencc’s puzzling limitation. Is there no simple way to just increase the max?