Is there a way to find out how many registers a kernel uses?

(env: CUDA SDK 1.0 on Ubuntu)

In CUDA, each multiprocessor has a fixed number of registers, and all threads of a block get their registers from that pool. So the more registers each thread of a kernel uses, the fewer threads per block the execution configuration can have.

However, nvcc compiles such kernels as usual, and if the launch configuration asks for more registers than are available, the kernel launch fails without printing any error message.

Is there a way to find out how many registers the compiled kernel functions use, so that the execution configuration can be adjusted accordingly?

For example, say the register limit is 1000 for the sake of discussion, and I have a function that uses 100 registers per thread:

__global__ void use100registers();

Then I can use at most 1000 / 100 = 10 threads per block:

dim3 grid(1); dim3 threads(10);
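As a rough sketch of the arithmetic in this example: the upper bound on threads per block is the register budget divided by the registers each thread needs, capped at the hardware's maximum block size. The function name max_threads_per_block is hypothetical. For a real sm_10 device you would pass 8192 registers per multiprocessor and a 512-thread block limit; because the hardware allocates registers in chunks, the true limit can be somewhat lower than this bound.

```c
#include <assert.h>

/* Hypothetical helper: an upper bound on threads per block for a kernel
   whose threads each need regs_per_thread registers, given reg_budget
   registers available to the block and the hardware's maximum block size.
   It ignores register allocation granularity, so the real limit may be
   lower than the value returned. */
static unsigned max_threads_per_block(unsigned reg_budget,
                                      unsigned regs_per_thread,
                                      unsigned hw_max_threads)
{
    assert(regs_per_thread > 0);
    /* Integer division rounds down: a partial thread's worth of
       registers is useless. */
    unsigned by_regs = reg_budget / regs_per_thread;
    /* Never exceed what the hardware allows per block. */
    return by_regs < hw_max_threads ? by_regs : hw_max_threads;
}
```

With the numbers from the question, max_threads_per_block(1000, 100, 512) returns 10.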


But how do I know how many registers a function uses? For my project I use trial and error to find the maximum number of threads. Is there an easier way to find that out?

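Compile the kernel with nvcc -cubin

This generates a .cubin file that contains the info you are looking for. In the listing below, reg is the number of registers each thread of the kernel uses (lmem is local memory per thread and smem is shared memory per block, both in bytes):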

cat double.cubin

architecture {sm_10}
abiversion {0}
code  {
        name = _Z5testDPd
        lmem = 0
        smem = 20
        reg = 2
        bar = 0
        bincode  {
                0x1000c805 0x0423c780 0x10008001 0x03f80003
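If you want to automate this instead of reading the file by eye, a small helper can pull the reg value for a given kernel out of the text cubin. This is a minimal sketch assuming the plain-text layout shown above (a "name = ..." line followed later by a "reg = ..." line inside each code block); the function name cubin_reg_count is hypothetical, and later CUDA versions switched cubin to a binary (ELF) format that this will not parse.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the "reg" value recorded for kernel `name`
   in the text of an old-style (plain-text) .cubin file, or -1 if the kernel
   or its reg line is not found. Assumes each kernel's "name = ..." line
   comes before its "reg = ..." line. */
static int cubin_reg_count(const char *cubin_text, const char *name)
{
    const char *p = cubin_text;
    size_t name_len = strlen(name);

    assert(cubin_text != NULL && name != NULL);
    while ((p = strstr(p, "name = ")) != NULL) {
        p += strlen("name = ");
        /* Match the whole identifier, not just a prefix of a longer name. */
        if (strncmp(p, name, name_len) == 0 &&
            (p[name_len] == '\n' || p[name_len] == '\r' ||
             p[name_len] == ' '  || p[name_len] == '\0')) {
            const char *r = strstr(p, "reg = ");
            int regs;
            if (r != NULL && sscanf(r, "reg = %d", &regs) == 1)
                return regs;
            return -1;
        }
    }
    return -1;
}
```

Note that the kernel name stored in the cubin is the C++-mangled one (_Z5testDPd here), so that is the name you would pass.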




Excellent, thanks!