How to find out how many PTX instructions are in a kernel? Keeping in mind the 2 million PTX instruction limit

The question is in the subject …
My kernel keeps growing, and it would be nice to know how far its size is from the limit.

nvcc -ptx
will generate the PTX file. I'm not sure whether the limit applies before or after optimization (which is not reflected in the PTX).
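
If you just want a rough static count from that file, a small script can tally the instruction lines. This is only a sketch: it assumes the output was written to kernel.ptx (use whatever name nvcc produced for you) and simply treats every non-directive line ending in ";" as one instruction, which is an approximation.

# Rough static PTX instruction count (heuristic sketch, not an exact tool).
# Assumes nvcc -ptx wrote its output to kernel.ptx; adjust the path as needed.
def count_ptx_instructions(path):
    count = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, directives (.entry, .reg, .param, ...) and braces.
            if not line or line.startswith(("//", ".", "{", "}")):
                continue
            # Skip branch-target labels such as "$Lt_0_26:".
            if line.endswith(":"):
                continue
            # Lines like "mul.lo.s32 %r1578, %r19, %r1577;" count as one instruction.
            if line.endswith(";"):
                count += 1
    return count

print(count_ptx_instructions("kernel.ptx"))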

How do I interpret it?

Is each line (like “mul.lo.s32 %r1578, %r19, %r1577;”) a single instruction?

Also, PTX is not optimized, so the final result can differ significantly …

It is possible to generate a .cubin and check the bincode { … } section; it contains a list of 32-bit integers. Are these integers actual instructions? And if so, how many bits (32 or 64) does each instruction contain?

Lots of questions, heh …

I don’t think the 2 million instruction limit is a PTX instruction limit…

Well, how do I estimate how much is too much? :-)

My cubin file (the bincode section) contains 792 lines of four hex words each, like this: 0x307ccbfd 0x6c20c7c8 0x30000003 0x00000280

Is each line an instruction? Or is each 32-bit hex word an instruction?

The visual profiler counts the instructions executed by a kernel. Maybe you could use this as a hint…

Nope, a loop like

for (i = 0; i < 1000000; ++i)
    a++;

will be counted as a million instructions by the profiler (actually, probably closer to 4 million), yet it is only about four or five PTX instructions of code. The limit is on code length, not on the number of executed instructions.

Yeah, executed instructions are not what I'm trying to find out …

So the only way is to examine the .cubin?

Most instructions are 64 bits wide, so your program contains at least 1584 instructions.

I suspect the 2M-instruction limit is actually a 16MB-cubin limit.
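
A quick sanity check of those numbers, just sketching the arithmetic and assuming the instructions really are 8 bytes each:

# 792 bincode lines, four 32-bit words per line (from the cubin quoted above).
total_bytes = 792 * 4 * 4
print(total_bytes, total_bytes // 8)   # 12672 bytes -> 1584 64-bit instructions

# If the limit is really 2 million such instructions, that corresponds to:
print(2000000 * 8)                     # 16000000 bytes, i.e. roughly a 16 MB cubin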

Anyway, the compiler will probably die well before you reach the million-instruction range… Kernels with ~100,000 instructions already take hours to compile.

The distance between 1500 instructions and 100,000 instructions is not that big … should I prepare for exponential growth in kernel compilation time?

Geez, what are you coding anyway? :)

In short: I'm designing a problem solver based on genetic programming, and one particular problem may require a significant amount of code.

I don't think I'll get to 100,000 instructions; however, 1500 is definitely not a limit, and I'd like to know more about how big kernels behave.