The question is in the subject …
My kernel keeps growing, so it would be nice to know how far its size is from the limit.
How can I find out how many PTX instructions are in the kernel, keeping in mind the 2 million PTX instruction limit?
nvcc -ccbin -ptx
will generate the PTX file. I’m not sure whether the limit applies before or after optimization (which is not reflected in the PTX).
How should I interpret it?
Is each line (like “mul.lo.s32 %r1578, %r19, %r1577;”) a single instruction?
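As a rough way to count such lines in a .ptx file, here is a minimal C sketch. After dropping a trailing // comment, a line counts as one instruction if it is non-empty, does not start with '.' (directives such as .reg, .param or .loc) and ends with ';'. The function names are hypothetical, and the rule is a heuristic assumption about how nvcc lays out PTX, not a real PTX parser. It also counts instructions before ptxas optimizes them, so it says nothing definite about the final machine code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Heuristic (assumption): a line is an instruction if, after stripping a
 * trailing // comment and surrounding whitespace, it is non-empty, does not
 * start with '.' (a directive) and ends with ';'. Labels ("$Lt_0_1:"),
 * braces and comment-only lines are therefore skipped. */
static int is_instruction_line(const char *line)
{
    char buf[1024];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char *comment = strstr(buf, "//");
    if (comment)
        *comment = '\0';

    char *start = buf; /* trim leading whitespace */
    while (isspace((unsigned char)*start))
        start++;

    size_t len = strlen(start); /* trim trailing whitespace, incl. \r\n */
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        start[--len] = '\0';

    return len > 0 && start[0] != '.' && start[len - 1] == ';';
}

/* Count instruction lines in a PTX listing held in memory. */
long count_ptx_instructions(const char *text)
{
    long count = 0;
    char line[1024];
    while (*text) {
        size_t n = strcspn(text, "\n");
        size_t copy = n < sizeof line - 1 ? n : sizeof line - 1;
        memcpy(line, text, copy);
        line[copy] = '\0';
        if (is_instruction_line(line))
            count++;
        text += n;
        if (*text == '\n')
            text++;
    }
    return count;
}

/* Same count for a .ptx file on disk (e.g. the output of nvcc -ptx).
 * Returns -1 if the file cannot be opened. */
long count_ptx_instructions_in_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long count = 0;
    char line[1024];
    while (fgets(line, sizeof line, f))
        if (is_instruction_line(line))
            count++;
    fclose(f);
    return count;
}
```

Call count_ptx_instructions_in_file from your own main with the path of the generated .ptx file.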
Also, PTX is not optimized, so the final result can differ significantly…
It is possible to generate a .cubin and look at its bincode { … } section, which contains a list of 32-bit integers. Are these integers the actual instructions? And if so, how many bits (32 or 64) does each instruction take?
Lots of questions, heh …
I don’t think the 2 million instruction limit is a PTX instruction limit…
Well, how do I estimate how much is too much? :-)
My cubin file (bincode section) contains 792 lines of four words each, like this: 0x307ccbfd 0x6c20c7c8 0x30000003 0x00000280
Is each line an instruction? Or is each 32-bit hex word an instruction?
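If the cubin is in the text format described above, the words can be counted mechanically instead of by eye. A minimal C sketch, assuming each word is a 0x-prefixed hex token and each bincode section ends at its first closing brace. count_bincode_words is a hypothetical helper; if the file holds several kernels, it sums all of their bincode sections.

```c
#include <ctype.h>
#include <string.h>

/* Count the 32-bit words in every "bincode { ... }" section of a
 * text-format .cubin held in memory. Assumes no nested braces inside
 * a bincode section. Words in other sections (e.g. const/mem) are ignored. */
long count_bincode_words(const char *cubin)
{
    long words = 0;
    const char *p = cubin;

    while ((p = strstr(p, "bincode")) != NULL) {
        const char *open = strchr(p, '{');
        if (open == NULL)
            break;
        const char *close = strchr(open, '}');
        if (close == NULL)
            break;

        /* Count tokens that begin with "0x" and are not part of a longer word. */
        for (const char *q = open + 1; q < close; q++) {
            if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X') &&
                !isalnum((unsigned char)q[-1]))
                words++;
        }
        p = close + 1;
    }
    return words;
}
```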
The Visual Profiler counts the instructions executed by a kernel. Maybe you could use that as a hint…
Nope, a
for (i = 0; i < 1000000; ++i)
    a++;
will be counted as a million instructions by the profiler (actually closer to 4 million, probably), yet it’s only about four or five PTX instructions. The limit is on code length, not on the number of executed instructions.
Yeah, executed instructions are not what I’m trying to find out…
So the only way is to examine the .cubin?
Most instructions are 64 bits wide. 792 lines of 4 words each is 3168 32-bit words, i.e. 1584 64-bit words, so your program contains at least 1584 instructions.
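The same arithmetic as a small C helper. The names are hypothetical, and it assumes every instruction occupies either two 32-bit words or, for the shorter encodings implied by "most", one word. The result is therefore a range, not an exact count.

```c
/* Bounds on the instruction count implied by a number of 32-bit bincode words. */
typedef struct {
    long min_instructions; /* if every instruction is 64-bit (two words) */
    long max_instructions; /* if every instruction is 32-bit (one word) */
} instruction_bounds;

instruction_bounds bounds_from_words(long words)
{
    instruction_bounds b;
    /* Round up: an odd word count implies at least one 32-bit instruction. */
    b.min_instructions = (words + 1) / 2;
    b.max_instructions = words;
    return b;
}
```

For the listing above, bounds_from_words(792L * 4) gives a minimum of 1584 and a maximum of 3168 instructions.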
I suspect the 2M-instruction limit is actually a 16 MB cubin limit (2M instructions × 8 bytes each).
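Under that hypothesis, assuming 8 bytes per 64-bit instruction and decimal megabytes, the budget and the 792-line bincode listing quoted earlier compare roughly as follows. The 16 MB figure is a guess from this reply, not a documented limit.

```latex
\[
2\times10^{6}\ \text{instructions} \times 8\ \tfrac{\text{bytes}}{\text{instruction}}
  = 1.6\times10^{7}\ \text{bytes} \approx 16\ \text{MB}
\]
\[
\frac{792 \times 4\ \text{words} \times 4\ \tfrac{\text{bytes}}{\text{word}}}{1.6\times10^{7}\ \text{bytes}}
  = \frac{12\,672}{1.6\times10^{7}} \approx 7.9\times10^{-4} \approx 0.08\,\%
\]
```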
Anyway, the compiler will probably die well before reaching the million-instruction range… Kernels with ~100,000 instructions already take hours to compile.
The gap between 1,500 and 100,000 instructions isn’t that large… should I be prepared for exponential growth in kernel compilation time?
Geez, what are you coding anyway? :)
In two words: I’m designing a problem solver based on genetic programming, and a particular problem may require a significant amount of code.
I don’t think I’ll reach 100,000 instructions, but 1,500 is definitely not the ceiling either; I’d like to know more about how big kernels behave.