Dividing the size of the .cubin file by 3 should give a rough estimate. I doubt you can even create kernels with 2 MB of instructions with the current toolchain, though: I ran out of virtual registers way before that when I tried.
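If you want to script the divide-by-3 rule of thumb, here is a minimal sketch. The function names are made up for illustration, and the factor of 3 is just the heuristic from this post, not something taken from the CUDA documentation, so treat the result as a ballpark figure only.

```python
import os

# Heuristic divisor from the post above; an assumption, not a documented constant.
CUBIN_DIVISOR = 3

def rough_estimate_from_size(size_in_bytes):
    """Apply the divide-by-3 rule of thumb to a .cubin file size given in bytes."""
    return size_in_bytes // CUBIN_DIVISOR

def rough_estimate_from_cubin(path):
    """Read the on-disk size of the file at `path` and apply the same rule."""
    return rough_estimate_from_size(os.path.getsize(path))
```

For example, calling rough_estimate_from_cubin on the file that nvcc -cubin wrote gives the estimate directly from its size on disk.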
To Tanmay Anjaria:
Maybe you should read the documents and/or search the forum more carefully.
The answer to 1) is in NVCC_1.0.pdf: the -cubin option.
The answer to 2) is right in wumpus’s signature: decuda.
Also, ptxas’s built-in limit seems to be somewhere near 32767. If you exceed that, your program surely won’t compile.