Dividing the size of the .cubin file by 3 should give a rough estimate. I doubt you can even create a kernel with 2 MB of instructions with the current toolchain, though: I ran out of virtual registers well before that when I tried.
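To make the rule of thumb above concrete, here is a minimal host-side sketch in C++. It assumes the estimate is meant as the kernel's code size in bytes, which is my reading of the post and not something it states. The function names `estimate_code_size` and `file_size_bytes` are made up for illustration, and the divisor of 3 is only the heuristic quoted above, not a documented property of the .cubin format.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Heuristic from the post above: .cubin file size / 3 ~ kernel code size.
// ASSUMPTION: the result is treated as bytes of code; the post does not
// name the unit. The divisor is a parameter because it is only a rule
// of thumb.
std::uint64_t estimate_code_size(std::uint64_t cubin_file_bytes,
                                 std::uint64_t divisor = 3) {
    assert(divisor > 0);                 // guard against division by zero
    return cubin_file_bytes / divisor;   // integer division, rounds down
}

// Returns the size of the file at `path` in bytes, or -1 if the file
// cannot be opened.
std::int64_t file_size_bytes(const std::string& path) {
    // Open at the end of the file so tellg() reports the total size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    const std::streamoff end = in.tellg();
    return static_cast<std::int64_t>(end);
}
```

You would call `file_size_bytes` on whatever .cubin file your build produces and, if the result is not -1, pass it to `estimate_code_size`. Treat the number as an order-of-magnitude figure only.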
Can you tell me where I can find the .cubin file? Is there a setting for generating it, or is it generated by default?
I’m asking because I’m currently running the code in Debug mode on Windows XP and haven’t found any generated .cubin files.
And how did you find out that you had run out of virtual registers?
Now, I’ll mention the build property for that .cu file as well.
“filter_cuda.cu” is the file from which the kernel is called. I’ve included “filter_kernel.cu” in that .cu file, so “filter_kernel.cu” is actually excluded from the direct build.
Just for information, my end application is going to run on Linux. I’m currently using Windows as the development platform only to make debugging simpler.
To Tanmay Anjaria:
Maybe you should read the documents and/or search the forum more carefully.
The answer to 1) is in NVCC_1.0.pdf: the -cubin option.
The answer to 2) is right in wumpus’s signature: decuda.
Also, ptxas’s built-in limit seems to be somewhere near 32767. If you exceed that, your program surely won’t compile.