CUDA kernel size: what if it exceeds 2 MB?

Hello,

I’m using an 8800 GTX and running a kernel that computes a filter response.

Ref:
http://forums.nvidia.com/index.php?showtopic=36286

" Q : What is the maximum length of a CUDA kernel?
A : The maximum kernel size is 2MB of native instructions."

I suspect that my kernel has exceeded this limit and want to find out whether that is actually the case.

Can anyone tell me how to find the exact size of a kernel’s native instructions?

Thanks in advance…

Dividing the size of the .cubin file by 3 should give a rough estimate. I doubt you can even create kernels with 2 MB of instructions with the current toolchain, though; I ran out of virtual registers well before that when I tried.
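As a minimal sketch of that rule of thumb, the shell functions below divide a .cubin file's size by 3 and compare the result against the 2 MB figure quoted from the FAQ. The function names and the file name in the usage line are made up for illustration, and the result is only as good as the divide-by-3 estimate itself; it also covers everything in the file, not a single kernel.

```shell
# Rough estimate only: cubin size / 3, per the rule of thumb above.
LIMIT_BYTES=$((2 * 1024 * 1024))   # 2 MB limit quoted from the FAQ

estimate_native_bytes() {
    # $1: path to a .cubin file
    size=$(wc -c < "$1") || return 1
    echo $((size / 3))
}

check_kernel_size() {
    # Prints the estimate and whether it is over or under the limit.
    est=$(estimate_native_bytes "$1") || return 1
    if [ "$est" -gt "$LIMIT_BYTES" ]; then
        echo "estimate $est bytes: over the 2 MB limit"
    else
        echo "estimate $est bytes: under the 2 MB limit"
    fi
}

# Usage (hypothetical file name): check_kernel_size filter_cuda.cubin
```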

Thanks for such valuable input…

Can you tell me where I can find the .cubin file? Is there a setting for generating it, or is it generated by default?

I’m asking because I’m currently running the code in Debug mode on Windows XP and haven’t found any .cubin file among the generated files.

And how did you find out that you had run out of virtual registers?

Now,

I’ll mention the build property for that .cu file as well.

“filter_cuda.cu” is the file from which the kernel is called. I’ve included “filter_kernel.cu” in that .cu file, so “filter_kernel.cu” itself is excluded from the direct build.

The build property for filter_cuda.cu is:

$(CUDA_BIN_PATH)\nvcc.exe -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd -I"$(CUDA_INC_PATH)" -I./ -I…/…/common/inc -o $(ConfigurationName)\filter_cuda.obj …/src/filter_cuda.cu

For information, my end application is going to run on Linux. I’m currently using Windows as the development platform only because it makes debugging simpler.

Thanks,

Tanmay

To Tanmay Anjaria:
Maybe you should read the documents and/or search the forum more carefully.
1) The answer is in NVCC_1.0.pdf: the -cubin option.
2) The answer is right in wumpus’s signature: decuda.
Also, ptxas’s built-in limit seems to be somewhere near 32767. If you exceed that, your program surely won’t compile.
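For what it's worth, here is a minimal sketch of the -cubin step as a shell function, assuming nvcc is on the PATH (for example on the Linux target). The function name build_cubin and the output naming are illustrative only; any include paths and defines from the real build line would have to be added to the nvcc call.

```shell
build_cubin() {
    # $1: path to the .cu file; the .cubin is written next to it
    src=$1
    out="${src%.cu}.cubin"
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found on PATH" >&2
        return 1
    fi
    # -cubin asks nvcc to stop after producing the .cubin file
    nvcc -cubin -o "$out" "$src" && echo "$out"
}

# Usage (hypothetical path): build_cubin filter_cuda.cu
```

The resulting file is what the size estimate earlier in the thread refers to.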

Well, maybe you are right… I’m a newbie with this card (or any graphics card, for that matter…)

I’ve only gone through “NVIDIA_CUDA_Programming_Guide_1.0.pdf” and a couple of other ppt files…

Thanks for the inputs though… will go through them…