I have not yet started converting my serial code to a CUDA version, but I am also curious about the maximum possible kernel size. Most of my subroutines are very long polynomials, and I'm wondering how large a kernel can get. It looks like trial and error is the only way to find out for now?
It is stated in Appendix A of the programming guide: the limit is 2 million PTX instructions per kernel.
Found it! I was searching for “instruction size” not “kernel size”. That’s huge, I should not have any problems! At least getting my kernels to fit - getting them to run correctly is another story…
Thanks avidday. Much appreciated.
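For anyone who wants to see how close a polynomial kernel comes to that limit, here is a minimal sketch rather than anything taken from the posts above. The names `horner`, `eval_poly`, and the file name `poly.cu` are made up for illustration, and the three coefficients are placeholder data. Compiling with `nvcc -ptx poly.cu -o poly.ptx` writes the generated PTX to a text file, so the instructions in the kernel body can be inspected or counted directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: evaluates c[0] + c[1]*x + ... + c[n-1]*x^(n-1)
// with Horner's rule. Callable from both host and device code.
__host__ __device__ inline float horner(const float* c, int n, float x)
{
    float acc = 0.0f;
    for (int i = n - 1; i >= 0; --i)
        acc = acc * x + c[i];
    return acc;
}

// One thread per input point; each thread evaluates the same polynomial.
__global__ void eval_poly(const float* c, int n, const float* x, float* y, int m)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m)
        y[i] = horner(c, n, x[i]);
}

int main()
{
    const int n = 3, m = 4;
    const float h_c[n] = {1.0f, 2.0f, 3.0f};   // placeholder coefficients
    const float h_x[m] = {0.0f, 1.0f, 2.0f, 3.0f};
    float h_y[m] = {0.0f};

    float *d_c = nullptr, *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_c, sizeof(h_c));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, sizeof(h_y));
    cudaMemcpy(d_c, h_c, sizeof(h_c), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    eval_poly<<<1, 32>>>(d_c, n, d_x, d_y, m);
    cudaError_t err = cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print device results next to the host evaluation for comparison.
    for (int i = 0; i < m; ++i)
        std::printf("p(%g) = %g (host: %g)\n", h_x[i], h_y[i], horner(h_c, n, h_x[i]));

    cudaFree(d_c);
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

A polynomial written out term by term will produce far more PTX than this loop does, so the same `-ptx` check on the real subroutines is what actually shows how much headroom is left.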