To what extent should the size of the object code produced after compiling CUDA kernels be considered (ignoring the compilation time for the moment)?
For example, I could have a C++ template that produces 10,000 unique device functions at compile time, each saving a small amount of work.
I’m not doing anything quite that silly, but must the binary fit into GPU memory or a cache of a certain size, or am I going to start eating away at the memory available on the GPU if I really push this?
The size of a kernel is limited by the number of GPU instructions: 2 million instructions is the maximum. However, it is still not clear how exactly to count those instructions :-) I have asked NVIDIA people three times without getting a meaningful answer. In summary, the following observations may be useful:
(0) The size of a kernel is limited to 2 million GPU instructions.
(1) The minimal size of a single GPU instruction is 32 bits (4 bytes); the maximal size is 64 bits (8 bytes).
(2) You can inspect the .cubin binary file by compiling the kernel with the -keep option.
(3) The size of the .cubin file is, I believe, close to the size of the actual binary code of the compiled kernel (when compiled for a single architecture).
(4) From (0) and (1), the maximal size of a .cubin file must be somewhere between 8 and 16 megabytes.
(5) You can therefore estimate how far your kernel is from the 2-million-instruction limit by looking at the size of its .cubin file.
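As a rough back-of-the-envelope check, the estimate above can be sketched in a few lines of Python. The 32/64-bit instruction sizes and the 2 million limit are the assumptions from this answer, not documented NVIDIA constants, so treat the numbers as approximate:

```python
# Assumptions taken from the discussion above (not official NVIDIA figures):
# - a single GPU instruction is between 4 bytes (32 bits) and 8 bytes (64 bits)
# - a kernel may contain at most 2 million instructions
MIN_INSTR_BYTES = 4
MAX_INSTR_BYTES = 8
INSTR_LIMIT = 2_000_000

def instruction_estimate(cubin_size_bytes):
    """Estimate the possible instruction-count range for a .cubin of the
    given size, and what fraction of the 2M limit the worst case reaches."""
    low = cubin_size_bytes // MAX_INSTR_BYTES   # if every instruction is 8 bytes
    high = cubin_size_bytes // MIN_INSTR_BYTES  # if every instruction is 4 bytes
    return low, high, high / INSTR_LIMIT

# Example: a 1 MiB .cubin produced with `nvcc -keep`
low, high, frac = instruction_estimate(1 * 1024 * 1024)
print(f"{low}..{high} instructions, at most {frac:.1%} of the limit")
```

With these assumptions, a 1 MiB .cubin is at most a few hundred thousand instructions, i.e. comfortably below the limit; only a .cubin approaching 8 MB could plausibly be near it.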