Size of CUDA Object Code?

Hi,

To what extent should I worry about the size of the object code produced when compiling CUDA kernels (ignoring compilation time for the moment)?

For example, I could have a C++ template that produces 10,000 unique device functions at compile time, each of which might save a small amount of work.
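For concreteness, I mean something along these lines (a toy sketch; the names are made up):

```cpp
// Each instantiation of this template becomes a distinct device function,
// with N baked in as a compile-time constant.
template <int N>
__global__ void scaleBy(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= N;   // saves a runtime parameter load, at the cost of code size
}

// Explicit instantiations like these, repeated thousands of times
// (e.g. from generated source), multiply the amount of object code.
template __global__ void scaleBy<1>(float *);
template __global__ void scaleBy<2>(float *);
template __global__ void scaleBy<3>(float *);
```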

I’m not doing anything quite that silly, but does the binary have to fit into GPU memory or a cache of a particular size? Am I going to start eating into the memory available on the GPU if I really push this?

Thanks

The size of a kernel is limited by the number of GPU instructions: 2 million instructions is the stated maximum. However, it is still not clear exactly how to count those instructions :-) I have asked NVIDIA people about this three times without getting a meaningful answer. In summary, the following points seem reasonable:

  1. The size of a kernel is limited by the number of GPU instructions, with 2 million as the maximum.

  2. The minimum size of a single GPU instruction is 32 bits (4 bytes) and the maximum is 64 bits (8 bytes).

  3. You can check the size of the .cubin binary file by compiling the kernel with the -keep option (see the example after this list).

  4. The size of the .cubin file is, I think, close to the size of the actual binary code of the compiled kernel (if it is compiled for a single architecture).

  5. According to (1) and (2), the maximum size of a .cubin file must be somewhere between 8 and 16 megabytes (2,000,000 instructions × 4 to 8 bytes each).

  6. You can therefore estimate how far your kernel is from the 2 million instruction limit from the size of its .cubin file.
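For example, assuming a Unix shell and that cuobjdump ships with your toolkit (the file name here is hypothetical):

```
nvcc -c -keep -arch=sm_13 kernel.cu   # -keep retains intermediate files, including kernel.cubin
ls -l kernel.cubin                    # the cubin size approximates the compiled kernel size
cuobjdump -sass kernel.cubin | wc -l  # rough idea of the instruction count (includes some header lines)
```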

Hope this helps.

Also, this may be a per-kernel limit rather than a per-application limit, so if each of those functions is compiled as a separate kernel it might not matter.
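For instance (a hypothetical sketch), with the work split across independently launched kernels, each one would only have to stay under the limit on its own:

```cpp
// Two separate kernels: if the instruction limit is per kernel,
// each of these is checked against it independently.
__global__ void stepOne(float *data) { data[threadIdx.x] += 1.0f; }
__global__ void stepTwo(float *data) { data[threadIdx.x] *= 2.0f; }

void runPipeline(float *d_data)
{
    stepOne<<<1, 256>>>(d_data);
    stepTwo<<<1, 256>>>(d_data);
}
```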
