If the CUDA driver compiles at runtime, what is a cubin?

I heard that CUDA code is compiled at runtime by the driver. If so, what is a cubin?
If the driver simply loads the cubin binary at runtime, isn't that just dynamic loading of a binary?

The CUDA driver can either load a binary image compiled offline for a particular GPU architecture version, or it can compile a text-based portable assembly language called PTX at runtime. PTX is mostly useful for forward compatibility (for GPUs with new architectures that did not exist at the time the code was compiled) and for on-the-fly code generation. Fat binaries that include multiple, offline compiled, binary images for various architecture versions plus PTX for forward compatibility are supported and encouraged. See the CUDA documentation for details.

Thanks. So the fat binary really is very fat.

I didn't know that it also included PTX.

So the only time the code gets compiled at runtime is when the GPU is a newer version? (Since the binary will be very old, the driver looks at the PTX instead?)

Also, assuming the machine version and the binary match, can we see the actual machine instructions by looking at the cubin binary?

Thanks

No - the driver looks at the available binaries and if none matches, uses the PTX code. The date of compilation is not involved in the decision.

Yes. Just run `cuobjdump -sass` on it.

How fat the fat binaries get depends on what the programmer specifies when compiling the code. One can specify as many different versions of binary machine code and PTX as desired. Since PTX is a text-based rather than a binary format, it tends to add significant volume to a fat binary, so to keep the size under control it is usually advisable to include only a single PTX version (the most recent version, for forward compatibility). So for a double-precision application one might want to include SASS for compute capabilities 1.3, 2.0, and 3.0, plus PTX for compute capability 3.0.
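To make that example concrete, here is a small Python sketch that assembles the corresponding nvcc command line. It relies on nvcc's `-gencode arch=compute_XY,code=sm_XY` form to embed SASS for one architecture and `code=compute_XY` to embed PTX. The helper name `gencode_flags` and the file names `app.cu` and `app` are hypothetical, chosen only for illustration. Current CUDA toolkits no longer accept targets this old, so substitute the architectures you actually need.

```python
def gencode_flags(sass_archs, ptx_arch):
    """Build nvcc -gencode flags: one SASS image per entry in sass_archs,
    plus a single PTX image for ptx_arch (hypothetical helper)."""
    flags = []
    for cc in sass_archs:
        # code=sm_XY embeds machine code (SASS) for that architecture
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    # code=compute_XY embeds PTX, which the driver can JIT for newer GPUs
    flags += ["-gencode", f"arch=compute_{ptx_arch},code=compute_{ptx_arch}"]
    return flags


# SASS for compute capabilities 1.3, 2.0 and 3.0, plus PTX for 3.0
command = ["nvcc", *gencode_flags(["13", "20", "30"], "30"), "-o", "app", "app.cu"]
print(" ".join(command))
```

Each extra `-gencode` entry adds one more image to the fat binary, which is why the list is worth keeping short.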

When loading a fat binary, the CUDA driver first looks for a pre-compiled binary that fits the architecture of the given GPU registered with the current context. If it cannot find one, it looks for PTX code it can JIT compile for that architecture. If that fails, it returns an error. My memory is hazy, but I believe there is an option to always force JITing from the PTX. Again, for questions like these the documentation is your friend.
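The lookup order described above can be modeled with a short Python sketch. This is not the driver's actual code, only an illustration under stated assumptions: a cubin is treated as fitting a GPU with the same major compute capability and an equal or higher minor version, PTX is treated as usable on any GPU of equal or higher compute capability, and when several images qualify the highest one is picked (that tie-break is my assumption). The function name `select_image` is hypothetical. The `force_ptx_jit` parameter stands in for the documented `CUDA_FORCE_PTX_JIT=1` environment variable, which makes the driver ignore embedded binaries.

```python
def select_image(device_cc, cubin_ccs, ptx_ccs, force_ptx_jit=False):
    """Model of fat-binary image selection (illustrative, not driver code).

    All compute capabilities are (major, minor) tuples.
    Returns ("cubin", cc) or ("ptx", cc); raises RuntimeError if nothing fits.
    """
    if not force_ptx_jit:
        # Assumed rule: a cubin fits if the major version matches and
        # its minor version does not exceed the device's.
        fits = [cc for cc in cubin_ccs
                if cc[0] == device_cc[0] and cc[1] <= device_cc[1]]
        if fits:
            return ("cubin", max(fits))
    # Assumed rule: PTX can be JIT compiled for any equal or newer device.
    jit = [cc for cc in ptx_ccs if cc <= device_cc]
    if jit:
        return ("ptx", max(jit))
    raise RuntimeError("no compatible binary or PTX image for this device")


# Fat binary from the example: SASS for 1.3, 2.0, 3.0 plus PTX for 3.0
fat = dict(cubin_ccs=[(1, 3), (2, 0), (3, 0)], ptx_ccs=[(3, 0)])
print(select_image((2, 0), **fat))   # matching binary is used
print(select_image((5, 2), **fat))   # newer GPU falls back to PTX
```

Note that the date of compilation appears nowhere in the decision; only the compute capabilities of the embedded images and of the GPU matter.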