CUDA does runtime compilation in the driver, so what is a cubin?

Michael_H1 · April 3, 2012, 7:57pm

I heard that CUDA does runtime compilation in the driver, so what is a cubin?
If the driver simply loads the cubin binary at runtime, isn't that just dynamically loading a binary?

njuffa · April 3, 2012, 8:57pm

The CUDA driver can either load a binary image compiled offline for a particular GPU architecture version, or it can compile a text-based portable assembly language called PTX at runtime. PTX is mostly useful for forward compatibility (for GPUs with new architectures that did not exist at the time the code was compiled) and for on-the-fly code generation. Fat binaries that include multiple offline-compiled binary images for various architecture versions, plus PTX for forward compatibility, are supported and encouraged. See the CUDA documentation for details.

Michael_H1 · April 3, 2012, 9:03pm

Thanks. So that fat binary is indeed very fat.

I didn't know that it also included PTX.

So the only time the code gets compiled at runtime is when the GPU is a newer version? (Since the binary will be very old, the driver looks at the PTX?)

Also, assuming the machine version and the binary match, can we see the actual machine instructions by looking at the cubin binary?

Thanks

tera · April 3, 2012, 9:34pm

No. The driver looks at the available binaries and, if none matches, uses the PTX code. The date of compilation is not involved in the decision.

Yes. Just run `cuobjdump -sass` on it.
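To script this, a minimal Python sketch, assuming `cuobjdump` from the CUDA toolkit is on `PATH`, might wrap the call with `subprocess`. The helper names `cuobjdump_argv` and `dump_kernel_code` are hypothetical; only the `-sass` and `-ptx` flags come from `cuobjdump` itself, and `-ptx` prints the PTX embedded in a fat binary rather than machine code.

```python
import shutil
import subprocess
import sys
from typing import List

# Hypothetical mapping from a dump kind to the cuobjdump flag that prints it.
_FLAGS = {"sass": "-sass", "ptx": "-ptx"}


def cuobjdump_argv(path: str, kind: str = "sass") -> List[str]:
    """Build the cuobjdump command line for a cubin or fat-binary executable."""
    if kind not in _FLAGS:
        raise ValueError(f"unsupported dump kind: {kind!r}")
    return ["cuobjdump", _FLAGS[kind], path]


def dump_kernel_code(path: str, kind: str = "sass") -> str:
    """Run cuobjdump and return its text output (SASS machine code or PTX)."""
    if shutil.which("cuobjdump") is None:
        raise FileNotFoundError("cuobjdump not found on PATH; it ships with the CUDA toolkit")
    result = subprocess.run(
        cuobjdump_argv(path, kind), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dump_sass.py kernel.cubin [sass|ptx]
    print(dump_kernel_code(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "sass"))
```

Running it on an executable built as a fat binary with `ptx` instead of `sass` is a quick way to check whether forward-compatible PTX was actually embedded.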

njuffa · April 3, 2012, 10:36pm

How fat the fat binaries get depends on what the programmer specifies when they compile the code. One can specify as many different versions of binary machine code and PTX as desired. Since PTX is a text-based rather than a binary format, it tends to add significant volume to a fat binary, so to keep the size under control it is usually advisable to include only a single PTX version (the most recent one, for forward compatibility). So for a double-precision application (double precision requires compute capability 1.3 or higher), one might want to include SASS for compute capabilities 1.3, 2.0, and 3.0, plus PTX for compute capability 3.0.
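As a rough illustration of how such a fat binary is requested, the Python sketch below builds the `nvcc` `-gencode` flags for the example above: `code=sm_XY` embeds SASS for that architecture and `code=compute_XY` embeds PTX. The helper `gencode_flags` and the file names `app.cu`/`app` are hypothetical, and recent toolkits no longer accept architectures as old as 1.3, 2.0, or 3.0, so substitute ones your `nvcc` supports.

```python
from typing import List, Sequence


def gencode_flags(sass_ccs: Sequence[str], ptx_cc: str) -> List[str]:
    """Return nvcc -gencode flags: SASS for each compute capability in sass_ccs,
    plus PTX for ptx_cc. Compute capabilities are strings such as "2.0"."""
    flags: List[str] = []
    for cc in sass_ccs:
        n = cc.replace(".", "")  # "2.0" -> "20"
        flags += ["-gencode", f"arch=compute_{n},code=sm_{n}"]  # binary (SASS) image
    n = ptx_cc.replace(".", "")
    flags += ["-gencode", f"arch=compute_{n},code=compute_{n}"]  # PTX kept for JIT
    return flags


cmd = ["nvcc", *gencode_flags(["1.3", "2.0", "3.0"], "3.0"), "-o", "app", "app.cu"]
print(" ".join(cmd))
```

Each extra `sm_XY` entry adds one more binary image to the fat binary, while the single `compute_30` entry is the one PTX copy recommended above.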

When loading a fat binary, the CUDA driver first looks for a pre-compiled binary that fits the architecture of the GPU registered with the current context. If it cannot find one, it looks for PTX code it can JIT compile for that architecture. If that fails, it returns an error. My memory is hazy, but I believe there is an option to always force JITing from the PTX. Again, for questions like these the documentation is your friend.
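The lookup order described here can be modeled in a few lines of Python. This is a simplified sketch, not the driver's actual code: `Image` and `select_image` are hypothetical names, and the compatibility rules are the documented ones (a cubin built for `sm_XY` runs on devices of compute capability X.Z with Z ≥ Y; PTX for `compute_XY` can be JIT-compiled for any device of compute capability X.Y or higher). The `force_ptx_jit` flag stands in for the forced-JIT option, which current CUDA documentation describes as the `CUDA_FORCE_PTX_JIT=1` environment variable.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor)


@dataclass(frozen=True)
class Image:
    kind: str  # "sass" (pre-compiled cubin) or "ptx"
    cc: CC     # architecture the image was built for


def select_image(images: Sequence[Image], device: CC,
                 force_ptx_jit: bool = False) -> Tuple[str, Image]:
    """Simplified model of how a fat-binary image is chosen for a device."""
    major, minor = device
    if not force_ptx_jit:
        # 1. Prefer a pre-compiled binary: same major version, minor <= device minor.
        binaries = [im for im in images
                    if im.kind == "sass" and im.cc[0] == major and im.cc[1] <= minor]
        if binaries:
            return "load", max(binaries, key=lambda im: im.cc)
    # 2. Otherwise JIT-compile the newest PTX that does not exceed the device.
    ptx = [im for im in images if im.kind == "ptx" and im.cc <= device]
    if ptx:
        return "jit", max(ptx, key=lambda im: im.cc)
    # 3. Nothing usable: the real driver reports an error at this point.
    raise RuntimeError(f"no compatible image for compute capability {major}.{minor}")


fat = [Image("sass", (1, 3)), Image("sass", (2, 0)),
       Image("sass", (3, 0)), Image("ptx", (3, 0))]
print(select_image(fat, (2, 1)))  # loads the 2.0 cubin
print(select_image(fat, (5, 0)))  # no 5.x cubin, so the 3.0 PTX is JIT-compiled
```

Note that nothing in this logic depends on when the code was compiled, matching tera's point: only the architectures embedded in the fat binary and the device's compute capability matter.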
