Demystifying CUDA fat binaries

I’d like to have a clearer understanding of CUDA fat binaries. I understand their purpose, but I’m looking for something else: I want to understand better what’s inside, how the fatbin is handled at runtime, and anything else worth knowing. Pointers to web resources would also be nice.

A good starting point seems to be compiling CUDA code with nvcc and the --keep option, so that the intermediate .fatbin and .fatbin.c files are kept. The .fatbin.c file shows the symbol that will be placed in the executable, fatbinData, and that symbol is mapped to the .nv_fatbin section.
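To look at that section from the outside, here is a minimal Python sketch (standard library only) that finds a named section such as .nv_fatbin in a little-endian ELF64 executable by walking the section header table. The function name find_elf_section is my own, and the sketch ignores 32-bit, big-endian and extended-section-numbering files.

```python
import struct
from typing import Optional, Tuple


def find_elf_section(data: bytes, wanted: str) -> Optional[Tuple[int, int]]:
    """Return (file_offset, size) of the named section in a little-endian ELF64 image, or None."""
    # e_ident: magic, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")

    # Section header table location and geometry from the ELF header.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)

    def section(i: int) -> tuple:
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    # Section names live in the section-name string table (.shstrtab).
    strtab_off = section(e_shstrndx)[4]
    for i in range(e_shnum):
        sh_name, _type, _flags, _addr, sh_offset, sh_size, *_rest = section(i)
        start = strtab_off + sh_name
        end = data.index(b"\x00", start)
        if data[start:end].decode() == wanted:
            return sh_offset, sh_size
    return None
```

Called on the bytes of an nvcc-built executable with ".nv_fatbin" as the name, it should return the offset and size of the embedded fat binary data; readelf -S shows the same information.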

Looking at the files included by the .fatbin.c file, I got the idea that I could inspect, and perhaps control, some of this through the functions in the headers fatBinaryCtl.h and fatbinary.h.
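If the fat binary header declared in fatbinary.h is as I remember it (a 32-bit magic 0xBA55ED50, a 16-bit version, a 16-bit header size, then a 64-bit size of the data that follows), the start of the .nv_fatbin contents can be decoded as in the sketch below. The magic value and field layout are assumptions to check against the header shipped with your toolkit, and the names FatbinHeader, parse_fatbin_header and fatbin_payload are hypothetical. The per-entry headers inside the payload are not decoded, since I don't know of public documentation for them.

```python
import struct
from typing import NamedTuple

# ASSUMPTION: magic value and layout as recalled from the toolkit's fatbinary.h.
FATBIN_MAGIC = 0xBA55ED50


class FatbinHeader(NamedTuple):
    magic: int
    version: int
    header_size: int  # offset from the start of the header to the entry data
    fat_size: int     # number of bytes of entry data that follow the header


def parse_fatbin_header(blob: bytes, offset: int = 0) -> FatbinHeader:
    """Decode the (assumed) 16-byte fat binary header at blob[offset:]."""
    magic, version, header_size, fat_size = struct.unpack_from("<IHHQ", blob, offset)
    if magic != FATBIN_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08x}")
    return FatbinHeader(magic, version, header_size, fat_size)


def fatbin_payload(blob: bytes, offset: int = 0) -> bytes:
    """Return the raw entry data (cubins/PTX, each with its own undocumented header)."""
    hdr = parse_fatbin_header(blob, offset)
    start = offset + hdr.header_size
    payload = blob[start:start + hdr.fat_size]
    if len(payload) != hdr.fat_size:
        raise ValueError("truncated fat binary")
    return payload
```

The offset parameter is there because the linker concatenates the .nv_fatbin contributions of every object file, so a section built from several translation units may hold more than one fat binary back to back.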

I’ve already checked the following resources:

NVIDIA blog:

nvcc guide:

Stack Overflow answer:

From what I know and from what I saw in the code, when a kernel must be executed, the driver loads the CUDA binary contained in the fat binary (which is itself embedded in the regular executable) into GPU memory. This binary-inside-the-binary appears to be an ELF executable as well (see the Stack Overflow answer; I verified this myself).
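One way to check the ELF-inside-ELF observation is to scan the fatbin bytes for ELF headers and keep those whose e_machine is EM_CUDA (190, the machine number registered for NVIDIA CUDA). This is a heuristic sketch with a hypothetical function name: it assumes the embedded cubins are stored uncompressed, which newer nvcc versions may not do by default, and it returns e_flags raw because I'm not sure how the SM target is encoded there across cubin versions.

```python
import struct
from typing import List, Tuple

EM_CUDA = 190  # ELF e_machine value for NVIDIA CUDA


def find_cuda_elfs(blob: bytes) -> List[Tuple[int, int]]:
    """Return (offset, e_flags) for each embedded ELF64 header whose e_machine is EM_CUDA."""
    hits = []
    pos = blob.find(b"\x7fELF")
    while pos != -1:
        # Require a full 64-byte ELF64 header; EI_CLASS == 2 means 64-bit.
        if pos + 64 <= len(blob) and blob[pos + 4] == 2:
            endian = "<" if blob[pos + 5] == 1 else ">"  # EI_DATA: 1 = little-endian
            (e_machine,) = struct.unpack_from(endian + "H", blob, pos + 18)
            (e_flags,) = struct.unpack_from(endian + "I", blob, pos + 48)
            if e_machine == EM_CUDA:
                hits.append((pos, e_flags))
        pos = blob.find(b"\x7fELF", pos + 1)
    return hits
```

Each hit's offset can be used to carve the cubin out and feed it to cuobjdump or readelf for a closer look.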

Any insights into all of this?

So what specifically is it you want to know?

To my knowledge, NVIDIA has not published the details of how it packages architecture-specific SASS and PTX into an ELF object file to create the fat binary. Since various systems using fat binaries (though usually covering only two architectures) preceded CUDA by many years, I would assume that ELF already provides (standard) means for such encapsulation.

At kernel load time, the CUDA runtime looks for SASS that matches the GPU architecture. If it cannot find any, it looks for PTX it can JIT compile. If neither is found, it reports an error. The SASS from JIT-compiled PTX kernels is stored in files in the JIT cache. On visual inspection, each such file seems to consist of a header of some sort followed by an ELF object file.
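The lookup order above can be modeled roughly as follows. This is a simplified sketch, not the driver's actual algorithm, and pick_image is a hypothetical name: architectures are written as integers (80 for sm_80 or compute_80), SASS is treated as usable only within the same major version and for a minor version no higher than the device's, PTX as JIT-compilable for any device at least as new as its target, and arch-specific targets such as sm_90a are not modeled.

```python
from typing import List, Optional, Tuple

# (kind, arch): ("sass", 80) is an sm_80 cubin, ("ptx", 70) is compute_70 PTX.
Entry = Tuple[str, int]


def pick_image(entries: List[Entry], device_cc: int) -> Optional[Entry]:
    """Simplified model of load-time image selection for a device of compute capability device_cc."""
    major = device_cc // 10
    # 1) SASS: binary compatible only within the same major version and for a
    #    minor version <= the device's; prefer the closest (newest) match.
    sass = [e for e in entries
            if e[0] == "sass" and e[1] // 10 == major and e[1] <= device_cc]
    if sass:
        return max(sass, key=lambda e: e[1])
    # 2) PTX: can be JIT compiled for any device at least as new as its target.
    ptx = [e for e in entries if e[0] == "ptx" and e[1] <= device_cc]
    if ptx:
        return max(ptx, key=lambda e: e[1])
    # 3) Nothing usable: this is where the real runtime reports an error.
    return None
```

For a fat binary holding sm_70 and sm_80 SASS plus compute_80 PTX, this model picks the sm_80 SASS on a compute capability 8.6 device, falls back to JIT compiling the PTX on a 9.0 device, and finds nothing usable on a 6.1 device.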

What more is it you need to know? What use case are you trying to address?

Yeah, there’s a FatELF format, but it’s not widely used (and it isn't used here either: simply running the file command on an executable built with nvcc shows this).

From my research, it seems that the fatbin is stored in the .nv_fatbin section, and that this fatbin contains the cubins. I think the driver fetches this section and chooses the most suitable cubin that the available GPU can execute.

If everything I wrote is correct, how is the cubin handled? Is it copied into GPU memory, or is it used in a different way?