I’m looking into the process of manually compiling to .ptx files and then generating a binary from them. I’ve looked at the ‘Full CUDA Compilation trajectory’ in the nvcc guide and I’m still a bit confused.
I’m familiar with using nvcc to compile straight to object files, ready for linking into an application using gcc (I’m on x86_64 Linux):
nvcc32 -c -o main.cu.o main.cu -arch sm_21
gcc412 -c -o main.cpp.o main.cpp
gcc412 -o main main.cu.o main.cpp.o
How would I do this process manually going to ptx first?
nvcc32 -o main.ptx main.cu -ptx -arch sm_21
ptxas -o main.cubin main.ptx -cubin
As far as I understand it, I now have to generate the code in fatbin format, as this is what gets linked into the binary with the rest of the gcc object files? I’m also a little confused about the file hashing and generating the key values; I can’t seem to pass this .cubin file into the fatbin app correctly without them. Listing the nvcc steps produces a lot of commands, and I’m not sure what they all mean or, more importantly, which ones I can leave out if I’m starting with ptx rather than cu with mixed host / device code.
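For reference, the step listing mentioned above can be reproduced with nvcc's --dryrun option, and --keep leaves the intermediate files (including the .ptx and .cubin) in place so they can be matched against the listed commands. This is only a rough sketch: it assumes a plain nvcc on the PATH, and the NVCC variable and the show_nvcc_steps function name are made up here so that a site wrapper such as nvcc32 can be substituted.

```shell
#!/bin/sh
# Sketch only. NVCC and show_nvcc_steps are hypothetical names for illustration;
# set NVCC=nvcc32 (or similar) if your site uses a wrapper around nvcc.
show_nvcc_steps() {
    nvcc_bin=${NVCC:-nvcc}
    src=${1:-main.cu}

    # Bail out quietly if the compiler or the source file is missing.
    if ! command -v "$nvcc_bin" >/dev/null 2>&1; then
        echo "skipped: $nvcc_bin not found"
        return 0
    fi
    if [ ! -f "$src" ]; then
        echo "skipped: $src not found"
        return 0
    fi

    # --dryrun lists the sub-commands nvcc would run without executing them.
    "$nvcc_bin" --dryrun -arch sm_21 -c -o main.cu.o "$src"

    # --keep runs the same compilation but preserves the intermediate files
    # in the current directory for inspection.
    "$nvcc_bin" --keep -arch sm_21 -c -o main.cu.o "$src"
}

show_nvcc_steps "$@"
```

Comparing the files left behind by --keep with the commands printed by --dryrun should make it easier to tell which steps deal only with host code and which ones operate on the ptx and cubin.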
I’m aware that there are a few options to load the ptx at runtime, but I’d really prefer to compile this into a complete binary as in the first example if possible.