Embedding PTX in executables for execution


Hi, I’m aware that this isn’t specifically a CUDA topic, but I don’t know where else to ask, and since I’m interested in Nvidia GPUs/drivers for this, I figured I’d ask here.

Say I have some PTX code, e.g. this (note: am I to understand that the part containing .section is supposed to be a more readable rendering of the byte data representing the actual PTX seen above?).

How would I embed this code in an executable (let’s just focus on x64 Windows here)? Specifically, I’m interested in which steps I’d need to take to insert the data segments that hold the PTX, and how I’d go about invoking the contained kernels from within the executable.

I’m aware that this topic is quite complicated, hence why I’d also really appreciate any pointers to useful resources on this topic.

This may be of interest. In general, NVIDIA doesn’t provide a ready-to-go toolchain that is designed or intended to embed PTX in an executable for use with the driver API. It’s not a typical use case as far as I know. However, the linked answer shows some possibilities. I’m aware it doesn’t answer all your questions.

Apologies for the late reply, I didn’t have time since it was finals week at my uni. As for the question: thanks, your link looks incredibly useful. One thing I’d be really interested in, though, is the overall legality of such a project. As far as I know, NVCC is a proprietary tool, and attempting to “recreate” a part of it (be it the assembler or the code generator backend) could potentially cause issues. On the other hand, I’m aware of tools like LLVM and GCC, which are also capable of emitting PTX assembly.

Thanks in advance.

Here is another possible method. Also there is the ptxjit CUDA sample code. And here.