Can I compile the ptx or cubin file to a binary file?

In general, the executable is compiled from .cu source code, and the PTX file is convenient for analysis. But can we compile a PTX file into a binary executable? If so, how?

And can we write new PTX code by hand and then compile it to a binary file?

Is there any way to tune the ptx code automatically?

I’m working on GPU code performance optimization and trying to figure out a general, intelligent optimizer.

If you want to go the NVIDIA-supported route, you can use the driver-level API to manually load PTX files. See around page 133 of the CudaReferenceManual, specifically the cuModuleLoadDataEx function. This lets you set the optimization level, target architecture, etc. You won’t be able to use the CUDA runtime API though.

Alternatively, you can use the ptxas tool, which is the PTX assembler. It will perform the same task on a static PTX file; it will convert a .ptx file into a .cubin file. Even if you go this route you will still need to use the CUDA driver API to load in the .cubin files.

If you want more control over the optimizations that are applied, you can write an optimization pass using Ocelot. Ocelot also has an interface for loading a kernel from inlined PTX or an input file and calling it directly using the CUDA runtime API. Going this route would require you to compile and link against Ocelot.