Normally an executable is compiled from .cu source code, and the intermediate PTX file is convenient to analyze. But can we compile a PTX file into a binary? How?
And can we write PTX code by hand and then compile it into a binary?
If you want to go the NVIDIA-supported route, you can use the driver-level API to load PTX code yourself at run time. See around page 133 of the CUDA Reference Manual, specifically the cuModuleLoadDataEx function, which JIT-compiles the PTX and lets you set the optimization level, target architecture, etc. You won't be able to launch these kernels through the CUDA runtime API's <<<...>>> syntax, though; they have to be launched through the driver API.
Alternatively, you can use ptxas, the PTX assembler, which does the same job offline on a static PTX file: it converts a .ptx file into a .cubin file. Even if you go this route, you still need the CUDA driver API to load the .cubin file.