How to link in modified .ptx code?

I want to attempt some hand optimization of my algorithms, but I haven’t found an easy way to insert modified .ptx code into my program.

The process would be:

  • run nvcc --ptx [program.cu] to get the .ptx source code
  • modify the .ptx file
  • continue the compilation, using the modified .ptx in place of the kernel compiled from the .cu file

It’s the last step I’m unsure of. Does anyone know how to do this?

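Run nvcc with the ‘-v’ (verbose) option and you will see every intermediate step it runs. Starting from your modified .ptx, you basically need to repeat the later steps yourself: ptxas (assemble the .ptx) -> fatbinary -> cudafe++ -> gcc -E (preprocessing) -> gcc -c (compiling), and finally link the resulting object files.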
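As a rough sketch of how to get at those steps, something like the script below might help. It only prints the commands unless you set RUN=1 and nvcc is on your PATH. The file name program.cu and the architecture sm_70 are placeholders I made up, not anything from the question, and the step helper is just a convenience for this example. The --dryrun option lists the commands nvcc would run without running them, and --keep makes the intermediate files (including the .ptx) land in the current directory instead of a temp directory.

```shell
#!/bin/sh
# Sketch only: SRC and ARCH are assumed placeholders, adjust for your project.
SRC="${SRC:-program.cu}"
ARCH="${ARCH:-sm_70}"
BASE="${SRC%.cu}"

# Print a command; actually run it only when RUN=1 and nvcc is installed.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ] && command -v nvcc >/dev/null 2>&1; then
        "$@"
    fi
}

# 1. Generate the PTX that you will edit by hand.
step nvcc --ptx -arch="$ARCH" "$SRC" -o "$BASE.ptx"

# 2. List every build step nvcc would perform, keeping intermediates local.
step nvcc --keep --dryrun -arch="$ARCH" "$SRC" -o "$BASE"

# 3. After editing, check that the modified PTX still assembles.
step ptxas -arch="$ARCH" "$BASE.ptx" -o "$BASE.cubin"
```

From the list printed in step 2 you would then rerun by hand everything from the ptxas line onward, so that the edited .ptx is what gets assembled and embedded. The exact commands and their order differ between CUDA versions, so treat the list from your own nvcc as the reference.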