I want to attempt some hand optimization of my algorithms, but I haven’t found an easy way to insert modified .ptx code into my program.
The process would be:
- run nvcc --ptx [program.cu] to get the .ptx source code
- modify the .ptx file
- resume compilation, using the modified .ptx in place of the kernel in the .cu file
It’s the last step I’m unsure of. Does anyone know how to do this?