If I want to hand-tune the .ptx output from NVCC, is it possible to compile my own .ptx file? If so, how?
ptxas is the assembler. I think you can see it being invoked if you add the -v (--verbose) switch to the nvcc command line, which lists the sub-commands nvcc runs, but I'm not sure.
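For what it's worth, here is a rough sketch of the round trip: stop nvcc at the PTX stage, edit the file by hand, then assemble it with ptxas. The file name kernel.cu, the scale kernel, and the sm_75 default are my own placeholders, not anything from the toolkit; set ARCH to whatever your card needs. The script skips the compile steps if the tools are not on the PATH.

```shell
#!/bin/sh
# Sketch only: kernel.cu, the scale kernel and the ARCH default are
# illustrative assumptions, not part of the original posts.
ARCH="${ARCH:-sm_75}"

# A trivial kernel so there is something to compile.
cat > kernel.cu <<'EOF'
extern "C" __global__ void scale(float *x, float a)
{
    x[threadIdx.x] *= a;
}
EOF

if command -v nvcc >/dev/null 2>&1 && command -v ptxas >/dev/null 2>&1; then
    # 1. Stop after PTX generation instead of building a full binary.
    nvcc -arch="$ARCH" -ptx kernel.cu -o kernel.ptx

    # 2. Hand-edit kernel.ptx at this point.

    # 3. Assemble the (edited) PTX into a cubin for the same target.
    ptxas -arch="$ARCH" kernel.ptx -o kernel.cubin
else
    echo "nvcc/ptxas not found on PATH; skipping the compile steps"
fi
```

The resulting cubin would then be loaded from the host side through the driver API rather than linked in by nvcc, so the host code has to be arranged accordingly.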
That would be great if you can. It seems that the compiler, at this stage, is not all that intelligent, and it would be nice to “tweak” the code more.
Also, in the .ptx files, is the register naming generic? The numbering looks linearly incremental rather than a physical representation of the hardware, so it seems there is some abstraction.
nvcc appears to use SSA, hence the constantly increasing register numbers as values are defined. ptxas performs the real register allocation. If you need that kind of fine-grained control over the actual cubin, use decuda.
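One way to see the difference between the virtual registers in the PTX and what ptxas actually allocates is to compare the .reg declarations in the .ptx file with the register count ptxas reports in verbose mode. This is only a sketch: regs.cu, the axpy kernel, and the sm_75 default are made-up placeholders, and the exact numbers will depend on your toolkit version and target.

```shell
#!/bin/sh
# Sketch only: regs.cu, the axpy kernel and the ARCH default are
# illustrative assumptions, not part of the original posts.
ARCH="${ARCH:-sm_75}"

cat > regs.cu <<'EOF'
extern "C" __global__ void axpy(float *y, const float *x, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    y[i] = a * x[i] + y[i];
}
EOF

if command -v nvcc >/dev/null 2>&1 && command -v ptxas >/dev/null 2>&1; then
    nvcc -arch="$ARCH" -ptx regs.cu -o regs.ptx

    # Virtual registers: the PTX declares them per type in .reg lines.
    echo "Virtual register declarations in the PTX:"
    grep -E '^[[:space:]]*\.reg' regs.ptx

    # Physical registers: ptxas -v reports how many it actually used
    # after allocation for the chosen target.
    echo "Register usage reported by ptxas:"
    ptxas -v -arch="$ARCH" regs.ptx -o regs.cubin
else
    echo "nvcc/ptxas not found on PATH; skipping the comparison"
fi
```

The same report is available from a normal build by passing -Xptxas -v to nvcc.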