I am auto-generating some PTX code, which I plan to load using the cuModuleLoadDataEx function.
I am wondering: what kinds of optimizations does the PTX → cubin compiler/assembler perform?
Where do I find information about this?
My main concern right now: do I need to worry about register usage myself, or does the assembler optimize it anyway?
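Register usage matters mainly because it limits how many warps can be resident on a multiprocessor at once. The following is a rough, register-only estimate of that limit. It is a simplified sketch, not NVIDIA's occupancy calculator, and it ignores shared memory, block-size limits and per-block allocation granularity. The hardware limits it uses (65536 registers per SM, 64 resident warps, registers allocated per warp in units of 256) are assumed example values that differ between architectures. `sm_limits` and `warps_limited_by_regs` are hypothetical names.

```c
/* Assumed per-SM hardware limits (example values, architecture dependent). */
typedef struct {
    int regs_per_sm;       /* size of the register file, e.g. 65536 */
    int max_warps_per_sm;  /* resident warp limit, e.g. 64 */
    int reg_alloc_unit;    /* per-warp register allocation granularity, e.g. 256 */
} sm_limits;

/* Simplified estimate of how many warps fit on one SM when registers are
 * the only limiting resource. */
int warps_limited_by_regs(int regs_per_thread, sm_limits lim)
{
    if (regs_per_thread <= 0)
        return lim.max_warps_per_sm;

    /* A warp is 32 threads; round its register demand up to the allocation unit. */
    int per_warp = regs_per_thread * 32;
    per_warp = (per_warp + lim.reg_alloc_unit - 1) / lim.reg_alloc_unit
               * lim.reg_alloc_unit;

    int warps = lim.regs_per_sm / per_warp;
    return warps < lim.max_warps_per_sm ? warps : lim.max_warps_per_sm;
}
```

With the example limits, 32 registers per thread still allows 64 warps, 64 registers halves that to 32, and 128 registers leaves 16. For real numbers that account for every limit, the driver API's `cuOccupancyMaxActiveBlocksPerMultiprocessor` can be queried for a loaded kernel.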
BR Troels