I want to compile my .cu kernel into PTX code with Visual Studio and the CUDA SDK 4.0.
Visual Studio generates the following command line for compiling:
My problem is that -Xptxas -dlcm doesn’t work: the PTX output is identical with and without it. However, if I manually change all “ld.global” instructions to, e.g., “ld.cg.global”, the result is fine (the driver compiles it properly).
As the name of the switch implies, -Xptxas -dlcm is a component-level switch for PTXAS, which is the compiler backend that translates PTX into machine code. I am not aware of a top-level (nvcc) compiler switch that lets one control the cache mode for load instructions emitted into PTX. I would suggest looking into PTX inline assembly.
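A minimal sketch of the inline-assembly route, assuming a Fermi-class (sm_20) or newer target. The helper load_cg, the kernel copy_cg and the host wrapper run_copy_cg are hypothetical names for illustration. load_cg emits a single ld.global.cg.f32 instruction, so the cache operator ends up in the PTX no matter which -Xptxas options are passed. (The PTX ISA manual lists the state space before the cache operator, hence ld.global.cg.) The pointer-constraint selection is an assumption about how the pointer width of the build is detected.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Inline-asm constraint for a pointer operand: "l" (64-bit register) on
// 64-bit builds, "r" (32-bit register) on 32-bit builds. Assumption: these
// macros reflect the pointer width of the device code being compiled.
#if defined(_WIN64) || defined(__LP64__)
#define PTR_CONSTRAINT "l"
#else
#define PTR_CONSTRAINT "r"
#endif

// Hypothetical helper: loads one float from global memory with the .cg
// cache operator (cache in L2, not L1) by emitting the PTX instruction directly.
__device__ __forceinline__ float load_cg(const float* p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : PTR_CONSTRAINT(p));
    return v;
}

// Illustrative kernel: every load of `in` goes through load_cg.
__global__ void copy_cg(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = load_cg(in + i);
}

// Hypothetical host wrapper: copies n floats through copy_cg.
// Returns 0 on success, -1 on any CUDA error.
int run_copy_cg(const float* h_in, float* h_out, int n)
{
    assert(h_in != NULL && h_out != NULL && n > 0);
    size_t bytes = (size_t)n * sizeof(float);
    float* d_in = NULL;
    float* d_out = NULL;
    int result = -1;

    if (cudaMalloc((void**)&d_in, bytes) == cudaSuccess &&
        cudaMalloc((void**)&d_out, bytes) == cudaSuccess &&
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        const int threads = 128;
        copy_cg<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            result = 0;
    }
    cudaFree(d_in);   // cudaFree(NULL) is a no-op
    cudaFree(d_out);
    return result;
}
```

Running nvcc -ptx on a file like this should show the explicit ld.global.cg.f32 inside copy_cg, while ordinary pointer dereferences elsewhere keep the plain ld.global form. This makes the choice per load rather than per compilation unit, which is finer-grained than -dlcm.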
No, because (as Norbert explained) -Xptxas marks an option to the PTX “assembler”, which isn’t invoked at all when you only generate PTX. The only useful thing NVIDIA could do is add a new option to the device code compiler (-Xopencc …) that makes it issue ld.cg.global instead of ld.global.
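When the PTX is compiled by the driver, as in the question, later versions of the CUDA driver API (not 4.0, as far as I know) expose the -dlcm setting as a JIT option, CU_JIT_CACHE_MODE, passed to cuModuleLoadDataEx. A rough sketch, assuming a driver that defines this option and the primary-context API. jit_with_cg is a hypothetical helper, and the ptx argument stands in for whatever nvcc -ptx produced for your kernel.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda.h>

// Hypothetical helper: JIT-compiles a PTX string on device 0 and asks the
// driver's PTX compiler to use the .cg cache mode for global loads, the JIT
// counterpart of -Xptxas -dlcm=cg. Returns 0 on success, -1 otherwise.
// Assumes a driver API version that defines CU_JIT_CACHE_MODE; link with -lcuda.
int jit_with_cg(const char* ptx)
{
    assert(ptx != NULL);
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;

    if (cuInit(0) != CUDA_SUCCESS) return -1;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return -1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return -1;

    int result = -1;
    if (cuCtxSetCurrent(ctx) == CUDA_SUCCESS) {
        // JIT option values travel through void*; enum values are cast from an integer.
        CUjit_option opts[1] = { CU_JIT_CACHE_MODE };
        void* vals[1] = { (void*)(size_t)CU_JIT_CACHE_OPTION_CG };

        if (cuModuleLoadDataEx(&mod, ptx, 1, opts, vals) == CUDA_SUCCESS) {
            result = 0;
            cuModuleUnload(mod);
        }
    }
    cuDevicePrimaryCtxRelease(dev);
    return result;
}
```

The PTX text itself stays unchanged (plain ld.global); the cache mode is applied only when the driver translates it to machine code, which is exactly the stage that -Xptxas targets in an offline build.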