Setting cache control when compiling to PTX with NVCC: -Xptxas -dlcm doesn't work

Hello,

I want to compile my .cu kernel into PTX code with Visual Studio and CUDA SDK 4.0.

It generates the following command line for compiling:

My problem is that -Xptxas -dlcm doesn’t work: the generated PTX output does not change. However, if I manually change all of the “ld.global” instructions to e.g. “ld.cg.global”, the result is fine (the driver compiles it properly).

Is there any solution for this problem?

Thank you!

As the name of the switch implies, -Xptxas -dlcm is a component-level switch for PTXAS, which is the compiler backend that translates PTX into machine code. I am not aware of a top-level (nvcc) compiler switch that lets one control the cache mode for load instructions emitted into PTX. I would suggest looking into PTX inline assembly.
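To illustrate the inline-assembly suggestion, here is a minimal sketch of a load wrapper that emits a cache-global load directly into the PTX. The names load_cg, copy_cg and copy_matches_input are made up for this example and are not part of any NVIDIA API. It assumes a device of compute capability 2.0 or higher (the .cg cache operator is not available on older targets) and a reasonably recent toolkit. Note that the PTX ISA writes the state space before the cache operator, so the instruction is spelled ld.global.cg here.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: load one float from global memory with the .cg
// cache operator (cache in L2, bypass L1). The pointer must refer to
// global memory.
__device__ __forceinline__ float load_cg(const float* ptr)
{
    float value;
#if defined(_WIN64) || defined(__LP64__)
    // 64-bit build: addresses go into 64-bit registers ("l" constraint).
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(value) : "l"(ptr));
#else
    // 32-bit build: addresses go into 32-bit registers ("r" constraint).
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(value) : "r"(ptr));
#endif
    return value;
}

// Copies in[] to out[], reading every element through load_cg().
__global__ void copy_cg(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = load_cg(in + i);
    }
}

// Host-side check: runs the kernel and reports whether the copy is exact.
bool copy_matches_input(int n)
{
    std::vector<float> host_in(n), host_out(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        host_in[i] = static_cast<float>(i);
    }

    const size_t bytes = n * sizeof(float);
    float* dev_in = NULL;
    float* dev_out = NULL;
    if (cudaMalloc((void**)&dev_in, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc((void**)&dev_out, bytes) != cudaSuccess) {
        cudaFree(dev_in);
        return false;
    }

    cudaMemcpy(dev_in, &host_in[0], bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_cg<<<blocks, threads>>>(dev_in, dev_out, n);

    // This blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(&host_out[0], dev_out, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);

    return err == cudaSuccess && host_out == host_in;
}

int main()
{
    assert(copy_matches_input(1024));
    std::printf("copy through ld.global.cg succeeded\n");
    return 0;
}
```

Because the inline assembly string is passed through to the generated PTX as written, compiling this file with nvcc -ptx should show ld.global.cg.f32 in place of a plain ld.global.f32 for that load, without any post-processing of the PTX file. Only loads routed through the wrapper are affected; ordinary array reads elsewhere in the kernel still use the default cache operator.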

In the end, I wrote a PowerShell script that replaces the instructions in the generated PTX.

Any news on this issue? It looks like I have the same problem. Should this option generate different PTX output?

No, because (as Norbert explained) -Xptxas marks an option for the PTX “assembler”, which isn’t invoked at all when the output is PTX. The only useful thing Nvidia could do is to add a new option to the device code compiler (-Xopencc …) to issue ld.cg.global instead of ld.global.