How to prevent ptxas from reordering code

Nikratio · April 19, 2011, 6:57pm

The following .cu code

extern "C" __global__ void XXX(volatile int *device1,

 	volatile int *device2){

	clock_t t1, t2;

	unsigned int tmp;

	asm volatile ("mov.u32 %0, %%clock;\n\t"

 	"ld.global.u32 %2, [%3];\n\t"

 	"add.u32 %2, %2, 2;\n\t"

 	"mov.u32 %1, %%clock;\n\t"

 	: "=r"(t1), "=r"(t2), "=r"(tmp)

 	: "r"(device1)

 	: "memory");

	device1[42] = t2 - t1;

}

is translated to

nvcc -arch=compute_20 -code=sm_20 --ptxas-options=-v,-O0 --opencc-options=-O0 --cubin test.cu

cuobjdump --dump-sass test.cubin

	code for sm_20

		Function : XXX

	/*0000*/ 	/*0x00005de428004404*/ 	MOV R1, c [0x1] [0x100];

	/*0008*/ 	/*0x40001c042c000001*/ 	S2R R0, SR_ClockLo;

	/*0010*/ 	/*0x08001c035000c000*/ 	IMUL.U32.U32 R0, R0, 0x2;

	/*0018*/ 	/*0x40009c042c000001*/ 	S2R R2, SR_ClockLo;

	/*0020*/ 	/*0x08209c035000c000*/ 	IMUL.U32.U32 R2, R2, 0x2;

	/*0028*/ 	/*0x00201d0348000000*/ 	IADD R0, R2, -R0;

	/*0030*/ 	/*0x80009de218000000*/ 	MOV32I R2, 0x20;

	/*0038*/ 	/*0x00209c8614000000*/ 	LDC R2, c [0x0] [R2];

	/*0040*/ 	/*0xa0209c034800c002*/ 	IADD R2, R2, 0xa8;

	/*0048*/ 	/*0x00201f8590000000*/ 	ST.WT [R2], R0;

	/*0050*/ 	/*0x00001de780000000*/ 	EXIT;

So operations are reordered despite disabled optimization and volatile declaration. Instead of measuring the clock cyles between loading from device memory, the program measures the time required for a multiplication.

How can I convince ptxas to not reorder this code, but to translate it as-is, line-by-line?

nvcc --version

nvcc: NVIDIA Â® Cuda compiler driver

Built on Sun_Mar_20_16:47:09_PDT_2011

Cuda compilation tools, release 4.0, V0.2.1221

Topic		Replies	Views
Stopping optimization of inline PTX code CUDA Programming and Performance	2	640	February 6, 2019
Array Assigned to Register and Unrolled Loop Variable Bug CUDA Programming and Performance	6	1864	May 11, 2009
Understanding Generated PTX code CUDA Programming and Performance	1	7232	March 8, 2011
preventing ptxas from reordering instructions CUDA Programming and Performance	23	6154	December 2, 2022
Clock Cycles of CUDA kernel How to determine the clock cycles...? CUDA Programming and Performance	9	14813	June 27, 2008
CUDA clock() issue CUDA Programming and Performance	12	2393	March 7, 2017
Switch oddities Compiler bug? CUDA Programming and Performance	16	4571	September 10, 2008
Initializing cuda global variable CUDA Programming and Performance	3	4517	December 25, 2017
Identifier "__HALF2_TO_UI" is undefined when using asm for cuda CUDA Programming and Performance cuda	18	552	October 27, 2023
Arguments mismatch for instruction 'mma', why? CUDA Programming and Performance	7	556	November 13, 2023

How to prevent ptxas from reordering code

Related topics