How to prevent cvt instructions in PTX?

Hi Everybody,

I’m trying to find a way to prevent NVCC from emitting cvt instructions in the generated PTX.

Could somebody please help me?

I also don’t know how many clock cycles this kind of instruction costs. Does anybody know these numbers?

My guess is that I have to prevent my program or the compiler from using 64-bit pointers. But how do I do that?

I have tried passing the -m32 flag in my makefile to stop the compiler from using 64-bit pointers, but since I have to include libraries that use 64-bit pointers, this strategy hasn’t worked so far.

[codebox]

uint4 * table;

(*table) = make_uint4( a_ptr - base_ptr, c_ptr - a_ptr, x, y);

table++;

[/codebox]

An example of the PTX code that I get is the following:

[codebox]

cvt.s32.u64 	%r54, %rd38;

cvt.s32.u64 	%r55, %rd2;

sub.s32 	%r56, %r54, %r55;

cvt.s32.u64 	%r57, %rd35;

sub.s32 	%r58, %r57, %r54;

.loc	3	112	0

ld.shared.u16 	%r40, [%rd49+0];

.loc	3	134	0

and.b32 	%r59, %r40, 96;

st.global.v4.u32 	[%rd11+0], {%r56,%r58,%r27,%r59};

.loc	3	135	0

add.u64 	%rd11, %rd11, 16;

.loc	3	136	0

add.u32 	%r19, %r19, 1;

[/codebox]

I would appreciate it very much if anybody could come up with an idea!

PTX code is not optimized; it can be completely different from what ends up in the binary file.

In fact, CUDA uses 64-bit pointers in PTX, but in the binary code all pointer arithmetic is 32-bit.

You can use decuda to disassemble the .cubin file and see this for yourself.

Please check http://wiki.github.com/laanwj/decuda
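
As for where the cvt in the PTX listing seems to come from at the source level: on a 64-bit build, a_ptr - base_ptr works on 64-bit values, while make_uint4 takes 32-bit components, so something has to narrow them. Below is a minimal sketch in plain C++ (so it can be tried on the host) that contrasts the pointer version with one that carries 32-bit indices from the start. The names Offsets4, from_pointers and from_indices are made up for illustration, the pointer type is assumed to be unsigned char, and whether the index version actually changes the final binary is something you would have to confirm by disassembling the .cubin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CUDA's uint4 so the sketch compiles as plain C++.
struct Offsets4 {
    std::uint32_t x, y, z, w;
};

// Pointer version, mirroring the snippet in the question.
// On a 64-bit build each subtraction yields a 64-bit std::ptrdiff_t;
// the static_cast is the explicit 64-bit to 32-bit narrowing step.
// Assumes both differences are non-negative and fit in 32 bits.
inline Offsets4 from_pointers(const unsigned char* base_ptr,
                              const unsigned char* a_ptr,
                              const unsigned char* c_ptr,
                              std::uint32_t x, std::uint32_t y) {
    std::ptrdiff_t a_off = a_ptr - base_ptr;  // 64-bit difference
    std::ptrdiff_t c_off = c_ptr - a_ptr;     // 64-bit difference
    return Offsets4{static_cast<std::uint32_t>(a_off),
                    static_cast<std::uint32_t>(c_off), x, y};
}

// Index version: positions are kept as 32-bit offsets from the base,
// so all arithmetic stays 32-bit and no narrowing is needed.
// Assumes c_idx >= a_idx.
inline Offsets4 from_indices(std::uint32_t a_idx, std::uint32_t c_idx,
                             std::uint32_t x, std::uint32_t y) {
    return Offsets4{a_idx, c_idx - a_idx, x, y};
}
```

Both functions produce the same four values for matching inputs; the only difference is whether 64-bit pointers or 32-bit offsets are carried through the loop.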