Hi everybody,
I’m trying to find a way to prevent NVCC from emitting cvt (conversion) instructions.
Could somebody please help me?
I also don’t know how many clock cycles this kind of instruction costs. Does anybody know these numbers?
My guess is that I have to prevent my program or the compiler from using 64-bit pointers. But how can I do that?
I have tried passing the -m32 option in my makefile to stop the compiler from using 64-bit pointers, but since I have to include libraries that use 64-bit pointers, this approach has not worked so far.
The relevant part of my code looks like this:
[codebox]
uint4 * table;
…
(*table) = make_uint4( a_ptr - base_ptr, c_ptr - a_ptr, x, y);
table++;
…
[/codebox]
An example of the PTX code that I get is the following:
[codebox]
cvt.s32.u64 %r54, %rd38;
cvt.s32.u64 %r55, %rd2;
sub.s32 %r56, %r54, %r55;
cvt.s32.u64 %r57, %rd35;
sub.s32 %r58, %r57, %r54;
.loc 3 112 0
ld.shared.u16 %r40, [%rd49+0];
.loc 3 134 0
and.b32 %r59, %r40, 96;
st.global.v4.u32 [%rd11+0], {%r56,%r58,%r27,%r59};
.loc 3 135 0
add.u64 %rd11, %rd11, 16;
.loc 3 136 0
add.u32 %r19, %r19, 1;
[/codebox]
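In case it helps, here is a minimal host-side sketch of what I think is going on. It is not my actual kernel. It assumes the pointers are byte pointers, and it uses a hypothetical `Entry4` struct and `make_entry` helper in place of `uint4` and `make_uint4` so that it compiles without the CUDA headers. The pointer differences have type `std::ptrdiff_t`, which is 64 bits wide in a 64-bit build, so they have to be narrowed to fit the 32-bit fields. I suspect that narrowing is what shows up as `cvt` in the PTX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CUDA's uint4 (four 32-bit unsigned fields),
// so this sketch builds with a plain C++ compiler.
struct Entry4 {
    std::uint32_t x, y, z, w;
};

// Hypothetical helper mirroring the make_uint4(...) call in my code.
// Assumption: the pointers are byte pointers into the same buffer and
// the offsets fit in 32 bits.
inline Entry4 make_entry(const unsigned char* base_ptr,
                         const unsigned char* a_ptr,
                         const unsigned char* c_ptr,
                         std::uint32_t x,
                         std::uint32_t y) {
    assert(a_ptr >= base_ptr && c_ptr >= a_ptr);

    // Pointer subtraction yields std::ptrdiff_t, which is as wide as the
    // platform's pointer difference type (64-bit in a 64-bit build).
    const std::ptrdiff_t d0 = a_ptr - base_ptr;
    const std::ptrdiff_t d1 = c_ptr - a_ptr;

    // Explicit narrowing to the 32-bit fields; this is the step that
    // needs a width conversion when pointers are 64-bit.
    return Entry4{static_cast<std::uint32_t>(d0),
                  static_cast<std::uint32_t>(d1),
                  x, y};
}
```

For example, with a 64-byte buffer `buf`, `make_entry(buf, buf + 10, buf + 42, 7, 96)` gives the fields 10, 32, 7 and 96.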
I would appreciate it very much if anybody has an idea!