asm function overhead

When I use the asm function I get some extra mov instructions. The first and last instructions don't look necessary, because %r195 and %r196 are never accessed again. Does the just-in-time compiler optimize this down to a single instruction (so I don't need to worry about it), or is there a way to stop it from adding the two extra instructions?

asm("bfe.u32 %0, %1, 4, 4;" : "=r"(extract) : "r"(source));

produces…

mov.u32  %r195, %r194;

bfe.u32  %r196, %r191, 4, 4;

mov.s32  %r197, %r196;

…but it could be simplified to this…

bfe.u32  %r197, %r191, 4, 4;

The PTX you see has not had final register allocation performed yet; that happens when ptxas (or the driver's JIT compiler) turns the PTX into machine code. If you want to see the real assembly, run cuobjdump -sass on the .cubin.
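A minimal file for repeating that comparison might look like the sketch below. The file name bfe_demo.cu, the kernel name, and the sm_XX architecture placeholder are hypothetical; the kernel is compiled only under nvcc, while the small host reference lets the expected field value be checked on the CPU.

```cpp
// bfe_demo.cu (hypothetical file name). One way to get the final machine code:
//   nvcc -cubin -arch=sm_XX -o bfe_demo.cubin bfe_demo.cu
//   cuobjdump -sass bfe_demo.cubin
// sm_XX is a placeholder for the target GPU architecture.
// For the intermediate PTX, nvcc -ptx bfe_demo.cu can be used instead.
#include <cassert>
#include <cstdint>

// Host-side reference for the field the asm below extracts (bits 4..7).
inline uint32_t extract_ref(uint32_t source) {
    return (source >> 4) & 0xFu;
}

#ifdef __CUDACC__
// One bfe per element; comparing the PTX with the SASS shows which of the
// extra mov instructions survive register allocation.
__global__ void extract_kernel(const uint32_t* in, uint32_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        uint32_t extract;
        asm("bfe.u32 %0, %1, 4, 4;" : "=r"(extract) : "r"(in[i]));
        out[i] = extract;
    }
}
#endif
```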

Hi Seibert, that worked. I ran cuobjdump on a .cubin for both versions (one with the asm and one without), and the compiler added only one instruction, not all three. Looking at the output, I also noticed that it had optimized out several other mov instructions as well; there are actually very few mov instructions in the final output.

I learned a new trick with the cuobjdump tool. Thank you very much for the idea. You must be one of the people who answer the most threads; thank you.