Hi, I want to write some code to measure the latency of a specific instruction such as add.u32. However, the optimizer always rewrites the code, so the final SASS is not exactly what I want. If I disable the optimizer, many mov instructions are inserted.
For example, I wrote the following device code:
```cuda
__global__
void new_u32_add_latency(long long *latency, int *result, int a, int b) {
    long long start, end;
    do {
        __syncwarp(0xffffffff);
        // Read the clock, execute one add.u32 (a += b), read the clock again.
        asm ("mov.u64 %1, %%clock64;\n\t"
             "add.u32 %0, %0, %3;\n\t"
             "mov.u64 %2, %%clock64;\n\t"
             : "+r"(a), "=l"(start), "=l"(end)
             : "r"(b));
        __syncwarp(0xffffffff);
    } while (start >= end);  // retry until end is later than start
    *latency = end - start;
    *result = a + b;
}
```
The output of nvdisasm:
Well, the third operand of IADD3 comes from constant memory, which is not what I want.
I tried adding a temporary register and manually moving parameter b from constant memory into it. However, the compiler optimized the move away, and the third operand still comes from constant memory.
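A sketch of one variant of the temporary-register idea, under assumptions: kernel parameters such as b are read from the constant bank, so if b is instead loaded from global memory through a pointer, ptxas cannot know its value at compile time and presumably has to keep it in a register. The sketch also issues a chain of dependent add.u32 instructions inside a single asm volatile statement, so both clock reads surround the whole chain. The kernel name `u32_add_chain_latency`, the `b_ptr` parameter, `kChainLength`, and the host helper `run_add_chain` are hypothetical, and ptxas may still rewrite the chain (for example by merging the adds), so the generated SASS should be checked with `cuobjdump -sass`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of dependent add.u32 instructions between the two clock reads.
// Must match the number of "add.u32" lines in the asm string below.
constexpr int kChainLength = 8;

// Hypothetical kernel: b is loaded from global memory through b_ptr, so its
// value is unknown at compile time and cannot be taken from the constant bank.
__global__ void u32_add_chain_latency(long long *latency, int *result,
                                      const int *b_ptr, int a) {
    int b = *b_ptr;
    long long start, end;
    // A single asm volatile statement: the compiler should neither delete it
    // nor separate the clock reads from the adds. Each add reads the previous
    // result through %0, forming a dependency chain.
    asm volatile(
        "mov.u64 %1, %%clock64;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "add.u32 %0, %0, %3;\n\t"
        "mov.u64 %2, %%clock64;\n\t"
        : "+r"(a), "=l"(start), "=l"(end)
        : "r"(b)
        : "memory");
    *latency = end - start;  // grows with the chain length times the add latency
    *result = a;             // storing a keeps the chain's result live
}

// Hypothetical host helper: launches one thread, writes the measured cycle
// count to *cycles, and returns the final value of a (a + kChainLength * b).
int run_add_chain(int a, int b, long long *cycles) {
    long long *d_latency = nullptr;
    int *d_result = nullptr;
    int *d_b = nullptr;
    cudaMalloc(&d_latency, sizeof(long long));
    cudaMalloc(&d_result, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    u32_add_chain_latency<<<1, 1>>>(d_latency, d_result, d_b, a);

    int h_result = 0;
    cudaMemcpy(cycles, d_latency, sizeof(long long), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_latency);
    cudaFree(d_result);
    cudaFree(d_b);
    return h_result;
}

int main() {
    long long cycles = 0;
    int r = run_add_chain(0, 1, &cycles);
    std::printf("result = %d, cycles for %d dependent adds = %lld\n",
                r, kChainLength, cycles);
    return 0;
}
```

Lengthening or shortening the chain and comparing cycle counts is one way to separate the per-add latency from the fixed overhead of the clock reads.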
Later I tried disabling the optimizer, but I still can't get the SASS I expect. Many mov instructions are inserted between the two clock-reading mov instructions, like below:
How can I “customize” the final SASS code?