I very much agree with tera: I also need this kind of leverage over code generation. So far, the best solution I have found is this:
[code]
if( clock() == 0 ) some_useless_variable++;
[/code]
Here is a reproducer. We start with:
[code]
__global__ void reduce( float *dst, float *src )
{
    float a = src[0];
    float b = src[1];
    float c = src[2];
    float d = src[3];
    dst[0] = a+b+c+d;
}
[/code]
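To run the reproducer end to end, a minimal host harness might look like the sketch below. It assumes a CUDA toolkit with `nvcc`; the helper name `run_reduce` is hypothetical, and the kernel is repeated so the block compiles on its own. SASS listings like the ones in this post can be produced from the resulting binary with `cuobjdump -sass`; the exact instructions will vary with GPU architecture and compiler version.

```cuda
#include <cuda_runtime.h>

// Kernel from the post: sum four consecutive floats into dst[0].
__global__ void reduce( float *dst, float *src )
{
    float a = src[0];
    float b = src[1];
    float c = src[2];
    float d = src[3];
    dst[0] = a+b+c+d;
}

// Hypothetical host helper: copies in[0..3] to the device, runs the kernel
// with a single thread, and stores dst[0] in *out.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t run_reduce( const float in[4], float *out )
{
    float *d_src = nullptr;
    float *d_dst = nullptr;
    cudaError_t err = cudaMalloc( &d_src, 4 * sizeof(float) );
    if( err == cudaSuccess ) err = cudaMalloc( &d_dst, sizeof(float) );
    if( err == cudaSuccess )
        err = cudaMemcpy( d_src, in, 4 * sizeof(float), cudaMemcpyHostToDevice );
    if( err == cudaSuccess ) {
        reduce<<<1, 1>>>( d_dst, d_src );   // one thread is enough here
        err = cudaGetLastError();           // catches launch errors
    }
    if( err == cudaSuccess )                // blocking copy also waits for the kernel
        err = cudaMemcpy( out, d_dst, sizeof(float), cudaMemcpyDeviceToHost );
    cudaFree( d_src );                      // cudaFree(nullptr) is a no-op
    cudaFree( d_dst );
    return err;
}
```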
which compiles into:
[code]
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44];
/*0010*/ /*0x10001de428004005*/ MOV R0, c [0x0] [0x144];
/*0018*/ /*0x10015c034800c000*/ IADD R5, R0, 0x4;
/*0020*/ /*0x0050dc8580000000*/ LD R3, [R5];
/*0028*/ /*0xf0511c8583ffffff*/ LD R4, [R5+-0x4];
/*0030*/ /*0x10509c8580000000*/ LD R2, [R5+0x4];
/*0038*/ /*0x0c40dc0050000000*/ FADD R3, R4, R3;
/*0048*/ /*0x20501c8580000000*/ LD R0, [R5+0x8];
/*0050*/ /*0x0830dc0050000000*/ FADD R3, R3, R2;
/*0058*/ /*0x00009de428004005*/ MOV R2, c [0x0] [0x140];
/*0060*/ /*0x00301c0050000000*/ FADD R0, R3, R0;
/*0068*/ /*0x00201c8590000000*/ ST [R2], R0;
/*0070*/ /*0x00001de780000000*/ EXIT;
[/code]
Note that the compiler interleaves the adds with the loads, which is not what I want: I want all the loads issued before the first add. But if I apply the trick above:
[code]
__global__ void reduce( float *dst, float *src )
{
    float a = src[0];
    float b = src[1];
    float c = src[2];
    float d = src[3];
    if( clock() == 0 ) a+=1.f;
    dst[0] = a+b+c+d;
}
[/code]
I get:
[code]
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44];
/*0010*/ /*0x10019de428004005*/ MOV R6, c [0x0] [0x144];
/*0018*/ /*0x00001de428004005*/ MOV R0, c [0x0] [0x140];
/*0020*/ /*0x00615c8580000000*/ LD R5, [R6];
/*0028*/ /*0x10611c8580000000*/ LD R4, [R6+0x4];
/*0030*/ /*0x2060dc8580000000*/ LD R3, [R6+0x8];
/*0038*/ /*0x30609c8580000000*/ LD R2, [R6+0xc];
/*0048*/ /*0x40019c042c000001*/ S2R R6, SR_ClockLo;
/*0050*/ /*0x0051dc005000cfe0*/ FADD R7, R5, 0x3f800;
/*0058*/ /*0x14715c23310c0000*/ ICMP.EQ R5, R7, R5, R6;
/*0060*/ /*0x10511c0050000000*/ FADD R4, R5, R4;
/*0068*/ /*0x0c40dc0050000000*/ FADD R3, R4, R3;
/*0070*/ /*0x08309c0050000000*/ FADD R2, R3, R2;
/*0078*/ /*0x00009c8590000000*/ ST [R0], R2;
/*0088*/ /*0x00001de780000000*/ EXIT;
[/code]
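Reading the second listing back into C may make the mechanism clearer. The sketch below mirrors it one statement per instruction, keeping register names as variable names. It assumes the usual reading of `ICMP.EQ Rd, Ra, Rb, Rc` on this hardware generation (compare `Rc` with zero, select `Ra` if equal, else `Rb`), which is consistent with the source line `if( clock() == 0 ) a+=1.f;`. The function name `guarded_sum_from_sass` is hypothetical, and the clock value is passed in as a parameter so the function can also run on the host.

```cuda
// Hypothetical host/device mirror of the guarded listing, one statement per
// SASS instruction. 'clock_lo' stands in for S2R R6, SR_ClockLo.
__host__ __device__ float guarded_sum_from_sass( const float *src, unsigned int clock_lo )
{
    // LD R5..R2: all four loads are issued back to back
    float R5 = src[0];
    float R4 = src[1];
    float R3 = src[2];
    float R2 = src[3];
    unsigned int R6 = clock_lo;        // S2R R6, SR_ClockLo
    float R7 = R5 + 1.0f;              // FADD R7, R5, 0x3f800 (high bits of 1.0f)
    R5 = ( R6 == 0u ) ? R7 : R5;       // ICMP.EQ R5, R7, R5, R6: select on R6 == 0
    R4 = R5 + R4;                      // FADD R4, R5, R4
    R3 = R4 + R3;                      // FADD R3, R4, R3
    R2 = R3 + R2;                      // FADD R2, R3, R2
    return R2;                         // ST [R0], R2
}
```

The conditional add becomes a branch-free select, so the only ordering constraint it adds is that the first add must wait for the clock read.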
This does the job: all four loads are now issued back to back, before the clock read and the adds. Cool, eh?
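If the trick is needed in more than one place, it could be wrapped in a small helper, as in the sketch below. The names `clock_guard` and `reduce_guarded` are hypothetical and nothing here is an official CUDA API; whether the compiler keeps the loads grouped ahead of the clock read depends on its version and target, so the SASS should be checked each time. One caveat follows directly from the source: if `clock()` ever does return 0, the guarded value really is incremented, so the reproducer's result would be off by one in that case.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper packaging the clock() trick from this post.
// The compiler cannot predict clock(), so it must keep the conditional
// update. Caveat: in the unlikely event clock() returns 0, x is incremented.
__device__ __forceinline__ float clock_guard( float x )
{
    if( clock() == 0 ) x += 1.f;
    return x;
}

// The reproducer kernel rewritten with the helper. After inlining it should
// match the hand-written version, but verify with cuobjdump -sass.
__global__ void reduce_guarded( float *dst, const float *src )
{
    float a = src[0];
    float b = src[1];
    float c = src[2];
    float d = src[3];
    a = clock_guard( a );   // guard placed after the loads, as in the post
    dst[0] = a+b+c+d;
}
```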
PS. Wasn’t there a “preview” button here?