Hello,
I took a look at the PTX code that the NVidia OpenCL compiler generates for get_global_id(0) and was surprised by how many instructions it takes. The compiler seems to support global_work_offset, although the official OpenCL specification states that it isn’t supported in OpenCL 1.0 yet.
Is there a compiler option to get rid of these extra instructions? As I target both NV and ATI, I don’t want to use this feature (although it’s a nice one), so it would be good to save a few instructions.
This is what is generated by the NVidia OpenCL compiler for get_global_id(0):
{
// get_global_id(0)
.reg .u32 %vntidx;
.reg .u32 %vctaidx;
.reg .u32 %vtidx;
cvt.u32.u16 %vntidx, %ntid.x;
cvt.u32.u16 %vctaidx, %ctaid.x;
cvt.u32.u16 %vtidx, %tid.x;
mad.lo.s32 %r1, %vntidx, %vctaidx, %vtidx;
.reg .u32 %temp;
ld.const.u32 %temp, [%_global_launch_offset+0];
add.u32 %r1, %r1, %temp;
}
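To make clear which part I mean, here is how I read the PTX above, written as plain C. This is only a sketch of my interpretation: the function name and parameter names are made up for illustration and are not anything the compiler emits. The last addition corresponds to the ld.const/add.u32 pair, which is the part I would like to get rid of.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the PTX listing (names are my own).
 * ntid   ~ %ntid.x  (work-group size)
 * ctaid  ~ %ctaid.x (work-group index)
 * tid    ~ %tid.x   (local index)
 * offset ~ value loaded from %_global_launch_offset */
static uint32_t global_id_sketch(uint32_t ntid, uint32_t ctaid,
                                 uint32_t tid, uint32_t offset)
{
    /* mad.lo: low 32 bits of ntid * ctaid, plus tid */
    uint32_t id = ntid * ctaid + tid;
    /* ld.const + add.u32: the extra work for the launch offset */
    return id + offset;
}
```

With an offset of 0 the result is just ntid * ctaid + tid, so the load and the add do nothing useful in my case.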
Regards,
Nils