get_global_id() and global_work_offset on NVidia platform Is this supported? How to remove these ext


I took a look at the PTX code that the NVidia compiler generates for get_global_id(0) and was surprised by how many instructions it takes. The compiler seems to support global_work_offset, although the official OpenCL 1.0 specification states that it isn't supported yet.

Is there a compiler option to get rid of these extra instructions? Since I target both NVidia and ATI, I don't want to use this feature (nice as it is), and it would be good to save a few instructions.

This is what is generated by the NVidia OpenCL compiler for get_global_id(0):


	// get_global_id(0)
	.reg .u32 	%vntidx;
	.reg .u32 	%vctaidx;
	.reg .u32 	%vtidx;
	cvt.u32.u16 	%vntidx, %ntid.x;
	cvt.u32.u16 	%vctaidx, %ctaid.x;
	cvt.u32.u16 	%vtidx, %tid.x;
	mad.lo.s32 	%r1, %vntidx, %vctaidx, %vtidx;
	.reg .u32 	%temp;
	ld.const.u32 	%temp, [%_global_launch_offset+0];
	add.u32 	%r1, %r1, %temp;
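
For anyone not fluent in PTX, here is roughly what that sequence computes, written as a plain C sketch. The function and parameter names are made up for illustration and are not part of any NVidia or OpenCL API. I am assuming the value read from %_global_launch_offset is the global work offset for dimension 0.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the PTX listing; all names are illustrative.
 *   ntid_x        work-group size in x        (%ntid.x, 16-bit in this PTX)
 *   ctaid_x       work-group index in x       (%ctaid.x)
 *   tid_x         local work-item index in x  (%tid.x)
 *   launch_offset value loaded from %_global_launch_offset (assumed to be
 *                 the global work offset for dimension 0)
 */
static uint32_t emulated_global_id(uint16_t ntid_x, uint16_t ctaid_x,
                                   uint16_t tid_x, uint32_t launch_offset)
{
    /* cvt.u32.u16: widen the 16-bit special registers to 32 bits */
    uint32_t vntidx  = ntid_x;
    uint32_t vctaidx = ctaid_x;
    uint32_t vtidx   = tid_x;

    /* mad.lo: low 32 bits of ntid * ctaid + tid */
    uint32_t r1 = vntidx * vctaidx + vtidx;

    /* ld.const + add: the two instructions that exist only for the offset */
    return r1 + launch_offset;
}
```

So with an offset of zero, the last two instructions (the constant load and the add) do not change the result, and those are the ones I would like to drop.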