The OpenCL 1.1 (1.0 likewise) standard states in Table 6.16 regarding the barrier command:
"This function must be encountered by all work-items in a work-group executing the kernel."
I know that NVIDIA's OpenCL implementation will execute code like the following, but I am not sure whether this is covered by the standard. In particular, I am unsure how to interpret "[…] executing the kernel." (I think the wording would be clearer if that addition were dropped completely.) One could argue that work-items that have already returned are no longer executing the kernel and therefore do not have to reach the barrier. A more restrictive interpretation would make such code illegal, although an implementation may support it beyond what the OpenCL standard requires. I would especially like to know whether others find the standard vague on this point, and what NVIDIA's official interpretation is.
__kernel void k( unsigned height , ... ) {
    if ( get_global_id( 1 ) >= height ) {
        return;
    }
    f();
    barrier( CLK_LOCAL_MEM_FENCE );
    g();
}
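For comparison, here is a minimal sketch of a restructured kernel that sidesteps the interpretation question: the bounds check becomes a predicate instead of an early return, so every work-item in the work-group reaches the barrier under either reading of the standard. This is my own illustration, not something taken from the standard or from NVIDIA. The `data` parameter and the bodies of `f` and `g` are hypothetical stand-ins, since the original snippet elides them. The `#ifndef __OPENCL_VERSION__` preamble is also an assumption of mine: it only exists so the same file can be compiled as plain C for a quick single-work-item check on the host.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins (assumptions, not OpenCL) so the file also builds as plain C. */
#include <stddef.h>
#define __kernel
#define __global
#define CLK_LOCAL_MEM_FENCE 1
size_t host_gid[2]   = {0, 0};   /* pretend global id, set by the caller */
size_t host_gsize[2] = {1, 1};   /* pretend global size per dimension */
size_t get_global_id(unsigned dim)   { return host_gid[dim]; }
size_t get_global_size(unsigned dim) { return host_gsize[dim]; }
void barrier(int flags) { (void)flags; }  /* no-op: the host runs one work-item at a time */
#endif

/* Hypothetical stand-ins for the f() and g() of the original snippet.
   Each work-item only touches its own element, so they are trivial on purpose. */
void f(__global int *data, size_t idx) { data[idx] += 1; }
void g(__global int *data, size_t idx) { data[idx] *= 2; }

__kernel void k(unsigned height, __global int *data)
{
    const size_t row = get_global_id(1);
    const size_t idx = row * get_global_size(0) + get_global_id(0);

    /* Evaluate the bounds check once, but do not return early. */
    const int active = row < height;

    if (active) {
        f(data, idx);
    }

    /* Reached by every work-item in the work-group, in range or not. */
    barrier(CLK_LOCAL_MEM_FENCE);

    if (active) {
        g(data, idx);
    }
}
```

The cost is that out-of-range work-items stay alive until the end of the kernel, but they do no work and never touch `data`. Whether the original early-return version is legal remains a question of how the quoted sentence is read; this variant simply does not depend on the answer.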