Kernel arguments and private address space: apparent inconsistency between CUDA and OpenCL

Section 6.5 of the standard (version 1.0.43) states “All arguments to a __kernel function shall be in the __private address space.” This is contrary to CUDA’s practice of storing kernel arguments in shared memory (“__local” in OpenCL parlance). Does NVIDIA’s OpenCL implementation really place arguments in private memory, or does it follow the CUDA practice?

Requiring arguments to go in the private address space seems rather wasteful: the kernel must either dump the arguments into what CUDA calls local memory, or chew up valuable on-chip resources duplicating data that is identical across work-items. Can anyone in the know comment? It matters from a programming point of view, because it looks as if programmers will have to take explicit steps to make sure arguments to OpenCL kernels are handled efficiently.
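To make the concern concrete, here is a minimal sketch in OpenCL C of the two ways I can see of handing a small parameter block to a kernel. The names `Params`, `scale_by_value` and `scale_by_constant` are hypothetical and exist only for illustration. Whether the two variants actually differ in cost depends on how a given implementation maps `__private` arguments onto hardware, which is exactly what I am asking about.

```c
/* Hypothetical parameter block; the name and fields are illustrative only. */
typedef struct {
    float scale;
    float offset;
    int   n;      /* number of valid elements in data */
} Params;

/* Variant A: the parameter block is passed by value.
   Under section 6.5 the argument p is in the __private address space,
   so conceptually every work-item holds its own copy. */
__kernel void scale_by_value(__global float *data, Params p)
{
    size_t i = get_global_id(0);
    if (i < (size_t)p.n)
        data[i] = data[i] * p.scale + p.offset;
}

/* Variant B: the parameter block lives in a __constant buffer.
   Only the pointer is a __private argument; the struct it points to
   is a single copy shared by all work-items. */
__kernel void scale_by_constant(__global float *data, __constant Params *p)
{
    size_t i = get_global_id(0);
    if (i < (size_t)p->n)
        data[i] = data[i] * p->scale + p->offset;
}
```

On the host side, variant A would set the argument with `clSetKernelArg(kernel, 1, sizeof(Params), &params)`, while variant B would first create a read-only buffer with `clCreateBuffer` and then pass the resulting `cl_mem` handle to `clSetKernelArg`. If the implementation already keeps by-value arguments in shared or constant memory behind the scenes, variant B buys nothing and only adds a buffer to manage.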

-robert.