Questions about OpenCL-enabled GPUs and memory

Hi all,
I am reading the OpenCL Best Practices Guide.
The document uses the phrase “OpenCL-enabled NVIDIA GPUs”.
I don’t know exactly what this phrase means.
OpenCL, like CUDA C, is for programming programmable GPUs.
Does the phrase simply mean programmable GPUs,
or special programmable GPUs that are designed to be programmed with OpenCL?

About memory: global memory and local memory are located off chip.
Why aren’t they cached? Maybe global memory is shared between the host
and all threads on the device, so it is not cached in order to maintain
consistency. But local memory is owned by each thread, so it should be
cached.

Thanks in advance.
Jogging

Although I don’t use OpenCL, as far as I know all (or at least all recent) CUDA-capable GPUs are also OpenCL-capable, assuming you have the appropriate drivers installed. I have no idea whether they instead mean “enabled” as in “enabled for use by the OpenCL driver”, which might be a software choice to keep some devices (say, your display device) from being used by OpenCL.
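
If you want to check which devices your installed driver actually exposes to OpenCL, something like the following might help. It is only a rough sketch: it assumes the third-party pyopencl package is installed, and the function name list_opencl_devices is made up for this example. It returns an empty list if pyopencl or an OpenCL driver is missing.

```python
def list_opencl_devices():
    """Return (platform name, device name, device type) for each OpenCL device."""
    try:
        import pyopencl as cl  # assumption: third-party package, not in the stdlib
    except ImportError:
        return []

    found = []
    try:
        for platform in cl.get_platforms():
            # With no arguments, get_devices() returns devices of every type.
            for device in platform.get_devices():
                kind = cl.device_type.to_string(device.type)
                found.append((platform.name, device.name, kind))
    except cl.Error:
        # Raised, for example, when no OpenCL driver is registered.
        pass
    return found


if __name__ == "__main__":
    devices = list_opencl_devices()
    if not devices:
        print("No OpenCL devices visible (or pyopencl is not installed).")
    for platform_name, device_name, kind in devices:
        print(f"{platform_name}: {device_name} [{kind}]")
```

A GPU that shows up in this list is usable through the installed OpenCL driver. Whether that is what the guide means by “OpenCL-enabled” is still a guess.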

I was surprised to see this, so I checked the manual and found what you are referring to. It makes no sense, given that GPUs of compute capability 2.0 and greater automatically use their L1 and L2 caches for local and global memory when executing CUDA code. If OpenCL does not use the caches, that would be a giant handicap… This sounds like something to benchmark and verify. (Certainly, I can’t think of any reason to disable L2, as it is shared by the entire GPU and won’t have any consistency issues.)
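
Short of a real benchmark, one cheap first check is to ask the OpenCL driver what it reports about the global memory cache. This is a sketch under the same assumption (pyopencl installed), and global_cache_report is a made-up name. It only shows what the driver claims, not how kernels actually behave, so a timing test is still needed to settle the question.

```python
def global_cache_report():
    """Return the driver-reported global memory cache info for each OpenCL GPU."""
    try:
        import pyopencl as cl  # assumption: third-party package, not in the stdlib
    except ImportError:
        return []

    report = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices(device_type=cl.device_type.GPU):
                cache_type = cl.device_mem_cache_type.to_string(
                    device.global_mem_cache_type
                )
                report.append({
                    "device": device.name,
                    # NONE, READ_ONLY_CACHE or READ_WRITE_CACHE
                    "cache_type": cache_type,
                    "cache_bytes": device.global_mem_cache_size,
                    "cacheline_bytes": device.global_mem_cacheline_size,
                })
    except cl.Error:
        # No OpenCL driver registered, or a device query failed.
        pass
    return report


if __name__ == "__main__":
    for entry in global_cache_report():
        print(entry)
```

If a compute capability 2.0 GPU reports a cache type of NONE here, that would support the guide’s statement. Anything else would suggest the guide was written with older hardware in mind.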